What's the best way to share Python code nowadays?

I’ve been experimenting the last couple of days with a little Python just to see what is possible. I think my end goal is to make some scripts that can interact with the iNat API and other data sources, and then publish / share these in a way that other folks can run them easily without having to do much setup on their end (ideally just running from a web browser, since every computer and mobile device has one already nowadays).

As best as I could tell, Jupyter notebooks (.ipynb files) are the best format to share Python code, and it looks like setting up a JupyterLite instance in GitHub Pages is an easy and fast (and free) way of then making those notebooks available to the masses. So I set up a test GitHub repo, and made a sample Jupyter notebook available: https://jumear.github.io/stirpy/lab/index.html?path=iNat_APIv1_get_observations.ipynb.

One issue I’m noticing about this setup is that if the end user updates the code in the notebook, their own version of the notebook is saved, and then if I push out a new version of the notebook to my repo, the end user won’t see the updated version unless they clear their browser cache, browse privately, or manually delete / rename their version of the notebook. (I don’t think it would be obvious that folks should delete a notebook to get the most recent version.)

So then I’m wondering if anyone has come across any better ways of sharing Python code or has any ideas to improve my existing setup? I think I want to steer clear of Colab because I don’t want to require folks to have a Google account to run code, and I think I want to steer clear of Binder, too, since I get the sense that it’s on its way out due to lack of funding (to be eventually replaced with JupyterLite maybe).

I know you said you want to steer clear of Colab, but I’ll just give my +1 for it, as I’ve found it to be the most convenient way to get Python up and running quickly online, and it is very easy for beginners.

is it easier for beginners to use than my example JupyterLite deployment? (going to the notebook link above, i can run my sample code even on my several-year-old budget phone from its built-in browser.)

to my eyes, Colab has a slightly more polished interface, but i don’t see much of a difference functionality-wise for running simple code. i suppose Colab has integration with Drive, which might make it more straightforward to save documents in the cloud (compared to just using browser storage for JupyterLite). what are some of the other specific advantages offered by Colab?

I am not super familiar with JupyterLite specifically (though I did click on the link and poke around a bit), so I speak from limited experience. Most users in my Python classes (I was a student, not the instructor) ended up using Colab over Jupyter notebook implementations pretty organically. We were given the choice to use either. It just seemed to run with fewer issues and be easier for sharing/working on group projects. The revision history was very useful for group work, and I couldn’t find that immediately in the JupyterLite implementation.

Colab is integrated with Google Drive somewhat (though I think this could be improved), but it did help to be able to share datafiles to access.

I don’t know how pushing out new versions of scripts would work with Colab or if it would be better. My guess is that, as this is a core function of GitHub, its functions might be better suited in that regard.

I am familiar with Python, but not so familiar with Jupyter notebooks and version control. However, I found the script in the sample Jupyter notebook easy to edit and execute.

To get a sense of the experience of using this interface, I adapted the following line to substitute my own iNat username for the user_id that was provided here:

req_params_string = 'verifiable=true&spam=false&user_id=pisum'
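As a side note, the same query string can be built from a dictionary with the standard library's urllib.parse.urlencode, which makes it easier to swap a single value (such as user_id) without retyping the rest. This is only a sketch: the dictionary name req_params is hypothetical and is not taken from the notebook.

```python
from urllib.parse import urlencode

# Hypothetical dictionary holding the same parameters as the string above.
req_params = {"verifiable": "true", "spam": "false", "user_id": "pisum"}

# urlencode joins key=value pairs with "&" and percent-encodes special characters.
req_params_string = urlencode(req_params)
print(req_params_string)  # verifiable=true&spam=false&user_id=pisum
```

Changing the user then means editing one dictionary value rather than a long string.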

Then I used the button pictured here to execute the script:

[Image: the “restart kernel, run all cells” toolbar button]

After downloading the output, I restored the user_id back to the original, in an effort to be a good iNat citizen. Since I had not been asked to authenticate prior to my editing the code, I thought that perhaps if I did not restore the code, the next user to work with it would encounter the change I had made, which would be unlikely to interest them. I wanted to leave only footprints, or preferably even less. :wink:

Was restoring the code to its original state unnecessarily cautious? After one anonymous user has worked with the code, what does the next user see?

that’s probably fair. developing code on JupyterLite does occasionally seem a bit buggier than i would like (although i don’t think most people would notice bugginess just making minor edits and running code), and it really isn’t made for sharing edits other than the fact that you can save off a .ipynb file. so if you’re working on a group project, i can see how working in Colab might be easier.

i guess the “sharing” workflow that i’m envisioning for my purposes is more of a central publisher model though (as opposed to many people working on the same instance of code). so i’d be the only one who can publish code to the masses, although folks could submit code to be published through Github. so i’m not sure Colab would improve that kind of workflow.

this is the confusing / nonintuitive part of JupyterLite that i was getting at in the third paragraph of my original post in this thread. what actually happens if you make a change and save it – and auto-saving is turned on by default (although i’m trying to figure out a way to turn it off by default) – is that you effectively fork the original notebook (and save it in your browser storage), but there’s not any indicator that i can see that shows that the forking occurred.

so any changes you make stay with your browser, and if you want to get back to the original notebook, you can rename or delete your notebook (or clear your browser cache, browse in private (or using a different browser account), or use a different browser / machine that’s not tied to your other browser).

and any other user who goes to view the notebook from the link in the original thread won’t ever see your changes (unless, say, your partner views the notebook from your same machine and uses the same browser and same browser account). this actually is good from a privacy perspective, since everything is happening locally on your client (maybe unless your browser then loads stuff to the cloud to allow for syncing across machines on a given browser account).

Is the entire notebook with all the included Python code saved locally in a file used by the browser, or is it actually something like a cookie or url that is saved locally that enables access to a version of the notebook that is stored remotely on the server, or is it some combination of both?

I would be happy to try out what you have established, especially if others are also interested. Then we can discuss our experiences and opinions on this forum so that it can be refined and optimized for general benefit. Should you create a wiki post that serves as a brief guide to how to use the notebook that can be updated by any of us who are interested?

it’s generally stored locally via the browser’s IndexedDB implementation, although JupyterLite can fall back to other types of storage, depending on the user’s browser setup.

that’s a good idea, but i haven’t entirely settled on the best path for publishing the code yet. once that’s decided and implemented, i’ll work on how to document stuff.

do Python folks do much with asynchronous programming (ex. async / await)? it seems to be a fact of life over in the Javascript world, but i don’t think i’ve ever seen anyone do that in the Python code examples i’ve seen in the forum here, although the concept seems to have been discussed briefly at one point.

i ask because JupyterLite, using the Pyodide kernel, currently seems to handle network requests a little differently than a typical Python setup. because code on Pyodide is intended to be run from the browser, and the browser is sandboxed from your machine (and the host server typically won’t handle your requests either), those requests will typically need to go through a Javascript backend.

the consequence of that is that instead of using the requests and urllib modules that it seems like most Python folks use to make requests, Pyodide has its own set of modules for making requests. several of these run asynchronously, and one of those is the one i’ve chosen to use in my example. do you think Python folks would find it hard to figure out what’s going on with the async stuff if they’ve never seen it before?
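For anyone who has not seen async / await in Python before, here is a minimal sketch of the idea using only the standard asyncio module. It does not touch the network: fake_fetch is a hypothetical stand-in that sleeps instead of calling an API, and in JupyterLite it would be replaced by one of Pyodide's asynchronous request helpers (for example pyodide.http.pyfetch). The function names and the shape of the returned data are assumptions made for illustration.

```python
import asyncio


async def fake_fetch(page: int, delay: float = 0.1) -> dict:
    """Hypothetical stand-in for a network request; it sleeps instead of calling an API."""
    await asyncio.sleep(delay)  # hands control back to the event loop while "waiting"
    return {"page": page, "results": [f"obs-{page}-{i}" for i in range(3)]}


async def fetch_pages(pages: list) -> list:
    """Start every request at once, then wait for all of them together."""
    return await asyncio.gather(*(fake_fetch(p) for p in pages))


if __name__ == "__main__":
    # In a plain script, asyncio.run() starts the event loop.
    # In a notebook cell the loop is already running, so you would instead write:
    #     responses = await fetch_pages([1, 2, 3])
    responses = asyncio.run(fetch_pages([1, 2, 3]))
    print([r["page"] for r in responses])
```

The three simulated requests overlap, so the whole batch takes about as long as one request rather than three. asyncio.gather returns the results in the order the requests were passed in, regardless of which finished first.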

there currently is a shim available for requests and urllib that involves a couple of lines of code, and it looks like Pyodide incorporated those into its core a couple of weeks ago (so folks wouldn’t have to shim themselves), although JupyterLite hasn’t yet put that into their version of Pyodide. i’m thinking soon, requests and urllib should be more or less available (maybe except for some of the more advanced features) in JupyterLite though, without user shimming. (right now pyinaturalist relies on requests, too. so i’m hoping that it’ll work once the JupyterLite Pyodide kernel gets the updates.)
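For readers who have not seen such a shim, here is a sketch of what those couple of lines might look like. It assumes the shim in question is the pyodide-http package and its patch_all() function, which this thread does not name, and that the package is already installed in the kernel. The helper name enable_browser_requests is hypothetical. Outside the browser the helper does nothing, so the same notebook cell can run in a regular Python setup too.

```python
import sys


def enable_browser_requests() -> bool:
    """Patch requests/urllib when running under Pyodide; do nothing elsewhere.

    Returns True if the patch was applied, False otherwise.
    """
    if sys.platform != "emscripten":
        # Regular CPython: requests and urllib already work without a shim.
        return False
    # Assumption: the pyodide-http package is installed in the Pyodide kernel.
    import pyodide_http

    pyodide_http.patch_all()  # route requests/urllib calls through the browser
    return True


patched = enable_browser_requests()
print("shim applied" if patched else "no shim needed")
```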

but even then, i like the idea of making requests asynchronously, and so i was going to do a lot of the requests in my own code that way. but if it’s going to be too out of left field for most Python folks, i might stick with plain ol’ requests (once it’s available)…

While it is not as central to Python as it is to JavaScript, asynchronous programming has an important presence in Python libraries, especially ones used to manage or interact with web sites and databases.

See official documentation at asyncio — Asynchronous I/O.

I would anticipate that Python folks here such as myself would be eager to learn to use this material, especially if we can have discussions here that clarify how to use it. These discussions might be enhanced if they were to be given an obvious home on this forum - an identifiable place to look for them.

EDITED below:

The Nature Talk category with its question tag, where this discussion currently resides, seems fine for introducing the type of material you have presented. Perhaps though, over the longer term, we need some categorization or tagging that is more descriptive of such material so users can find it more easily; it could specify something like “using data” or “working with data”. Discussions specifically about the iNaturalist API itself could also reside there along with ones about using Python with iNaturalist data, with or without interaction with the API.

You should be able to add/create tags when you make a post - feel free to make a Python one or whatever you think would be useful. If the topics are about coding in relation to iNat, it’s fine for them to be in General I think.

Thanks for bringing that up. I’ll go with whatever precedent @pisum decides to establish regarding tagging this type of material.

i’m still mulling options, but since no one has suggested anything other than Colab, i am leaning towards deploying this stuff via JupyterLite. i think i want to give it a few weeks to see if they deploy the new Pyodide core with the requests/urllib shims built in so that i can see if that helps make things more like what folks are used to.

so for now, i think “What’s the best way to share Python code nowadays?” is still the question i’m hoping to get insights on from others.

once that’s answered to my satisfaction, i will make a new thread where we can discuss these scripts, etc. if you have questions or suggestions in the meantime, though, i think it would be fine to have small tangents in this thread, or feel free to message me directly.

by the way, i made some updates to jumear/stirpy. there’s a new readme notebook which has some of the stuff we’ve talked about here, and i also added more code to the first notebook.

One advantage might be that Colab is preloaded with a number of machine-learning and data-science related modules. I have generally had very good experiences with Google Colab, and it would be my first choice, if I wanted to share Python code through Jupyter notebooks.

That said, I really don’t think Jupyter notebooks are a good way to share code at all. They’re good for experimentation, and fleshing out ideas. Maybe they’re good for preparing presentations, as well. But, they’re hard to build on. If someone else builds on the code in your notebook, it’s awkward and laborious for them to take advantage of bug fixes or extensions you or others might make to that notebook.

If you’ve developed some useful Python code for working with the iNat API, I would put the effort into packaging it as a proper Python module that other people could use by loading it with pip or conda into their own projects.

i don’t think this is really where i want to go as my first priority. if i happen to make something really interesting that i think would be useful as a package / module, i wouldn’t be opposed to publishing this way, but i think going down that path would be more of a happy side journey than a main goal.

i’m thinking notebooks are better for my vision because i’m more interested in sharing “how to” and “look what you can do” kinds of things that can be adapted easily by as many people as possible. so i think it’s helpful to have all the code there together along with all the supporting narrative in one thing (the notebook), even if it leads to a lack of modularity.

in the Javascript world, Observable has an interesting solution where their notebooks can be published in a central repository so that others can find and access them easily. you can fork any other notebook that you can access, but you can also import specific parts of other notebooks for use in your notebook (and changes to those other notebooks will also be imported into your notebook when you run the code). in other words, Observable notebooks can effectively function as both regular notebooks and packages.

unfortunately, i don’t think that sort of thing exists in the Python world.

pre-loaded is convenient, i suppose, but are any of these things that i couldn’t just install and load into any other Python setup?

i think it’s nice that it is theoretically easy to scale up your processing power in Colab or SageMaker or Azure ML and maybe get access to pre-trained models or stuff like that, but i don’t think the kinds of stuff that I want to share would even involve machine learning or need massive processing power. but maybe i’m just being short-sighted?

can you clarify what makes the experience very good? is it a well-designed user interface? stability? workflow? other things?

if i step back a bit, maybe it’s not super important whether i share stuff via JupyterLite, Binder, Colab, or something else. as long as the notebooks are in GitHub, any of these platforms can open them and run the code (perhaps with only minor edits, if any).

for example, i can create a link to open my notebook in Colab: https://colab.research.google.com/github/jumear/stirpy/blob/main/content/iNat_APIv1_get_observations.ipynb, and i can create a link to open a random notebook that’s not mine but is published in GitHub in my JupyterLite deployment: https://jumear.github.io/stirpy/lab?fromURL=https://raw.githubusercontent.com/jupyter/try-jupyter/main/content/notebooks/Intro.ipynb.

so maybe it’s good to have a JupyterLite deployment to give folks a way to run code in the web even if they don’t want to go through Colab, but maybe the way to think about it is that as long as the code is published in GitHub, you can run it using whatever Python distribution you want to use…
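The two link patterns shown above can be generated for any notebook in a public GitHub repo. The sketch below just reproduces those patterns with string formatting. The function names are hypothetical, and the raw.githubusercontent.com layout is inferred from the example links in this thread rather than from any official specification.

```python
def colab_link(owner: str, repo: str, branch: str, path: str) -> str:
    """Link that opens a GitHub-hosted notebook in Colab."""
    return f"https://colab.research.google.com/github/{owner}/{repo}/blob/{branch}/{path}"


def raw_github_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Direct link to the raw file contents on GitHub."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def jupyterlite_link(lab_url: str, notebook_url: str) -> str:
    """Link that asks a JupyterLite deployment to load a notebook via fromURL."""
    return f"{lab_url}?fromURL={notebook_url}"


# Reproduce the two example links from the post above.
print(colab_link("jumear", "stirpy", "main", "content/iNat_APIv1_get_observations.ipynb"))
print(jupyterlite_link(
    "https://jumear.github.io/stirpy/lab",
    raw_github_url("jupyter", "try-jupyter", "main", "content/notebooks/Intro.ipynb"),
))
```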

Hello @pisum
I think the best way to create and share notebooks online nowadays is basthon.

Here’s the link to your notebook on basthon: https://notebook.basthon.fr/?from=https://raw.githubusercontent.com/jumear/stirpy/main/content/iNatAPIv1_get_observations.ipynb

As you can see, you can share a notebook stored elsewhere on the web.

It is a client-side app: people can use your notebook, but if they modify it, their own version won’t delete yours. As Basthon is developed by a French teacher, the documentation is in French, but I think the developer speaks English, or you can ask me if you need help.

when i run my notebook, it generates a CSV file. but where do i go to view and download that CSV file in basthon?

it looks like basthon is just a reimplementation of JupyterLite. since JupyterLite handles requests differently than most other implementations of Python, i wanted to see how basthon handles the problem, but i can’t figure out how to install a package when running this notebook: https://notebook.basthon.fr/?from=https://raw.githubusercontent.com/jumear/stirpy/main/content/pyiNat_get_observations.ipynb.

I don’t know how to download it, but it exists somewhere. You can print it:

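datatocsv(obs, 'observations.csv')  # write the CSV as the notebook does
with open('observations.csv', 'r') as f:  # the file is closed automatically
    for line in f:
        print(line, end='')  # each line already ends with a newline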

or

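datatocsv(obs, 'observations.csv')  # write the CSV as the notebook does
with open('observations.csv', 'r') as f:  # the file is closed automatically
    print(f.read())  # show the whole file at once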

You can import a .py and install it as a module, and upload other files (texts such as .csv, images…) with the 2nd button from the left, the one that shows “Open a notebook etc.”, just below “Fichier”.

hmmm… that seems less user-friendly than what can be done in JupyterLite (or Colab).