How to bulk/batch add ITS sequence data to an observation field

I made this Colab notebook a while back and thought it might be useful to share.

It takes a CSV of observation ID numbers and ITS sequences and adds each sequence to its observation in the observation field DNA Barcode ITS.
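For readers who want to see the general idea, here is a minimal sketch of what such a script might do. It is not the code in the Colab, and several details are assumptions. It assumes the iNaturalist API v1 endpoint `POST /v1/observation_field_values` with a JSON body of the shape shown, and a JWT for your account in the `Authorization` header. It also assumes hypothetical CSV columns `observation_id` and `its_sequence`. `ITS_FIELD_ID` is a placeholder for the numeric ID of the DNA Barcode ITS field, which you would look up on iNaturalist.

```python
import csv
import io
import json
import urllib.request

# Assumption: iNaturalist API v1 endpoint for creating observation field values.
API_URL = "https://api.inaturalist.org/v1/observation_field_values"

# Hypothetical placeholder: replace with the numeric ID of the
# "DNA Barcode ITS" observation field, looked up on iNaturalist.
ITS_FIELD_ID = 0


def read_rows(csv_text):
    """Parse CSV text with hypothetical columns observation_id,its_sequence."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove all whitespace, including line breaks in wrapped sequences.
        sequence = "".join(row["its_sequence"].split())
        rows.append((int(row["observation_id"]), sequence))
    return rows


def build_payload(observation_id, sequence, field_id):
    """Build the JSON body for one observation field value (assumed schema)."""
    return {
        "observation_field_value": {
            "observation_id": observation_id,
            "observation_field_id": field_id,
            "value": sequence,
        }
    }


def post_value(payload, api_token, timeout=60):
    """POST one field value. api_token is assumed to be an iNaturalist JWT."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_token},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def upload_all(csv_text, api_token, field_id=ITS_FIELD_ID):
    """Add every sequence in the CSV to its observation, one request at a time."""
    if field_id <= 0:
        raise ValueError("set field_id to the real DNA Barcode ITS field ID")
    for observation_id, sequence in read_rows(csv_text):
        post_value(build_payload(observation_id, sequence, field_id), api_token)
        print(f"added ITS sequence to observation {observation_id}")
```

The CSV parsing can be checked offline with a made-up row: `read_rows("observation_id,its_sequence\n101,ACGT TTGA\n")` gives `[(101, 'ACGTTTGA')]`. Sending one request per observation keeps each failure isolated, so you can see exactly which observation a timeout happened on.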


Very useful. I tried using it to add my barcodes, but it timed out after a few. Perhaps the City Nature Challenge is causing a bottleneck.

timeout Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
425 # Otherwise it looks like a bug in the code.
--> 426 six.raise_from(e, None)
427 except (SocketTimeout, BaseSSLError, SocketError) as e:

27 frames
timeout: The read operation timed out

During handling of the above exception, another exception occurred:

ReadTimeoutError Traceback (most recent call last)
ReadTimeoutError: HTTPSConnectionPool(host='api.inaturalist.org', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

ReadTimeout Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
527 raise SSLError(e, request=request)
528 elif isinstance(e, ReadTimeoutError):
--> 529 raise ReadTimeout(e, request=request)
530 else:
531 raise

ReadTimeout: HTTPSConnectionPool(host='api.inaturalist.org', port=443): Read timed out. (read timeout=10)

How many is “a few”? And the first ones definitely worked? To be totally honest, I haven’t tested it in at least 6 months, so it’s possible there’s a problem, but it’s also true CNC isn’t the best time to be fiddling with things.

It added 78 of 1,888 before timing out.

OK, I tried adding a longer timeout, but I haven't tested it at all. I can look at it again later, or you can go ahead and see whether it works now. 🤞
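The traceback shows the request giving up after a 10-second read timeout. A longer timeout helps, but on a busy server a batch of ~1,900 requests can still hit the occasional timeout. One common pattern, sketched below, is to retry timed-out requests with exponential backoff. This is a hypothetical helper, not the change made to the Colab. `send` stands for whatever function makes a single API request, and the attempt count and delays are arbitrary choices.

```python
import socket
import time
import urllib.error


def with_retries(send, payload, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call send(payload), retrying timeouts with exponential backoff.

    send is any function that makes one API request (hypothetical here).
    """
    for attempt in range(attempts):
        try:
            return send(payload)
        except urllib.error.HTTPError:
            # The server did answer (e.g. a 4xx error); retrying won't fix that.
            raise
        except (socket.timeout, TimeoutError, urllib.error.URLError) as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt  # 2 s, 4 s, 8 s, ...
            print(f"request failed ({exc}); retrying in {delay:.0f} s")
            sleep(delay)
```

Backing off rather than retrying immediately gives an overloaded server (say, during the City Nature Challenge) time to catch up. A short pause between successful requests is also gentler on the API than firing them back to back.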

Thanks.
I’ll give it a shot when the CNC has died down a bit.


I confirmed that it does still work with a single test observation. Hopefully the extended timeout will make larger batches go smoothly.

For those wondering what an ITS sequence is, here is an informal description from Wikipedia:

Internal transcribed spacer (ITS) is the spacer DNA situated between the small-subunit ribosomal RNA (rRNA) and large-subunit rRNA genes in the chromosome or the corresponding transcribed region in the polycistronic rRNA precursor transcript.