For research purposes, I am currently working with crowdsourced data.
The iNaturalist 2017 competition dataset (and the datasets of the later competitions in 2018, 2019, 2020, and 2021) only released images and labels.
My question, then, is: is there any way to go from the released data back to the original observation, for example by getting the observation ID or its URI? That way, we could use the API to get the full observation data.
As of now, I think there might be a way to get this link from the image filename in the dataset. For example, the first image in the 2021 release is train/02912_Animalia_Chordata_Actinopterygii_Siluriformes_Ictaluridae_Ameiurus_nebulosus/d615f184-8af4-4c60-b9f8-3081c1607644.jpg, and there might be a way to get the original observation from the encoded string d615f184-8af4-4c60-b9f8-3081c1607644.
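For anyone trying this, here is a minimal sketch of splitting such a path into its parts. It assumes the 2021 layout shown above, <split>/<NNNNN>_<Kingdom>_..._<Genus>_<epithet>/<uuid>.jpg; the function and type names are hypothetical. As noted further down the thread, the UUID does not seem to match iNaturalist's own identifiers, so the taxon name is probably the more useful output.

```python
from pathlib import PurePosixPath
from typing import List, NamedTuple


class ImageKey(NamedTuple):
    split: str            # e.g. "train" or "val"
    category_id: int      # the zero-padded number before the taxonomy
    taxonomy: List[str]   # kingdom ... genus, specific epithet
    uuid: str             # file stem (assumed to be a plain UUID)


def parse_inat2021_path(path: str) -> ImageKey:
    """Split an iNat 2021-style image path (hypothetical helper).

    Assumes <split>/<NNNNN>_<Kingdom>_..._<Genus>_<epithet>/<uuid>.jpg
    """
    p = PurePosixPath(path)
    category_id_str, *taxonomy = p.parent.name.split("_")
    return ImageKey(p.parts[0], int(category_id_str), taxonomy, p.stem)


def scientific_name(key: ImageKey) -> str:
    """Join the last two ranks (genus + epithet) into a binomial."""
    return " ".join(key.taxonomy[-2:])
```

The binomial can then be used as a taxon filter when searching for candidate observations, keeping in mind that names may have changed since the dataset was built.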
Hmm, @alex may be the best person to ask about this, as he works with iNat's CV model. I don't think Grant van Horn is on the forum, though he might be a good source as well.
However, the competition dataset metadata does give you a rights holder, date, taxon (from the file name), and coordinates, which you should be able to use to track back to a unique observation in most cases.
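A rough sketch of pulling those fields out of the competition metadata, assuming the COCO-style annotation JSON used by the 2021 release: an "images" list whose entries carry file_name, rights_holder, date, latitude, and longitude, with date strings starting in YYYY-MM-DD form. The field names should be checked against the actual files; the helper names are hypothetical.

```python
import json
from datetime import date
from typing import Any, Dict, Optional, Tuple


def load_annotations(path: str) -> Dict[str, Any]:
    """Load a competition annotation file (e.g. train.json)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def index_images(data: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each image's file_name to its metadata record."""
    return {img["file_name"]: img for img in data.get("images", [])}


def date_parts(date_str: str) -> Tuple[int, int, int]:
    """Parse the leading YYYY-MM-DD of a date string into (year, month, day)."""
    d = date.fromisoformat(date_str[:10])
    return d.year, d.month, d.day


def search_hints(record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Collect the fields that can be used to look the observation up again."""
    return {
        "rights_holder": record.get("rights_holder"),
        "date": date_parts(record["date"]) if record.get("date") else None,
        "latitude": record.get("latitude"),
        "longitude": record.get("longitude"),
    }
```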
Indeed, it seems the UUIDs do not match, and sometimes even the usernames are not available. However, I got the export tool to work on some observations, for example (a random test):
can be linked to this observation using the username and the observation date. However, this is not always possible… Thank you for the temporary workaround.
If the login is not provided, you should still be able to search for it using v1/users/autocomplete or v1/search. But if the login or name has changed since the dataset was made, or if the user is no longer in the system, then you're out of luck.
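A minimal sketch of the autocomplete lookup, assuming the public endpoint https://api.inaturalist.org/v1/users/autocomplete takes a q parameter and returns {"results": [...]} with id, login, and name on each user. The exact-match rule (case-insensitive match on login or display name) is an assumption of this sketch, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

API_ROOT = "https://api.inaturalist.org/v1"


def autocomplete_url(query: str) -> str:
    """Build the users/autocomplete URL for a rights-holder string."""
    return f"{API_ROOT}/users/autocomplete?" + urllib.parse.urlencode({"q": query})


def pick_user(response: Dict[str, Any], rights_holder: str) -> Optional[Dict[str, Any]]:
    """Return the first user whose login or name equals rights_holder (case-insensitive)."""
    wanted = rights_holder.strip().casefold()
    for user in response.get("results", []):
        login = (user.get("login") or "").casefold()
        name = (user.get("name") or "").casefold()
        if wanted in (login, name):
            return user
    return None  # renamed or deleted accounts end up here


def resolve_rights_holder(rights_holder: str) -> Optional[Dict[str, Any]]:
    """Query the API (network call) and pick the matching user, if any."""
    with urllib.request.urlopen(autocomplete_url(rights_holder), timeout=30) as resp:
        return pick_user(json.load(resp), rights_holder)
```

Requiring an exact match avoids silently picking a similarly named account; a None result is the "out of luck" case described above.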
You could use the API to attempt to resolve the rights_holder to a login, then use a mix of the login/lat/long/date fields to find a likely observation.
Indeed, a combination of user_id (thanks to autocomplete), year, month, and day retrieves the IDs of most observations.
However, some remain impossible to retrieve this way (perhaps they have since been removed?)
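A sketch of that search plus a tie-breaker on coordinates, assuming the v1 observations endpoint accepts user_id, year, month, day, per_page (max 200), and optionally taxon_name, and that each result has an id and a geojson point stored as [longitude, latitude]. Picking the nearest candidate by great-circle distance is this sketch's own heuristic; obscured or private coordinates can be tens of kilometres off, so the returned distance is worth inspecting.

```python
import math
import urllib.parse
from typing import Any, Dict, Optional, Tuple

API_ROOT = "https://api.inaturalist.org/v1"


def observation_search_url(user_id: int, year: int, month: int, day: int,
                           taxon_name: Optional[str] = None,
                           per_page: int = 200) -> str:
    """Build an observations query for one user on one day."""
    params: Dict[str, Any] = {"user_id": user_id, "year": year, "month": month,
                              "day": day, "per_page": per_page}
    if taxon_name:  # optional: names may have changed since the dataset was built
        params["taxon_name"] = taxon_name
    return f"{API_ROOT}/observations?" + urllib.parse.urlencode(params)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def best_candidate(response: Dict[str, Any], lat: float,
                   lng: float) -> Optional[Tuple[int, float]]:
    """Return (observation id, distance in km) of the closest result, or None."""
    best, best_d = None, math.inf
    for obs in response.get("results", []):
        coords = (obs.get("geojson") or {}).get("coordinates")
        if not coords:
            continue  # observation without public coordinates
        o_lng, o_lat = coords
        d = haversine_km(lat, lng, o_lat, o_lng)
        if d < best_d:
            best, best_d = obs, d
    return (best["id"], best_d) if best else None
```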
Like:
Yes, this is an occasional issue with other iNat-sourced data I have used: the observations may have since been deleted by their users. There's not much to do in this case if you don't already have the data. If the deletion is very recent, the current GBIF dataset might still have the observation data (but that's a bit of a long shot, and likely not worth pursuing, I would guess).