Link between released dataset and iNaturalist observations

For research purposes, I am currently working on crowdsourced data.
The iNaturalist 2017 competition dataset (like those of the 2018, 2019, 2020 and 2021 competitions) is currently available as released images and labels.

My question is: is there any way to go from the released data back to the original observation, for example by getting the id or the URI of the observation? That way, we could use the API to get the full observation data.

As of now, I think there might be a way to get this link from the image filename in the dataset. For example, the first image of the 2021 release is train/02912_Animalia_Chordata_Actinopterygii_Siluriformes_Ictaluridae_Ameiurus_nebulosus/d615f184-8af4-4c60-b9f8-3081c1607644.jpg, and there might be a way to get the original observation using the encoded string d615f184-8af4-4c60-b9f8-3081c1607644.

Thanks!

Can you post a link to the datasets that you are using? It would be helpful to see the examples to help answer the question.

Sure, here’s the link to the 2021 competition

Each image is described by a JSON record:

image{
  "id" : int,
  "width" : int,
  "height" : int,
  "file_name" : str,
  "license" : int,
  "rights_holder" : str,
  "date": str,
  "latitude": float,
  "longitude": float,
  "location_uncertainty": int,
}

Note that the id is not the id in the iNaturalist export database; it is intrinsic to the competition.
For example, the first image's data is:

{
    'id': 0,
    'width': 500,
    'height': 500,
    'file_name': 'train/02912_Animalia_Chordata_Actinopterygii_Siluriformes_Ictaluridae_Ameiurus_nebulosus/d615f184-8af4-4c60-b9f8-3081c1607644.jpg',
    'license': 0,
    'rights_holder': 'Ken-ichi Ueda',
    'date': '2010-07-14 20:19:00+00:00',
    'latitude': 43.83486,
    'longitude': -71.22231,
    'location_uncertainty': 77
}

However, using the iNaturalist export tool with this metadata does not retrieve the dataset’s image.

Hmm, @alex may be the best to ask a question like this as he works with iNat’s CV model. I don’t think Grant van Horn is on the forum, though he might be a good source as well.

as far as i can tell, this is not true.

that string appears to be some sort of UUID. iNat does store UUIDs for each observation and photo record (see https://github.com/inaturalist/inaturalist-open-data/tree/documentation/Metadata#photos), but the competition dataset UUID does not appear to match either of these UUIDs.

however, the competition dataset metadata does give you a rights holder, date, taxon (from the file name), and coordinates, which you should be able to use to track back to a unique observation in most cases.
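
As a rough sketch of how the taxon and the UUID-like string could be pulled out of file_name: the helper name parse_file_name is made up for illustration, and the folder layout (category number, then kingdom through species, separated by underscores) is an assumption inferred from the examples in this thread, not a documented format.

```python
from pathlib import PurePosixPath


def parse_file_name(file_name):
    """Split a competition file_name into its parts.

    Assumes the folder is '<category>_<kingdom>_<phylum>_<class>_<order>_
    <family>_<genus>_<species>', as in the examples posted in this thread.
    """
    path = PurePosixPath(file_name)
    folder = path.parent.name
    # maxsplit=7 keeps any extra underscores inside the last (species) field
    category, kingdom, phylum, klass, order, family, genus, species = folder.split("_", 7)
    return {
        "split": path.parts[0],              # e.g. 'train'
        "category_id": int(category),        # competition-internal class id
        "family": family,
        "taxon_name": f"{genus} {species}",  # binomial usable in a taxon search
        "uuid": path.stem,                   # file name without the extension
    }


example = (
    "train/02912_Animalia_Chordata_Actinopterygii_Siluriformes_"
    "Ictaluridae_Ameiurus_nebulosus/d615f184-8af4-4c60-b9f8-3081c1607644.jpg"
)
print(parse_file_name(example))
```

The taxon name recovered this way can then be combined with the rights holder, date and coordinates when searching for the observation.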

Indeed, it seems the UUIDs do not match, and sometimes even the usernames are not available. However, I got the export tool to work on some observations, for example (a random test):

{
    'id': 210,
    'width': 500,
    'height': 375,
    'file_name': 'train/06203_Plantae_Tracheophyta_Liliopsida_Liliales_Melanthiaceae_Trillium_ovatum/89838771-aaec-4b18-91b9-60d1c9aa1f10.jpg',
    'license': 1,
    'rights_holder': 'kestrel',
    'date': '2011-04-30 00:00:00+00:00',
    'latitude': 41.30684,
    'longitude': -124.01955,
    'location_uncertainty': None
}

can be linked using the username and the observation date to this observation, however this is not always possible… Thank you for the temporary workaround

if the login is not provided, you still should be able to search for the login using v1/users/autocomplete or v1/search. but if the login or name has changed since the dataset was made, or if the user is no longer in the system, then you’re out of luck.
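
A minimal sketch of the autocomplete lookup, assuming the public API base URL https://api.inaturalist.org/v1 and a JSON response with a "results" list whose entries carry "id", "login" and "name". The function names are made up for illustration, and the response shape should be checked against the API docs before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.inaturalist.org/v1"  # assumed public API base URL


def autocomplete_url(rights_holder):
    """Build the users/autocomplete request URL for a rights_holder string."""
    return f"{API_BASE}/users/autocomplete?" + urlencode({"q": rights_holder})


def candidate_users(rights_holder):
    """Fetch candidate accounts (performs a network request).

    Assumes the response has a 'results' list; missing keys yield None.
    """
    with urlopen(autocomplete_url(rights_holder)) as response:
        payload = json.load(response)
    return [
        (user.get("id"), user.get("login"), user.get("name"))
        for user in payload.get("results", [])
    ]


# Only builds the URL; call candidate_users(...) to actually query the API.
print(autocomplete_url("Ken-ichi Ueda"))
```

A rights_holder may come back with several candidate accounts, so it is probably worth keeping all of them rather than just the first hit.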

You could use the API to attempt to resolve the rights_holder to a login, then use a mix of the login/lat/long/date fields to find a likely observation.
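
Something along these lines might work as a starting point. It only builds the search URL for v1/observations from one metadata record and an already resolved login. The function name and the 1 km default radius are arbitrary choices for illustration, and the day may be off by one when the dataset's UTC timestamp and the observation's local date differ.

```python
from datetime import datetime
from urllib.parse import urlencode

OBSERVATIONS_ENDPOINT = "https://api.inaturalist.org/v1/observations"


def observation_search_url(record, login, radius_km=1):
    """Build an observation search URL from a competition metadata record.

    `record` is one image entry from the competition JSON; `login` is the
    account resolved from rights_holder. radius_km is an arbitrary default.
    """
    observed = datetime.fromisoformat(record["date"])
    params = {
        "user_id": login,
        "year": observed.year,
        "month": observed.month,
        "day": observed.day,
    }
    # Coordinates can be missing or imprecise, so only filter on them if present
    if record.get("latitude") is not None and record.get("longitude") is not None:
        params.update(lat=record["latitude"], lng=record["longitude"], radius=radius_km)
    return f"{OBSERVATIONS_ENDPOINT}?{urlencode(params)}"


record = {
    "rights_holder": "kestrel",
    "date": "2011-04-30 00:00:00+00:00",
    "latitude": 41.30684,
    "longitude": -124.01955,
}
print(observation_search_url(record, login="kestrel"))
```

If the search returns several observations, the taxon from the file name can help narrow them down.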

Indeed, a mix of user_id (thanks to autocomplete), year, month and day retrieves the observation id for most images.
However, some remain impossible to retrieve this way (maybe they have been removed since?), like:

{
    'id': 0,
    'width': 500,
    'height': 500,
    'file_name': 'train/02912_Animalia_Chordata_Actinopterygii_Siluriformes_Ictaluridae_Ameiurus_nebulosus/d615f184-8af4-4c60-b9f8-3081c1607644.jpg',
    'license': 0,
    'rights_holder': 'Ken-ichi Ueda',
    'date': '2010-07-14 20:19:00+00:00',
    'latitude': 43.83486,
    'longitude': -71.22231,
    'location_uncertainty': 77
}

that does not match any retrieved observation…

Yes, this is an occasional issue with other iNat sourced data I have used - the observations may be subsequently deleted by the users. There’s not much to do in this case if you don’t already have the data. If the deletion is very recent, it’s possible that the current GBIF dataset might still have the observation data (but that’s a bit of a longshot, and likely not worth pursuing I would guess).

Not sure if it would fit your use case, but you might want to check out images from the iNat Open Data Set: https://www.inaturalist.org/blog/49564-inaturalist-licensed-observation-images-in-the-amazon-open-data-sponsorship-program

i’m able to retrieve this just fine:

Ken-ichi has multiple accounts. so maybe you’re trying to retrieve based on the wrong user_id?

even if you have multiple IDs, you can search by all 3 IDs together:
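
For reference, a sketch of what such a combined query could look like, assuming the user_id parameter accepts a comma-separated list. The function name is made up, and the IDs below are placeholders, not the actual account IDs.

```python
from urllib.parse import urlencode


def multi_account_search_url(user_ids, year, month, day):
    """Build one observation search covering several candidate accounts."""
    params = {
        # Assumption: user_id takes a comma-separated list of ids or logins
        "user_id": ",".join(str(user_id) for user_id in user_ids),
        "year": year,
        "month": month,
        "day": day,
    }
    return "https://api.inaturalist.org/v1/observations?" + urlencode(params)


# 111, 222 and 333 are placeholder IDs for illustration only
print(multi_account_search_url([111, 222, 333], 2010, 7, 14))
```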
