Getting the url of the first sound file in an observation using @pisum's Jupyter Workbook

Hello @pisum, @thebeachcomber pointed me in your direction for a problem: getting iNat data into a format I can use to prep it for ingestion into the Queensland State-Wide biodiversity database, WildNet. I have successfully modified your Jupyter Workbook for a number of things, but I am still struggling to figure out how to get the url of the first sound file. I have tried several ways to parse the “sounds” key but just can’t figure it out. I would really appreciate it if you could point me in the right direction on the structure needed to pull the url out of the sounds key.

Once again, thanks so much for your great work on this workbook. It has already saved me loads of time! At some point I may ask for help getting the profile description (“Tell us about yourself”), assuming that is possible for a list of observations.

as noted here: https://forum.inaturalist.org/t/select-observations-to-batch-download-from-list-of-observation-ids/61657/6

add a line to the parse_fields list, as appropriate:

  • for the url of the first sound file on an observation:
    {'label': 'sound_1_url', 'ref': 'sounds[0].file_url'},
  • for a comma-separated list of all the sound file urls for each observation:
    {'label': 'sound_urls', 'ref': 'sounds', 'function': 'filter_select', 'params': {'select_ref': 'file_url', 'separator': ', '}},
  • for a count of the number of sounds per observation:
    {'label': 'sounds_count', 'ref':'sounds', 'function': 'count'},
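To make the three refs above concrete, here is a minimal sketch in plain Python of what they pull out of one observation record from GET /v1/observations. This is not the workbook's own parser: it assumes each entry in the observation's sounds list carries a file_url, and the sample record and its urls are hypothetical.

```python
def sound_fields(observation):
    """Rough plain-Python equivalent of the three parse_fields entries.

    Assumes `observation` is one item from the v1 observations `results`
    list, where `sounds` is a list of dicts that each carry a `file_url`.
    """
    sounds = observation.get("sounds") or []
    return {
        # 'sounds[0].file_url': url of the first sound, or None if there is none
        "sound_1_url": sounds[0].get("file_url") if sounds else None,
        # like 'filter_select' with select_ref='file_url' and separator=', '
        "sound_urls": ", ".join(s["file_url"] for s in sounds if s.get("file_url")),
        # like 'count' on 'sounds'
        "sounds_count": len(sounds),
    }


# Hypothetical sample record, trimmed to the keys used above.
observation = {
    "id": 123,
    "sounds": [
        {"id": 1, "file_url": "https://example.org/sounds/1.m4a"},
        {"id": 2, "file_url": "https://example.org/sounds/2.wav"},
    ],
}
print(sound_fields(observation))
print(sound_fields({"id": 456}))  # an observation with no sounds
```

Guarding on an empty list matters because most observations have no sounds, and indexing sounds[0] on those would otherwise fail.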

You are a legend @pisum. I tried everything except file_url. I should have known better ;-)

Related question, @pisum: I am looking to get the url of the observer's user profile for each observation. I have tried {'label': 'user profile url', 'ref': 'user', 'function': 'combine', 'params': {'combine_refs': ['user.id'], 'template': 'https://www.inaturalist.org/users/{0}'}}, but no user id gets populated in the url. What am I missing? Thanks again!

try 'combine_refs': ['id'] instead of 'combine_refs': ['user.id']
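One plausible reading of why ['id'] works where ['user.id'] does not: the combine_refs seem to be looked up inside the object already selected by 'ref' ('user'), so 'user.id' would be searching for user.user.id, which doesn't exist. The sketch below is a hypothetical stand-in for the workbook's combine step built on that assumption; it is not the workbook's actual code.

```python
def get_ref(obj, dotted):
    """Walk a dotted path like 'user.id'; return None if any key is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def combine(record, ref, combine_refs, template):
    """Hypothetical 'combine': resolve each combine_ref relative to `ref`
    (an assumption), then fill the template; missing values become ''."""
    base = get_ref(record, ref)
    values = [get_ref(base, r) for r in combine_refs]
    return template.format(*("" if v is None else v for v in values))


observation = {"id": 123, "user": {"id": 4567, "login": "some_user"}}
template = "https://www.inaturalist.org/users/{0}"

print(combine(observation, "user", ["id"], template))       # -> https://www.inaturalist.org/users/4567
print(combine(observation, "user", ["user.id"], template))  # -> https://www.inaturalist.org/users/
```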

Thanks again @pisum. My knowledge about using the api has increased dramatically through your assistance :-)


you’re welcome.

i believe the base API request for what you’re trying to do would be a GET /v1/observations/identifiers. note that this returns only up to the top 500 identifiers, so no paging is necessary: there will be no results beyond the first page.
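A minimal sketch of that base request using only the standard library. It assumes the endpoint accepts the same filter parameters as GET /v1/observations and that each item in the response's results list has a nested user object with an id; the parameter values and the sample response are hypothetical.

```python
from urllib.parse import urlencode

API_V1 = "https://api.inaturalist.org/v1"


def identifiers_url(**params):
    """Build a GET /v1/observations/identifiers URL from filter params
    (assumed to be the same filters GET /v1/observations takes)."""
    query = urlencode(params)
    return f"{API_V1}/observations/identifiers" + (f"?{query}" if query else "")


def identifier_user_ids(response):
    """User ids from a parsed response, assuming each result has a 'user' dict.
    With at most 500 results, one page covers everything."""
    return [r["user"]["id"] for r in response.get("results", []) if r.get("user")]


print(identifiers_url(project_id="example-project", quality_grade="research"))

# Hypothetical, trimmed response body.
sample = {"total_results": 2, "results": [
    {"count": 40, "user": {"id": 11, "login": "a"}},
    {"count": 12, "user": {"id": 22, "login": "b"}},
]}
print(identifier_user_ids(sample))  # -> [11, 22]
```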

as far as i know, the “Tell us about yourself” information is available only through an undocumented and deprecated endpoint, GET https://www.inaturalist.org/users/{id}.json, in its description field (or by scraping the user page).

this second endpoint returns only one user per request. so the complicated part is getting a unique list of user.id values from the base results and then looping through them to make one secondary request per user.id, while keeping to roughly 1 request / sec (iNat’s rate-limit guidance).
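The dedupe-then-loop pattern can be sketched like this. fetch_json is a hypothetical caller-supplied function that takes a url and returns parsed JSON (for a real run, something like `lambda url: requests.get(url, timeout=30).json()`), and the sketch assumes description sits at the top level of the deprecated endpoint's response, as described above.

```python
import time


def unique_user_ids(observations):
    """Unique user ids from observation records, in first-seen order."""
    seen = {}  # dict keys keep insertion order, so this dedupes without reordering
    for obs in observations:
        user = obs.get("user") or {}
        if "id" in user:
            seen.setdefault(user["id"], None)
    return list(seen)


def fetch_descriptions(user_ids, fetch_json, min_interval=1.0):
    """One request per user id, with request starts spaced at least
    `min_interval` seconds apart (about 1 request/sec by default)."""
    descriptions = {}
    last_start = None
    for uid in user_ids:
        if last_start is not None:
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        data = fetch_json(f"https://www.inaturalist.org/users/{uid}.json")
        descriptions[uid] = data.get("description")
    return descriptions


# Usage with a fake fetcher, so nothing actually hits the network.
observations = [{"user": {"id": 1}}, {"user": {"id": 2}}, {"user": {"id": 1}}]
fake = lambda url: {"description": f"profile text from {url}"}
print(fetch_descriptions(unique_user_ids(observations), fake, min_interval=0))
```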

this can be accomplished in many ways. if you use my workbook’s structure, you can use the way the add_std_places function works with the endpoint_get_places dictionary as a model for the complicated part. it gets place information in chunks of 30, whereas you would want to get the additional user data in chunks of 1.
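The chunk-size difference is easy to see with a small batching helper. This is a hypothetical sketch of the general idea, not the workbook's add_std_places code: places can be requested 30 ids at a time, while user descriptions come back one per request, so the batch size drops to 1.

```python
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch


print([len(b) for b in chunked(range(65), 30)])  # place-style batches: [30, 30, 5]
print(list(chunked([101, 102, 103], 1)))          # user-style batches: [[101], [102], [103]]
```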

if you use pyinaturalist instead, i believe you can use one of their paginators (the ID variant, i think) to do something similar. (i believe the difference between my workbook and pyinaturalist is that mine uses async functions, so it staggers the start of each request and can potentially run multiple requests in parallel, whereas pyinaturalist runs requests serially, waiting for one to complete before starting the next.)
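To illustrate the staggered-async idea described above (this is not the workbook's or pyinaturalist's code): each request's start is delayed by i × interval seconds, but once started the requests can overlap. fake_fetch is a hypothetical stand-in for a real async HTTP call.

```python
import asyncio


async def staggered_gather(items, fetch, interval=1.0):
    """Start one fetch per item, delaying the i-th start by i * interval
    seconds; started requests may run concurrently. Results keep input order."""
    async def run(i, item):
        await asyncio.sleep(i * interval)
        return await fetch(item)

    return await asyncio.gather(*(run(i, item) for i, item in enumerate(items)))


async def fake_fetch(user_id):
    await asyncio.sleep(0.05)  # pretend network latency
    return {"id": user_id}


results = asyncio.run(staggered_gather([1, 2, 3], fake_fetch, interval=0.01))
print(results)  # -> [{'id': 1}, {'id': 2}, {'id': 3}]
```

In a Jupyter cell an event loop is already running, so there you would `await staggered_gather(...)` directly instead of calling asyncio.run.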

or you could write your own code to handle it all…

UPDATE: i noticed that the Android app can get user descriptions, and it does so through the beta v2 API. so it’s better to get descriptions from v2, either GET https://api.inaturalist.org/v2/users/{id}?fields=id,login,description or GET https://api.inaturalist.org/v2/users/{id}?fields=all, than from the deprecated endpoint. even with the v2 endpoint, though, it seems you’re still stuck doing one user per request.
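A small sketch of building the v2 request and reading the description out of the result. It assumes the v2 response wraps the user in a results list (as v2 search-style responses generally do); treat that shape, and the sample below, as assumptions. Passing safe="," keeps the commas in the fields list literal, matching the urls above.

```python
from urllib.parse import urlencode

API_V2 = "https://api.inaturalist.org/v2"


def v2_user_url(user_id, fields="id,login,description"):
    """URL for GET /v2/users/{id} with an explicit fields list (or 'all')."""
    return f"{API_V2}/users/{user_id}?" + urlencode({"fields": fields}, safe=",")


def description_from_v2(response):
    """Description of the first user in `results` (assumed shape), or None."""
    results = response.get("results") or []
    return results[0].get("description") if results else None


print(v2_user_url(42))         # -> https://api.inaturalist.org/v2/users/42?fields=id,login,description
print(v2_user_url(42, "all"))  # -> https://api.inaturalist.org/v2/users/42?fields=all
print(description_from_v2({"results": [{"id": 42, "login": "x", "description": "hi"}]}))  # -> hi
```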