Code to extract annotations from exported JSON

Megachile · May 16, 2022, 7:23pm

I’m working on a project to collect phenology data on galls, and in many cases it would be useful to associate annotations like life stage with that data. At the moment, it’s not possible to get this information directly in the csv downloader (which works for everything else I want to get) but I can get them with the Python API. Unfortunately, the result is a nasty JSON file rather than a simple csv I can manipulate. I’m planning to tackle this problem myself (with help) but I wondered if it was something others had already solved since it’s likely a common issue. Is there code out there already I could just copy to turn the JSON output into a table of annotations?

jwidness · May 16, 2022, 8:00pm

By “the Python API” do you mean the third-party package pyinaturalist?

Megachile · May 16, 2022, 8:01pm

Yes, exactly.

Megachile · May 16, 2022, 8:38pm

My impression is that the rinat package cannot extract annotations from the API for some reason, but that the pyinaturalist package can, so I’m using Python in RStudio, and successfully get the results I want. They’re just in a form I don’t know how to manipulate–JSON. I imagine that since many people have come up against this issue, someone has figured out how to go from that JSON to a simple dataframe or something of the annotations, so I was hoping to save myself some time and headache by using that code if someone has it and is willing to share it.

jwidness · May 16, 2022, 9:27pm

I haven’t seen a snippet for that particular need, but maybe @jcook has some suggestions?

pisum · May 17, 2022, 1:32am

are you looking for something that creates an observation table and also a separate annotations table? for example:

observation

id	obs date	sub date	observer	taxon	…
1	2022-05-01	2022-05-02	gall_lover	430050	…
2	2022-05-02	2022-05-02	jiro	430050	…
3	2022-05-03	2022-05-04	gall_gal	430050	…

annotation

id	obs id	term id	term value
1	2	1	2
2	2	9	10
3	3	1	7

… or do you just want to join and flatten the records (which could produce a little duplication in the observation records if one is tied to multiple annotation records)? for example:

results

id	obs date	sub date	observer	taxon	…	annotation id	term id	term value
1	2022-05-01	2022-05-02	gall_lover	430050	…	null	null	null
2	2022-05-02	2022-05-02	jiro	430050	…	1	1	2
2	2022-05-02	2022-05-02	jiro	430050	…	2	9	10
3	2022-05-03	2022-05-04	gall_gal	430050	…	3	1	7

… or are you wanting to make something more like a crosstab? for example:

observation

id	obs date	sub date	observer	taxon	…	value for term_id=1	value for term_id=9
1	2022-05-01	2022-05-02	gall_lover	430050	…	null	null
2	2022-05-02	2022-05-02	jiro	430050	…	2	10
3	2022-05-03	2022-05-04	gall_gal	430050	…	7	null
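
For what it's worth, the second option (join and flatten) can be sketched in a few lines of plain Python. This is only an illustration using the example rows above; the key names (`obs_id`, `term_id`, `term_value`) are the column names from these tables, not iNaturalist API field names.

```python
# Example rows copied from the tables above (key names are illustrative only).
observations = [
    {"id": 1, "obs_date": "2022-05-01", "observer": "gall_lover", "taxon": 430050},
    {"id": 2, "obs_date": "2022-05-02", "observer": "jiro", "taxon": 430050},
    {"id": 3, "obs_date": "2022-05-03", "observer": "gall_gal", "taxon": 430050},
]
annotations = [
    {"id": 1, "obs_id": 2, "term_id": 1, "term_value": 2},
    {"id": 2, "obs_id": 2, "term_id": 9, "term_value": 10},
    {"id": 3, "obs_id": 3, "term_id": 1, "term_value": 7},
]


def left_join(observations, annotations):
    """Return one row per (observation, annotation) pair.

    Observations without annotations are kept, with None in the
    annotation columns.
    """
    # Group annotations by the observation they belong to.
    by_obs = {}
    for ann in annotations:
        by_obs.setdefault(ann["obs_id"], []).append(ann)

    rows = []
    for obs in observations:
        # Fall back to a single "empty" match so the observation still appears.
        for ann in by_obs.get(obs["id"]) or [None]:
            rows.append({
                **obs,
                "annotation_id": ann["id"] if ann else None,
                "term_id": ann["term_id"] if ann else None,
                "term_value": ann["term_value"] if ann else None,
            })
    return rows


flat = left_join(observations, annotations)
```

Observation 2 shows up twice in `flat` because it has two annotations, which is the duplication mentioned above.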

Megachile · May 17, 2022, 3:03am

Not 100% sure I grasp the distinction between the last two but something like that. What I want is basically this:

id	obs date	observer	taxon	…	life stage	evidence of presence
1	2022-05-01	gall_lover	430050	…	null	null
2	2022-05-02	jiro	430050	…	adult	gall; organism
3	2022-05-03	gall_gal	430050	…	larva	gall

I imagine the tricky part is that second row where one Annotation can take multiple values, and I’m agnostic on how to handle that; I could work with combining them or splitting into two columns (EOP: Gall y/n and EOP: Organism y/n) or anything like that. Life stage is mutually exclusive so that shouldn’t be too difficult.
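
One way the "combine them" option could work, as a rough Python sketch: collect every value per observation and term, then join multiple values with a separator. The input rows here are made up to resemble the table above, and they assume the annotation IDs have already been translated to labels.

```python
from collections import defaultdict

# Long-format annotation rows: (observation id, term label, value label).
# Made-up data loosely based on the example table above.
annotation_rows = [
    (2, "Life Stage", "Adult"),
    (2, "Evidence of Presence", "Gall"),
    (2, "Evidence of Presence", "Organism"),
    (3, "Life Stage", "Larva"),
    (3, "Evidence of Presence", "Gall"),
]
observation_ids = [1, 2, 3]


def crosstab(observation_ids, annotation_rows, sep="; "):
    """Build one row per observation with one column per annotation term.

    Terms that can take several values get them joined with `sep`.
    """
    values = defaultdict(lambda: defaultdict(list))
    terms = []  # keeps column order stable (first seen, first listed)
    for obs_id, term, value in annotation_rows:
        if term not in terms:
            terms.append(term)
        values[obs_id][term].append(value)

    table = []
    for obs_id in observation_ids:
        row = {"id": obs_id}
        for term in terms:
            found = values.get(obs_id, {}).get(term, [])
            row[term] = sep.join(found) if found else None
        table.append(row)
    return table


wide = crosstab(observation_ids, annotation_rows)
```

Splitting into yes/no columns instead would only change the last step: emit one column per (term, value) pair rather than joining the values.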

jcook · May 17, 2022, 4:08pm

I’m willing to add some more features for annotations, but it may take me a couple of weeks to get to it. pisum might have ideas for a working solution in the meantime. There are a couple of things I’ve been working on that could help with this, but I don’t think they will do exactly what you want yet.

First, there’s a higher-level interface in pyinaturalist that returns typed model objects instead of JSON. It’s a work in progress, which is why it isn’t fully documented yet, but the main observation and taxon searches are mostly complete. For example, in the latest version (0.17), this gives you Observation objects:

from pyinaturalist import iNatClient

client = iNatClient()
observations = client.observations.search(taxon_id=55594).limit(200)

All the nested data structures (annotations, taxon, etc.) are also objects. I haven’t tested that in RStudio, but that should give you tab completion, type hints, etc., making it easier to work with than JSON.

jcook · May 17, 2022, 4:08pm

I’m also working on some tools in pyinaturalist-convert for converting between various data formats, including tabular formats like CSV and dataframes. This is also a work in progress, though, and there’s more work to be done in flattening out some of the nested data structures (like annotations) in a way that’s actually useful.

Annotations in particular are a little tricky because the /observations endpoint returns them as IDs, not names:

{
  "controlled_attribute_id": 22,
  "controlled_value_id": 29
}

And then you need to call the /controlled_terms endpoint to look up the labels for those IDs, which in this case translates to "Evidence of Presence": "Gall".
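
A minimal sketch of that lookup step, assuming the /controlled_terms response is a dict with a `results` list in which each term has an `id`, a `label`, and a `values` list of `{id, label}` entries. The response below is a hand-written stand-in containing only the two IDs from the example, so it runs without a network call; `build_lookup` and `label_annotation` are hypothetical helper names, not part of pyinaturalist.

```python
# Hand-written stand-in for a /controlled_terms response (assumed shape).
# The real response contains many more terms, values, and fields.
controlled_terms_response = {
    "results": [
        {
            "id": 22,
            "label": "Evidence of Presence",
            "values": [{"id": 29, "label": "Gall"}],
        },
    ]
}


def build_lookup(response):
    """Return two dicts: term id -> label, and value id -> label."""
    term_labels = {}
    value_labels = {}
    for term in response["results"]:
        term_labels[term["id"]] = term["label"]
        for value in term.get("values", []):
            value_labels[value["id"]] = value["label"]
    return term_labels, value_labels


def label_annotation(annotation, term_labels, value_labels):
    """Translate one annotation's IDs to (term label, value label).

    Unknown IDs come back as None rather than raising.
    """
    term = term_labels.get(annotation["controlled_attribute_id"])
    value = value_labels.get(annotation["controlled_value_id"])
    return term, value


term_labels, value_labels = build_lookup(controlled_terms_response)
labelled = label_annotation(
    {"controlled_attribute_id": 22, "controlled_value_id": 29},
    term_labels,
    value_labels,
)
```

With the real response in place of the stand-in, the lookup only needs to be built once and can then be reused for every observation.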

That’s definitely doable, though, and would be useful for some other data formats like Darwin Core (which has, for example, a lifeStage field). I just added an issue for that here.

Megachile · May 17, 2022, 4:21pm

I’m not in any particular hurry on this–I have enough other problems I can work on in the meantime that it’s not like I’d be able to complete the project with this piece anyway. I’ll play around with the other commands you mentioned and see what I can do, thanks.

jcook · May 17, 2022, 5:30pm

Sounds good. Just curious, are there any observation fields or tags you commonly use with galls? Or just annotations?

Megachile · May 17, 2022, 6:37pm

Yes, we use the Gall phenophase and Gall generation fields as well as Host Plant ID, and I’m planning to create another field for collection viability. Those I’ve been able to extract very easily with the csv downloader on the site (I haven’t transitioned to coding it as an API call in R or Python yet but am planning to, presumably in Python so I can get the annotations too). The main thing I need from the annotations is life stage.

jon_sullivan · May 18, 2022, 2:19am

Note that you can get to annotations and observation fields in R, but it does require some wrangling of the API’s “nasty JSON”.

I use the jsonlite package in R to get to the iNat API. It gives you access to everything in the API, unlike the simplified old rinat package.

Here’s a quick example that gets the plant observations from my garden:

#install.packages("jsonlite") # uncomment if you need to install jsonlite on your computer
library(jsonlite)

# coordinates for a square around my house
lat_max <- -43.579337
lat_min <- -43.580293
lon_max <- 172.633269
lon_min <- 172.632140

# the iNat taxon ID for plants:
my_taxon_id <- 47126

# construct the url for the iNat API
iNaturl_obs <- paste0("https://api.inaturalist.org/v1/observations?nelat=",lat_max,"&nelng=",lon_max,"&place_id=any&swlat=",lat_min,"&swlng=",lon_min,"&taxon_id=", my_taxon_id,"&verifiable=any")

# get the JSON at that url
iNat_in_bounds_obs <- fromJSON(iNaturl_obs)

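# show annotations (the "results" element is the 4th item in the response,
# so referring to it by name is equivalent to [[4]] but less fragile)
iNat_in_bounds_obs$results$annotations

# show observation fields and values
iNat_in_bounds_obs$results$ofvs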

Getting what you want out into a simple CSV takes a little more wrangling in R, but it’s doable. (I’ve got code that does it somewhere but not at my fingertips.)

Megachile · May 18, 2022, 5:15pm

That is good to know–I would ideally prefer to keep the entire code in one language if possible. But yes, it’s unnesting the annotations out of the JSON that is giving me trouble.

pisum · May 19, 2022, 7:01pm

not sure if your language was Py or R, but if it’s Python, i think the basics of what you’re looking for can be found in this thing by @sbushes: https://forum.inaturalist.org/t/tool-for-exporting-inaturalist-data-to-irecord-or-elsewhere/19160.

in R, @hanly wrote a beta package to get observations, etc. from the v1 API: https://forum.inaturalist.org/t/using-r-to-extract-observations-of-a-specific-phenological-state/7007/6. i haven’t used it myself, so i’m not sure how it represents annotations, if it does at all…

i started down the path of creating an export tool in Power Automate just for my own use, but that platform has some issue handling null values in some cases. so then I was going to write something using Javascript (in Observable so that others can fork / adapt relatively easily), but i haven’t done it yet.

Megachile · May 19, 2022, 7:05pm

R is the only language I’ve worked much with and would probably stay on that. But open to switching if things would be much easier in Python.

pisum · May 19, 2022, 7:08pm

does hanly’s package help in your case?

Megachile · May 19, 2022, 7:09pm

I’ll give it a shot, thanks for the tip

Megachile · May 19, 2022, 7:52pm

Ok I got it up and running and was able to pull a bunch of data. It seems like it will get me the observations, but the annotations themselves are still in the resulting table as nested tables (I assume as JSON objects or however that works). So instead of having a Sex column showing that this observation was annotated “male”, it has a column for annotations that includes a 17-variable table that presumably contains that info somehow. It does let me keep everything in R but doesn’t solve the JSON issue yet.

Theoretically I could make separate API calls using code like this in the post you linked:

df <- iNat(taxon_id = 85332, quality_grade = "research", term_id = 12, term_value_id = 13)

such that every result for each query would have exactly that annotation applied, add a new column corresponding to the terms in the query, and then stitch them all together at the end. Seems cumbersome but at least something I feel confident I could figure out if it came to that.
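
The stitching step might look something like this rough Python sketch (the same idea carries over to R). It assumes each query's results are already in hand as a list of dicts with an `id` key; the `query_results` data, the column names, and the `stitch` helper are all made up for illustration.

```python
# Made-up stand-ins for the results of four separate filtered queries.
# Keys are (column name, value label); values are the observations returned.
obs2 = {"id": 2, "observer": "jiro"}
obs3 = {"id": 3, "observer": "gall_gal"}
obs4 = {"id": 4, "observer": "gall_lover"}
query_results = {
    ("Life Stage", "Adult"): [obs2],
    ("Life Stage", "Larva"): [obs3, obs4],
    ("Evidence of Presence", "Gall"): [obs2, obs3],
    ("Evidence of Presence", "Organism"): [obs2],
}


def stitch(query_results, sep="; "):
    """Merge per-annotation query results into one row per observation."""
    merged = {}
    columns = []
    for (column, value), observations in query_results.items():
        if column not in columns:
            columns.append(column)
        for obs in observations:
            # Copy the observation the first time its id is seen.
            row = merged.setdefault(obs["id"], dict(obs))
            existing = row.get(column)
            row[column] = value if existing is None else existing + sep + value
    # Make sure every row has every annotation column.
    for row in merged.values():
        for column in columns:
            row.setdefault(column, None)
    return list(merged.values())


stitched = stitch(query_results)
```

One caveat with this approach: observations that have no annotations at all never appear in any filtered query, so they would need a separate unfiltered query to be included.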

Megachile · May 19, 2022, 8:09pm

This output places observation fields as a nested object as well, so even if I were to extract the observations by Annotation value in the first place, I would still need to flatten the JSON to get the observation fields.
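
For the observation fields, the flattening could go roughly like this in Python. It is a sketch that assumes each observation carries an `ofvs` list whose entries have `name` and `value` keys; the sample data is hand-written, the field values are placeholders, and `flatten_observation` and `to_csv` are hypothetical helpers.

```python
import csv
import io

# Hand-written stand-in for entries of the API's "results" list.
# The field values are placeholders, not real data.
sample_results = [
    {
        "id": 2,
        "observed_on": "2022-05-02",
        "user": {"login": "jiro"},
        "ofvs": [
            {"name": "Gall phenophase", "value": "placeholder_a"},
            {"name": "Gall generation", "value": "placeholder_b"},
        ],
    },
    {"id": 1, "observed_on": "2022-05-01", "user": {"login": "gall_lover"}, "ofvs": []},
]


def flatten_observation(obs):
    """Turn one nested observation into a flat dict, one key per field."""
    row = {
        "id": obs["id"],
        "observed_on": obs.get("observed_on"),
        "observer": (obs.get("user") or {}).get("login"),
    }
    for ofv in obs.get("ofvs") or []:
        # Prefix field names so they cannot collide with the base columns.
        row["field:" + ofv["name"]] = ofv["value"]
    return row


def to_csv(rows):
    """Write rows to CSV text, using the union of all keys as columns."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [flatten_observation(obs) for obs in sample_results]
csv_text = to_csv(rows)
```

Observations that lack a given field simply get an empty cell in that column.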
