Data users— what are your use cases and requests for exporting data?

rachelshoop · June 20, 2019, 7:02pm

I wish that it was possible to export sex data for observations. I am trying to look for potential sex bias in encounters observers have with specific species, but this information is not included in the .csv.

cmcheatle · June 21, 2019, 12:08pm

Two challenges here.

There is no standard way to enter gender in an observation. It does exist as an annotation, but there are many other observation fields where data can be.

I’m not sure how you would separate any bias of what photo is selected for upload versus what gender is observed. Meaning a person saw both a male and a female while out, but chose to upload the picture of a female.

pisum · June 21, 2019, 12:54pm

i’m assuming that you’re trying to use annotations. i think i read somewhere that they purposely don’t include annotations in the CSV because of the way the data is stored. (it’s not just male or female. it’s also potentially someone agreeing or disagreeing with the original annotation.) that said, you could extract the data in a few ways, i think:

1. if you want to get the observation details, then even though the CSV won’t include annotation data, you can query by annotation fields. see https://forum.inaturalist.org/t/annotation-filter-is-missing/2677. so instead of pulling back, say, all fox squirrels with a field that says male or female, i think you could do this as 3 different queries – one for all fox squirrels, one for female fox squirrels, and one for male fox squirrels.
2. if you don’t have a lot of species to look at and don’t need super-precise figures, you could just look at the taxon page. there’s a graph that shows you seasonality by default but can also give you seasonality by sex.
3. similar to #2, you could get the actual numbers that make up the graphs (and add additional filters) by using the API (https://api.inaturalist.org/v1/docs/#!/Observations/get_observations_histogram, https://api.inaturalist.org/v1/docs/#!/Observations/get_observations_popular_field_values).
4. similar to #3 + #1, if you just care about totals (not about individual observations), you could use the API to get, say, the total number of female fox squirrels. see https://api.inaturalist.org/v1/docs/#!/Observations/get_observations, and just pull back the first record (page = 1, per page = 1, only id = true) for a given taxon (+sex), and the result set will also include a total count (even though you’re just pulling back the first record in the set).
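
The totals-only approach in the last option could be sketched roughly as follows. This is a minimal illustration, not a tested client: the function names are hypothetical, the Sex term and value IDs (9, 10, 11) are the ones listed in the search-URL wiki excerpt later in this thread, and the `total_results` key is an assumption about the response shape. The taxon ID 47157 (Lepidoptera) is borrowed from a URL elsewhere in the thread, since no fox squirrel ID is given here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.inaturalist.org/v1/observations"

# Annotation IDs as listed in the search-URL wiki excerpt later in this thread.
SEX_TERM_ID = 9
SEX_VALUES = {"female": 10, "male": 11}


def count_url(taxon_id, sex=None):
    """Build a URL that requests a single ID-only record, keeping the response small."""
    params = {"taxon_id": taxon_id, "page": 1, "per_page": 1, "only_id": "true"}
    if sex is not None:
        params["term_id"] = SEX_TERM_ID
        params["term_value_id"] = SEX_VALUES[sex]
    return f"{API_ROOT}?{urlencode(params)}"


def total_from_response(payload):
    """Read the total count from a decoded JSON response (key name is an assumption)."""
    return payload["total_results"]


def fetch_total(taxon_id, sex=None):
    """Fetch the count over the network; not called automatically in this sketch."""
    with urlopen(count_url(taxon_id, sex)) as response:
        return total_from_response(json.load(response))
```

Calling `fetch_total(47157)`, `fetch_total(47157, "female")`, and `fetch_total(47157, "male")` would correspond to the three queries described in the first option, at one small request each.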

dkaposi · July 9, 2019, 1:25pm

I can see how this would be useful. The two projects that I manage have 120,000 observations, so a better way to sift through the data would be nice. Being able to search for challenging species and then review the ID providers would be beneficial. I believe that you can export a list of traditional project observations from the “Filter by Curator Identifications”, but that doesn’t list the specific curators either. That would be useful, as someone may want to maintain a ‘reviewed/vetted by’ list outside of iNat.

This would be a hugely helpful change, fields used in a project are far more relevant than random fields used by the individual performing the download. The current field options make sense if a user is downloading their own data, but it falls apart at a project scale.

My focus is on insects for downloading, so the presentation of annotations is simpler. We just need to be able to include annotations in downloads.

douglas-u-oliveira · July 15, 2019, 8:29pm

I’ve read that thread again and again in search of a way to export/import a csv file of all the identifications I have suggested for the community (not those “belonging” to myself). Maybe due to the technical complexity involved or my limited knowledge of English, I just couldn’t find any straight answer to this. Can anybody help me?

cwbarrows · July 16, 2019, 7:10pm

I’m not sure if this is exactly what you want, but you can export a csv of all the observations that you have reviewed. Go to the “Identify” page on the top menu, click on the “Filters” menu, select the “Yes” option under “Reviewed”, and then click the download button in the corner.

douglas-u-oliveira · July 16, 2019, 8:11pm

Quite close, @cwbarrows. Thank you. I was “messing” around with the filters but didn’t understand the meaning of “reviewed” in the context. If I’m right now, it means “reviewed by me”. Looks like the system tried to export all the identifications I gave, including for my own observations, which I would appreciate being excluded from the output. Anyway it didn’t work out in full due to the message below:

The expected result set for that query would be around 7,900 rows right now but, if I must specify a taxon, place or user id, I will have to know in advance any of those. What I am looking for is a way to revise systematically all the IDs I gave to community observations, wherever, whoever, whenever I did.

Anyway, I gave it a try limiting the observations to ‘Brazil’ and it worked out fine, bringing me exactly 6,958 observations, of which I filtered my user-id out and, oddly, that resulted in 6,944 reviewed observations (I have more than 5K observations of my own).

So, that’s great. I’ll make what I want to and, someday, I will try to revise the IDs I gave to the world, haha!

douglas-u-oliveira · July 16, 2019, 8:15pm

By the way, here is the URL I used with the ‘Place’ set to ‘Brazil’:

https://www.inaturalist.org/observations/export?reviewed=true&quality_grade=needs_id%2Ccasual%2Cresearch

nathantaylor · August 13, 2019, 3:42pm

@carrieseltzer If it helps, the way that SEINet accomplishes this when backup files are downloaded is by creating a separate .csv document for the annotations. They also have separate .csvs for phenology, metadata, image information, a file for the herbarium information, and finally, the occurrence information itself. All are tied together by the occurrence ID (could be the observation ID here). This is all neatly zipped together in a file. Information in this format would be quite workable. The identifications work by having ones and zeros to indicate whether the IDs are active or not typically with only the latest one active in the case of specimens.

jan416 · August 14, 2019, 3:44pm

Content syndication; using images to support Integrated Pest Management approaches for farmers worldwide. I find the iNaturalist database quite consistent and the image quality is attractive.

I won’t bore you with the detailed shortcomings in the knowledge domain of agriculture. In short, there is too little easy and complete access to information on pest and disease management. Effort is to combine different datasets and allow farmers to search for a crop/pest/disease/weed in their local language and find related and practical content.

With over 17,000 species of plant pathogens and weeds in our database (and 200,000 plus synonyms in countless languages), I was looking for a way of automatically pulling in/showing images. Am trying the API approach thanks to a suggestion by user Pisum here.

krancmm · August 15, 2019, 6:35pm

I’m the mapping coordinator (meaningless title as it’s just me) for the North American Moth Photographers Group. Currently there are ~500,000 old records in the database that are used to generate the static maps for each moth species. In effect MPG is a consolidator that mines data from museums, private collections, journals, image-based citizen science databases and self-reported sight records.

The MPG site began before users expected rigorous data standards, and was then abandoned for several years. Much of the old mapping data is suspect. We are working on updating data and had hoped to use iNat observations.

Darwin Core, as used by GBIF and SCAN/Symbiota, doesn’t provide enough information. iNat Collection Project lacks the ability to filter by taxonomic rank, geoprivacy, etc., and one must still use the outdated Export Observations page. The observation API limits the number of calls to 500/x period of time and “throttles” if too many calls are made; it also includes every single field, wanted or not.

So far I can’t find a method to export the data MPG needs. Given the number of records, perhaps I need to work directly with iNat admin to obtain the desired results.

Export Observations as it’s now configured is simply not useful to MPG:

Create a Query “You can also cut and paste an observations URL from another part of the site. You must specify a taxon, place, user, project, or search query.”

No, one cannot, but I’d certainly like to. Using this search returns 553,000 (and counting) observations: https://www.inaturalist.org/observations?rank=species&geoprivacy=open&place_id=1,6712&quality_grade=research&taxon_id=47157&without_taxon_id=47224. Translated: All US and Canada research grade records at species rank with open geoprivacy, and excluding Butterflies from Lepidoptera. Put that query into the Create a Query gray-bordered search box and it returns 0 or random numbers up to every record on iNat.

*New-Updated_at : Required. After an initial gargantuan download, for future downloads I’d need the ability to select records where updated_at (not created_on) was greater than the last updated_at in the previous download. There doesn’t seem to be a reason why one would have to download and re-manipulate an entire data set every x period of time.

*New-Annotations : Required. For insects there isn’t a problem with multiple choices. A single field with Life Stage entry: None, Adult, Larva, Pupa would suffice. Sex annotation is simply None, Male, Female

*New-Identifiers : Required, with user_login. Most of the avid moth-ers in North America know the experts or experienced enthusiasts. Although this suggestion breaks database normalization rules, I’d like no more than 5 identifiers listed as separate fields: IDer1, IDer2, IDer3… Most have only 1 or 2, but some of the trickier species may have up to 5.

Observation Fields/Tags: At present only one’s own fields and tags are available for selection even when extracting obs from a different single user. I have no idea what additional fields/tags are being used by the other 25,000+ moth users, but if there are other fields/tags than my own, I’d like to have the opportunity to see what they contain.

*New-Download Selected Fields by User Specified Order. Not required but may be a huge time saver for Excel users. Now a download presents fields in the order that iNat lists them on the Export Observation. In Excel one can’t map the fields to existing spreadsheet columns. However, most DBMSs do allow mapping fields. Depends on what program is being used for the downloaded data.

A question; perhaps it should have been the first @carrieseltzer: is the Export Observations page actually in the pipeline for an update/overhaul, or is this simply information-gathering for an undetermined future that may or may not ever be implemented?

@dkaposi Anything to add, David?

Thanks,
Monica Krancevic (krancmm)

dkaposi · August 18, 2019, 4:25pm

Nothing new to add, but this is a nice summary of things that we would also find helpful. I don’t believe that the export functionality has changed much in the 4 years that I’ve been on iNat. Given the size of the site, and the addition of annotations, it would be great to adapt the export utility if there is a desire to see iNat-data used more widely.

David

efmer · September 22, 2019, 12:34pm

I can handle any format as long as it’s exported.
E.g. a field annotations with Plant Phenology:flowering

bouteloua · November 27, 2019, 4:26am

A post was split to a new topic: Species_guess in exports: what is it recording?

jwidness · December 3, 2019, 1:24pm

Yes, please

thecaiman1 · December 19, 2019, 3:20pm

I have created a project to encompass multiple properties (15) in a park system. I would like to be able to export all data for the project in a single export while also retaining the place information. I created and named all the places individually and then included them all in my project. However, if I export all project data there is no option to include a column in the .csv with my designated place names. None of the current “Geo” options will provide this information. Converting the latitude and longitude data into places would be difficult because the designated places have irregular shapes and are all very close together within a single city.

pisum · December 20, 2019, 4:18am

you should be able to tie coordinates to polygons using a GIS application like QGIS or ArcGIS. alternatively, you can use the iNat API to get encompassing place ids, and then you can match with a list of ids for your places. alternatively, you can do a csv export for each of your 15 places. add the place id column manually based on the place id associated with each export, and then merge them together.
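
The last alternative (one export per place, then merge) could look something like the sketch below. It assumes each export has already been downloaded as CSV text; the function names, the place IDs, and the `place_id` column name are all hypothetical choices for illustration.

```python
import csv
import io


def merge_exports(exports):
    """Merge per-place CSV exports, tagging each row with its place id.

    `exports` maps a place id to the CSV text of that place's export.
    """
    merged = []
    for place_id, csv_text in exports.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            row["place_id"] = place_id  # column added manually, as described above
            merged.append(row)
    return merged


def write_csv(rows):
    """Serialize merged rows back to CSV text, using the union of all columns."""
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Since the places are close together, an observation near a shared boundary might appear in more than one export, so de-duplicating on the observation id after merging may be worth considering.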

dkaposi · January 18, 2020, 6:11pm

@carrieseltzer - are there any updates that you can provide on this thread? Most critically, I am interested in the ability to download annotations. Thanks
David

carrieseltzer · January 27, 2020, 5:41pm

Right now it’s possible to access annotation information via the API. I know it’s not the easiest solution, but the json data about each observation does include annotations.

For example, here’s a search that returns data on all of your monarch observations from Ontario with life stage annotations: https://api.inaturalist.org/v1/observations?place_id=6883&taxon_id=48662&user_id=dkaposi&term_id=1&order=desc&order_by=created_at
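
As a rough sketch of pulling the annotations out of that JSON, the helper below maps each observation id to a life stage label. The key names `results`, `annotations`, `controlled_attribute_id`, and `controlled_value_id` are assumptions about the response shape and should be checked against a real response; the function name is hypothetical, and the ID-to-label table is copied from the wiki excerpt quoted in this post.

```python
# Life Stage term and value IDs, copied from the wiki excerpt in this post.
LIFE_STAGE_TERM_ID = 1
LIFE_STAGE_VALUES = {
    2: "Adult", 3: "Teneral", 4: "Pupa", 5: "Nymph",
    6: "Larva", 7: "Egg", 8: "Juvenile", 16: "Subimago",
}


def life_stages(payload):
    """Map observation id -> life stage label from a decoded API response.

    Assumes each observation carries an "annotations" list whose items hold
    "controlled_attribute_id" and "controlled_value_id" (unverified key names).
    """
    stages = {}
    for obs in payload.get("results", []):
        for ann in obs.get("annotations") or []:
            if ann.get("controlled_attribute_id") == LIFE_STAGE_TERM_ID:
                value_id = ann.get("controlled_value_id")
                stages[obs["id"]] = LIFE_STAGE_VALUES.get(value_id, "Unknown")
    return stages
```

The resulting dictionary can be written out as a two-column CSV and joined to a regular export on the observation id.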

The easiest place to find the annotation terms is in this post, How to use iNaturalist’s Search URLs - wiki (part 1 of 2):

Search for Annotations

&term_id= - the annotation group

1 =Life Stage, 9 =Sex, 12 =Plant Phenology, 17 =Alive or Dead

&term_value_id= - the value within the group

Life Stage: 2 =Adult, 3 =Teneral, 4 =Pupa, 5 =Nymph, 6 =Larva, 7 =Egg, 8 =Juvenile, 16 =Subimago

Sex: 10 =Female, 11 =Male

Plant Phenology: 13 =Flowering, 14 =Fruiting, 15 =Flower Budding

Alive or Dead: 18 =Alive, 19 =Dead, 20 =Cannot Be Determined

Both the group parameter and value parameter should be included in the URL. And term_value_id should be able to accept a comma-separated list of more than one value.

Here are all verifiable Lepidoptera observations with a Life Stage of Larva: https://www.inaturalist.org/observations?place_id=any&taxon_id=47157&term_id=1&term_value_id=6

And here are all verifiable Lepidoptera observations with a Life Stage of Larva or Adult: https://www.inaturalist.org/observations?page=2&place_id=any&taxon_id=47157&term_id=1&term_value_id=2,6

To exclude observations with particular annotations, use the following similar to the above:

&without_term_id= the annotation group

&without_term_value_id= the value within the group
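
The parameters above can be assembled programmatically. The following is a small sketch that builds such search URLs from the names and IDs listed in the excerpt; the function name and its arguments are hypothetical, and only the parameter names and ID values come from the wiki text.

```python
from urllib.parse import urlencode

# Annotation groups and values, as listed in the excerpt above.
TERMS = {"Life Stage": 1, "Sex": 9, "Plant Phenology": 12, "Alive or Dead": 17}
VALUES = {
    "Life Stage": {"Adult": 2, "Teneral": 3, "Pupa": 4, "Nymph": 5,
                   "Larva": 6, "Egg": 7, "Juvenile": 8, "Subimago": 16},
    "Sex": {"Female": 10, "Male": 11},
    "Plant Phenology": {"Flowering": 13, "Fruiting": 14, "Flower Budding": 15},
    "Alive or Dead": {"Alive": 18, "Dead": 19, "Cannot Be Determined": 20},
}


def search_url(taxon_id, term, values, exclude=False,
               base="https://www.inaturalist.org/observations"):
    """Build a search URL that includes (or excludes) the given annotation values."""
    prefix = "without_" if exclude else ""
    value_ids = ",".join(str(VALUES[term][name]) for name in values)
    params = {
        "taxon_id": taxon_id,
        f"{prefix}term_id": TERMS[term],          # the annotation group
        f"{prefix}term_value_id": value_ids,      # comma-separated values in the group
    }
    # Keep commas unescaped so the list reads the same as in the examples above.
    return f"{base}?{urlencode(params, safe=',')}"
```

For instance, `search_url(47157, "Life Stage", ["Adult", "Larva"])` produces a URL with `term_id=1&term_value_id=2,6`, and passing `exclude=True` switches to the `without_` forms.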

krancmm · January 27, 2020, 10:47pm

From the API: “Please note that we throttle API usage to a max of 100 requests per minute, though we ask that you try to keep it to 60 requests per minute or lower, and to keep under 10,000 requests per day. If we notice usage that has serious impact on our performance we may institute blocks without notification.”

@dkaposi David and I have the same problem, but using his as an example:

His traditional project, Moths of Ontario, https://www.inaturalist.org/projects/moths-of-ontario, now has 154,199 observations that are annotated. If 200 is the max per page on the API GET obs endpoint, that would be 770 calls over 15 days.
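
As a rough check of the arithmetic, this sketch computes how many page requests a download of that size would need and how long they would take at the rates quoted from the API documentation above. The 200-per-page figure is the assumption stated in this post, and the function name is hypothetical.

```python
def request_budget(total_obs, per_page=200, per_minute=60, per_day=10_000):
    """Return (requests, minutes, days) needed to page through total_obs records.

    Uses integer ceiling division so partial pages, minutes, and days round up.
    """
    requests = -(-total_obs // per_page)     # pages needed
    minutes = -(-requests // per_minute)     # at the suggested per-minute rate
    days = -(-requests // per_day)           # at the suggested per-day cap
    return requests, minutes, days


requests, minutes, days = request_budget(154_199)
print(requests, minutes, days)
```

By this arithmetic the page requests alone fit within a single day's quoted cap, so any longer estimate would have to come from limits not quoted here.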

I have an even larger number of obs to deal with if I’m to use any of iNat for the North American Moth Photographers Group mapping as I delineated on August 15, 2019 in this thread.

Perhaps iNat could make exceptions for “special use” cases to allow more extensive use of the API, or, ideally, fast track an update to the old Export Observations.
