Data users— what are your use cases and requests for exporting data?

It would be awesome to integrate with NLCD to get the land cover of the polygon/place for the observations being downloaded, or with Daymet to get the weather occurring on the date of observation.

I’ve been thinking about how to more easily capture the usernames of identifiers in a concise and useful way in an export. Currently we have fields for num_identification_agreements and num_identification_disagreements. Without getting too deep into it right now, I don’t think these are currently calculated as you might expect, but what if we fixed the logic and essentially replaced these counts with lists of usernames?

New fields proposal:
Obs_taxon_exact_match
Shows usernames associated with identifications exactly matching the observation taxon (i.e. the taxon_id exported in the csv and displayed at the top of the observation detail page).
Obs_taxon_mismatch
Shows usernames associated with identifications that do not exactly match the observation taxon. This includes identifications of the observation taxon ancestors as well as maverick IDs.
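
To make the proposal concrete, here is a minimal sketch of how the two fields could be derived for one observation. It is not iNaturalist's export code: the dict keys `user_login`, `taxon_id`, and `current` are hypothetical, and skipping non-current (withdrawn) identifications is an assumption the proposal does not spell out.

```python
def split_identifiers(obs_taxon_id, identifications):
    """Return (exact_match, mismatch) username lists for one observation."""
    exact, mismatch = [], []
    for ident in identifications:
        # Assumption: withdrawn or superseded identifications are ignored.
        if not ident.get("current", True):
            continue
        # Ancestor IDs and maverick IDs both land in the mismatch list.
        target = exact if ident["taxon_id"] == obs_taxon_id else mismatch
        if ident["user_login"] not in target:
            target.append(ident["user_login"])
    return exact, mismatch


# Hypothetical identifications on an observation whose taxon_id is 101.
idents = [
    {"user_login": "alice", "taxon_id": 101, "current": True},
    {"user_login": "bob", "taxon_id": 55, "current": True},    # ancestor taxon
    {"user_login": "carol", "taxon_id": 101, "current": True},
    {"user_login": "dave", "taxon_id": 999, "current": False},  # withdrawn
]
exact, mismatch = split_identifiers(101, idents)
```

In an export, each list would presumably be joined into a single delimited cell per observation.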

Would this mostly satisfy your desires to search for observations identified by specific users, @nathantaylor @cthawley @jdmore ?

(Editing to add clarification that the “observation taxon” is the taxon currently associated with the observation, which may be ahead of the community identification (CID). We recently updated the images and text on this topic in Getting Started.)

@carrieseltzer This would definitely be an improvement, as it would allow us the ability to find data that is labeled incorrectly and manually change it, but we still would have to manually change the data by going back and forth from the datasheet to the iNaturalist observation itself. This isn’t a big deal for small datasets, but for large datasets, I can see this being a challenge. I’d have to run some tests with the new system to see just how much of a time sink this is, but it would be an improvement.

If it is too difficult to retrieve the ID associated with the obs_taxon_mismatch data, can an option be put in place to add a category for the ID of a particular user? Perhaps adding the option to add project curator IDs as a work around if necessary? Either would fix essentially all the problems I’ve run into, especially if in combination with what you’re proposing.

I don’t think it will alleviate all the future problems, as having the IDs associated with the obs_taxon_mismatch data alongside the ID of the obs_taxon_exact_match data is really the only way to have a complete dataset for deciding what the name should actually be without constantly returning to the iNat website or downloading the data piecemeal (i.e., downloading taxa individually by including expert identifications of that particular taxon).

@carrieseltzer Thanks for getting back to us!
I think that this would be useful in some cases, though the potential discrepancy between Community ID and observation taxon worries me. I actually didn’t realize that the ID currently downloaded was the observation taxon and not the community ID (oops! I need to read more closely) which would be more valuable for some of my work. (Presumably I would get the Community ID if I download from GBIF?)

I think that the most useful solution for me would be that proposed by @nathantaylor to have an optional field where one could input a User ID and have a column in the database return the taxon ID that that User gave to each observation (with NAs if they didn’t ID it).
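
A rough sketch of what such a per-user column might look like, assuming each observation carries a list of identifications in chronological order. The keys `id`, `identifications`, `user_login`, and `taxon_id` are illustrative names, not actual export fields.

```python
def taxon_by_user(observations, user_login, missing="NA"):
    """Build one row per observation with the taxon ID a given user suggested."""
    column = f"taxon_id_{user_login}"
    rows = []
    for obs in observations:
        taxon = missing  # stays "NA" when the user never identified this one
        for ident in obs["identifications"]:
            if ident["user_login"] == user_login:
                # Assumes chronological order, so the latest ID wins.
                taxon = ident["taxon_id"]
        rows.append({"id": obs["id"], column: taxon})
    return rows


# Hypothetical data: the user identified the first observation twice.
sample = [
    {"id": 1, "identifications": [
        {"user_login": "expert", "taxon_id": 55},
        {"user_login": "expert", "taxon_id": 101},
    ]},
    {"id": 2, "identifications": [{"user_login": "someone", "taxon_id": 7}]},
]
rows = taxon_by_user(sample, "expert")
```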

Another nice feature of this might be that, if accessible from the API, and in combination with the data from the proposed fields for Obs_taxon_exact_match and Obs_taxon_mismatch, one might be able to use the API to query the taxon of each user’s identification.

Thanks for looking into this!

See also this relevant feature request: https://forum.inaturalist.org/t/search-by-observation-id-or-community-id/3620

i’ve been experimenting with mapping/GIS stuff a bit (https://forum.inaturalist.org/t/which-gis-tools-do-you-use-and-why/3519/9), and i think i’m learning that if data could be exported to a (Geo?)JSON format, that might actually be nicer than a CSV in many cases. i’m thinking in particular that that would be a better structure for capturing things that could have M:1 relationships (like many identifications for a given observation). it looks like most GIS applications will load these without issue, and it might be an easier starting point for creating interactive maps from scratch than a CSV.
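
to illustrate the M:1 point, here is a small sketch that turns observation records into a GeoJSON FeatureCollection with the identifications nested under each feature. the input keys (`latitude`, `longitude`, `taxon_name`, `identifications`) are made up for the example, and some GIS tools may flatten or stringify nested properties, so treat it as a starting point only.

```python
import json


def to_geojson(observations):
    """Convert observation dicts into a GeoJSON FeatureCollection."""
    features = []
    for obs in observations:
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [obs["longitude"], obs["latitude"]],
            },
            "properties": {
                "id": obs["id"],
                "taxon_name": obs["taxon_name"],
                # Many identifications per observation fit naturally here.
                "identifications": obs["identifications"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical observation with two identifications.
sample = [{
    "id": 1,
    "latitude": 30.27,
    "longitude": -97.74,
    "taxon_name": "Sciurus niger",
    "identifications": [
        {"user_login": "alice", "taxon_name": "Sciurus niger"},
        {"user_login": "bob", "taxon_name": "Sciurus"},
    ],
}]
collection = to_geojson(sample)
geojson_text = json.dumps(collection, indent=2)
```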

It would be great if we could export all the observations in a given area, such as the map search which I screenshotted. This would be super useful for mapping out biodiversity in a given area!

This is already possible. In your image above, you just need to click “Redo search in map”, then “Filters”, then “Download”

Hi,

I am developing a desktop database system written in LibreOffice Basic, aiming to distribute it freely among some public schools here in my town. The objective is to attract kids to science through the use of iNat as a platform that will hold their observations within projects, which will be imported back into that database. It is already functional and I’m just doing some final testing and debugging. I’ve read a lot in this thread and searched through the API carefully for a parameter or function that would let me import, through the iNat CSV export tool, only new observations or older ones that had any change (IDs, comments, faves, etc.) after a certain date. I’ve found ‘created_after’ (don’t know where) and ‘observation_created_d1’ (in the API), but couldn’t find an ‘updated_after’ or something similar. That could be very helpful for heavy users, even though that is not exactly my own case (almost 5K observations among projects and personal). For me, it would instead reduce processing time by letting me discard the unaltered records. Any help?
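
One possible approach, sketched under assumptions that should be checked against the API docs: I believe API v1's `GET /observations` accepts an `updated_since` parameter and a login for `user_id`, but treat both as unverified here. The second function is a local fallback that filters already-exported rows on an `updated_at` column; the ISO timestamp format in the sample is also an assumption.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

API_URL = "https://api.inaturalist.org/v1/observations"


def changed_since_url(user_login, since):
    """Build a query for observations updated after `since` (assumed parameter)."""
    params = {
        "user_id": user_login,
        "updated_since": since.isoformat(),  # assumption: verify in the API docs
        "per_page": 200,
    }
    return f"{API_URL}?{urlencode(params)}"


def changed_rows(rows, since):
    """Keep only exported rows whose `updated_at` is later than `since`."""
    return [r for r in rows if datetime.fromisoformat(r["updated_at"]) > since]


since = datetime(2019, 6, 1, tzinfo=timezone.utc)
url = changed_since_url("some_user", since)  # "some_user" is a placeholder login

# Hypothetical exported rows with ISO 8601 timestamps.
rows = [
    {"id": "1", "updated_at": "2019-05-20T10:00:00+00:00"},
    {"id": "2", "updated_at": "2019-06-15T08:30:00+00:00"},
]
recent = changed_rows(rows, since)
```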

I wish that it was possible to export sex data for observations. I am trying to look for potential sex bias in encounters observers have with specific species, but this information is not included in the .csv.

Two challenges here.

There is no standard way to enter gender in an observation. It does exist as an annotation, but there are also many observation fields where the same data may have been entered instead.

I’m not sure how you would separate any bias of what photo is selected for upload versus what gender is observed. Meaning a person saw both a male and a female while out, but chose to upload the picture of a female.

i’m assuming that you’re trying to use annotations. i think i read somewhere that they purposely don’t include annotations in the CSV because of the way the data is stored. (it’s not just male or female. it’s also potentially someone agreeing or disagreeing with the original annotation.) that said, you could extract the data in a few ways, i think:

  1. if you want to get the observation details, then even though the CSV won’t include annotation data, you can query by annotation fields. see https://forum.inaturalist.org/t/annotation-filter-is-missing/2677. so instead of pulling back, say, all fox squirrels with a field that says male or female, i think you could do this as 3 different queries – one for all fox squirrels, one for female fox squirrels, and one for male fox squirrels.
  2. if you don’t have a lot of species to look at and don’t need super-precise figures, you could just look at the taxon page. there’s a graph that shows you seasonality by default but can also give you seasonality by sex.
  3. similar to #2, you could get the actual numbers that make up the graphs (and add additional filters) by using the API (https://api.inaturalist.org/v1/docs/#!/Observations/get_observations_histogram, https://api.inaturalist.org/v1/docs/#!/Observations/get_observations_popular_field_values).
  4. similar to #3 + #1, if you just care about totals (not about individual observations), you could use the API to get, say, the total number of female fox squirrels. see https://api.inaturalist.org/v1/docs/#!/Observations/get_observations, and just pull back the first record (page = 1, per page = 1, only id = true) for a given taxon (+sex), and the result set will also include a total count (even though you’re just pulling back the first record in the set).
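
here's a rough sketch of option 4. it only builds the request URL and reads the total out of a response body, so it runs without network access. the annotation IDs (term 9 for sex, values 10 and 11 for female and male) are what i believe the site uses, but treat them as assumptions, and the taxon ID below is just a placeholder.

```python
import json
from urllib.parse import urlencode

API_URL = "https://api.inaturalist.org/v1/observations"
SEX_TERM_ID = 9                           # assumption: annotation term for sex
SEX_VALUES = {"female": 10, "male": 11}   # assumption: annotation value IDs


def count_url(taxon_id, sex=None):
    """Build a query that returns one record plus the overall total."""
    params = {"taxon_id": taxon_id, "page": 1, "per_page": 1, "only_id": "true"}
    if sex is not None:
        params["term_id"] = SEX_TERM_ID
        params["term_value_id"] = SEX_VALUES[sex]
    return f"{API_URL}?{urlencode(params)}"


def total_from_response(body):
    """Read the total count from a JSON response body."""
    return json.loads(body)["total_results"]


PLACEHOLDER_TAXON_ID = 12345  # hypothetical; look up the real taxon ID
all_url = count_url(PLACEHOLDER_TAXON_ID)
female_url = count_url(PLACEHOLDER_TAXON_ID, "female")

# Made-up response body, shaped like a one-record result page.
sample_body = '{"total_results": 42, "page": 1, "per_page": 1, "results": [{"id": 1}]}'
total = total_from_response(sample_body)
```

fetching each URL (all, female, male) and comparing the three totals gives the breakdown without downloading any individual observations.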

I can see how this would be useful. The two projects that I manage have 120,000 observations, so a better way to sift through the data would be nice. Being able to search for challenging species and then review the ID providers would be beneficial. I believe that you can export a list of traditional project observations from the “Filter by Curator Identifications”, but that doesn’t list the specific curators either. That would be useful, as someone may want to maintain a ‘reviewed/vetted by’ list outside of iNat.

This would be a hugely helpful change, fields used in a project are far more relevant than random fields used by the individual performing the download. The current field options make sense if a user is downloading their own data, but it falls apart at a project scale.

My focus is on insects for downloading, so the presentation of annotations is simpler. We just need to be able to include annotations in downloads.

I’ve read this thread again and again in search of a way to export/import a csv file of all the identifications I have suggested for the community (not those “belonging” to myself). Maybe due to the technical complexity involved or my limited English, I just couldn’t find any straight answer to this. Can anybody help me?

I’m not sure if this is exactly what you want, but you can export a csv of all the observations that you have reviewed. Go to the “Identify” page on the top menu, click on the “Filters” menu, select the “Yes” option under “Reviewed”, and then click the download button in the corner.

Quite close, @cwbarrows. Thank you. I was “messing” around with the filters but didn’t understand the meaning of “reviewed” in this context. If I understand it now, it means “reviewed by me”. It looks like the system tried to export all the identifications I gave, including those for my own observations, which I would prefer to have excluded from the output. Anyway, it didn’t work out in full due to the message below:

The expected result set for that query would be around 7,900 rows right now but, if I must specify a taxon, place or user id, I will have to know in advance any of those. What I am looking for is a way to revise systematically all the IDs I gave to community observations, wherever, whoever, whenever I did.

Anyway, I gave it a try limiting the observations to ‘Brazil’ and it worked out fine, bringing me exactly 6,958 observations, from which I filtered my own user ID out, which, oddly, resulted in 6,944 reviewed observations (I have more than 5K observations of my own).

So, that’s great. I’ll make what I want to and, someday, I will try to revise the IDs I gave to the world, haha!

By the way, here is the URL I used with the ‘Place’ set to ‘Brazil’:

https://www.inaturalist.org/observations/export?reviewed=true&quality_grade=needs_id%2Ccasual%2Cresearch
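
For anyone reusing this, a small sketch that unpacks the filters carried by the URL above. As pasted, the query string contains only `reviewed` and `quality_grade`, so the place filter would have to be added separately. The `place_id` parameter name and the placeholder value are assumptions here, not something confirmed in this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPORT_URL = (
    "https://www.inaturalist.org/observations/export"
    "?reviewed=true&quality_grade=needs_id%2Ccasual%2Cresearch"
)

# Decode the query string into a plain dict of filter name -> value.
parts = urlsplit(EXPORT_URL)
filters = {key: values[0] for key, values in parse_qs(parts.query).items()}

# Assumption: the place filter is passed as `place_id`; 0 is a placeholder,
# not the real ID for Brazil.
with_place = dict(filters, place_id=0)
new_url = f"{parts.scheme}://{parts.netloc}{parts.path}?{urlencode(with_place)}"
```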

@carrieseltzer If it helps, the way that SEINet accomplishes this when backup files are downloaded is by creating a separate .csv document for the annotations. They also have separate .csvs for phenology, metadata, image information, and herbarium information, and finally one for the occurrence information itself. All are tied together by the occurrence ID (which could be the observation ID here), and everything is neatly zipped together in one file. Information in this format would be quite workable. The identifications work by having ones and zeros to indicate whether the IDs are active or not, typically with only the latest one active in the case of specimens.
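
As a sketch of why that layout is workable, here is how two such files could be tied back together by ID. The column names (`id`, `observation_id`, `attribute`, `value`) are invented for the example and are not SEINet's or iNaturalist's actual headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of two CSV files from the same zipped export.
OBSERVATIONS_CSV = """id,taxon_name
1,Opuntia engelmannii
2,Opuntia phaeacantha
"""
ANNOTATIONS_CSV = """observation_id,attribute,value
1,Plant Phenology,Flowering
1,Plant Phenology,Fruiting
"""


def join_by_observation(observations_csv, annotations_csv):
    """Attach each observation's annotations using the shared ID column."""
    annotations = defaultdict(list)
    for row in csv.DictReader(io.StringIO(annotations_csv)):
        annotations[row["observation_id"]].append((row["attribute"], row["value"]))
    joined = []
    for row in csv.DictReader(io.StringIO(observations_csv)):
        # Observations without annotations get an empty list.
        row["annotations"] = annotations.get(row["id"], [])
        joined.append(row)
    return joined


joined = join_by_observation(OBSERVATIONS_CSV, ANNOTATIONS_CSV)
```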

Content syndication: using images to support Integrated Pest Management approaches for farmers worldwide. I find the iNaturalist database quite consistent, and the image quality is attractive.

I won’t bore you with the detailed shortcomings in the knowledge domain of agriculture. In short, there is too little easy and complete access to information on pest and disease management. The effort is to combine different datasets and allow farmers to search for a crop/pest/disease/weed in their local language and find related, practical content.

With over 17,000 species of plant pathogens and weeds in our database (and 200,000 plus synonyms in countless languages), I was looking for a way of automatically pulling in/showing images. Am trying the API approach thanks to a suggestion by user Pisum here.