Data users: what are your use cases and requests for exporting data?

Thanks, @nathantaylor. This is not at all beyond the initial request and is a helpful suggestion for approaching the request to offer more insight about individual identifiers in the downloads at scale. Definitely interested to hear if others would find this useful or suggest other approaches to this problem.

I sometimes use iNaturalist as a way to collect better data more efficiently for herbarium specimens. When I download records, they almost always are part of a traditional project for which I’ve set up observation fields that map to the typical fields we have in an herbarium database. [Aside: I’ve been sad not to see custom observation fields integrated anywhere in the new projects model, although I understand why doing so wouldn’t work under the current design for collection projects.]

One problem I have encountered (and that I think still exists, although I’m not 100% sure) is that you can only download custom observation fields that you have used. This pops up when I work with botany instructors who want to use the workflow I set up to have their students collect specimens that will later be deposited into their herbarium. The instructors are often familiar with iNaturalist but may not make an observation within our course project, so when they go to download their students’ specimen data they do not have the option to download all of the observation fields. The solution, of course, is to have the instructors actually use the project and custom fields themselves. Still, it would be nice if observation fields that are set up for use within a project could always be available to download from that project.

Otherwise data exporting from iNat is great and does what I need! I like most of the other improvements proposed here, but probably wouldn’t take advantage of them myself.

@nathantaylor and @carrieseltzer This was my thought for a request as well!

Some of my uses line up fairly well with Nathan’s. For instance, in taxa that I work with a lot (like anoles), it would be really useful to have a field that has my ID for the observation. When I get undergrads to work with anole data from iNat, I just have them add a field for my ID to their csv from iNat and then they add that info manually, but this does take some time. This is nice to have for projects because we can say that all IDs were verified by one person.

I also agree that it would be really interesting from a more sociological perspective to see the IDs that individual users add. You can get a little of this now by seeing the total agreeing and disagreeing IDs on an observation, but a finer grain of detail would be nice. For instance, we’ve often thought about trying to quantify which anoles are harder to ID if we had ID history for observations, but we can’t do that with the data we can currently access (other than to see average correct IDs using the num_identification_agreements and disagreements fields). As a specific example, we thought it might be cool to try to quantify whether participants in one of our projects (Lizards on the Loose: https://www.inaturalist.org/projects/lizards-on-the-loose-2018) improve their IDing accuracy over time, but we can’t figure out an easy way to do this with the current data export. I did do this manually for a subset of observations, but coding it by looking at each observation individually was a lot of effort!

One way I could think to do this is to just give fields with each identifier’s user_id (already included for uploader) and their taxon ID (like 116461 for Anolis sagrei). This would make it easy to incorporate user IDs as effects in statistical models and the taxon ID would be interpretable as well. Anyways, this info would be cool to have!
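
To make the idea concrete, here is a rough sketch of what a "one row per identification" table could look like. The sample records and key names (`identifications`, `user_id`, `taxon_id`) are made up for illustration and are not the actual iNat export format; only the taxon ID 116461 comes from the example above.

```python
# Hypothetical sample data; the nested layout and key names are assumptions,
# not the actual iNaturalist export format.
observations = [
    {"id": 1001, "user_id": 11, "identifications": [
        {"user_id": 11, "taxon_id": 116461},
        {"user_id": 22, "taxon_id": 116461},
    ]},
    {"id": 1002, "user_id": 33, "identifications": [
        {"user_id": 33, "taxon_id": 999999},  # made-up taxon ID
        {"user_id": 22, "taxon_id": 116461},
    ]},
]


def identification_rows(observations):
    """Return one row per identification (long format)."""
    rows = []
    for obs in observations:
        for ident in obs["identifications"]:
            rows.append({
                "observation_id": obs["id"],
                "uploader_user_id": obs["user_id"],
                "identifier_user_id": ident["user_id"],
                "identifier_taxon_id": ident["taxon_id"],
            })
    return rows


rows = identification_rows(observations)
for row in rows:
    print(row)
```

A long table like this keeps the identifier's user ID available as a grouping variable, which is what a statistical model with user effects would need.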

Are there any plans to make the observation fields easier to select prior to downloading? Perhaps searchable fields? Maybe project downloads listing all project observation fields as talked about here: https://groups.google.com/forum/#!msg/inaturalist/wvhhKG1vTZ8/NchsXLWw73MJ

They really are difficult to deal with as they are now.

This is really minor, but does the observations export CSV have to be zipped?

I would like to be able to easily export the taxonomy, i.e. all taxa in iNat, or in a certain clade, whether or not the taxa have observations.

On lists, the CSV and Taxonomic CSV are still missing taxon_id. The taxonomic CSV and the observations export CSV are still missing several ranks.

Thank you for asking! So many potential uses, but currently three active or anticipated for me:

  1. (personal) Exporting my own observation data, so I can join selected information to a master photo data table. In particular, I capture the observation number, so it is easy to generate an observation URL for any photo I upload, and see which ones have not been uploaded yet. (I include unique photo IDs in the tags of each observation at upload.)

For this the current CSV export works fine, I just have to do a little manipulation with the exported data.

  2. (work related) Exporting documentation of rare species observations in my jurisdiction (Nevada, USA). For this, in addition to the existing CSV functionality, it would be really nice to be able to “export as PDF” a single observation record (or small set), and have all of the related observation data and metadata presented in a compact but readable format. Option to include only the first photo, or append any additional photos at the end of the PDF. Existing screenshot or browser printing options just aren’t cutting it for our needs.

As a work-around we could build a “mail merge” from exported CSV data, but this wouldn’t capture the evidence (photos).

  3. (near future) As part of a long-term project to revise the vascular flora of the White Mountains of California and Nevada (home of 5000+ year old bristlecone pine trees, and about 1100 other vascular plant species), I have been monitoring and curating the plant observations from that area. Eventually I’ll want to export all the relevant observations for further data analysis, and for this I will echo the previous suggestions in this thread that ability to include a specific user’s identification taxon for each observation (my own IDs, in this case) would be extremely important, and a likely need for many other use cases.

Since I am also querying specimen databases, one could argue that GBIF might be a better data source for this use case. But I will also want to capture observations that might not have made it into GBIF for whatever reason.

i’m not sure why you couldn’t pull in photos if you’re doing a mail merge. see: https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_other/pulling-images-in-to-word-with-mail-merge/48d7421f-0649-e011-8dfc-68b599b31bf5

if you need to get all the photos downloaded onto your local machine, you could probably do something like this: https://forum.inaturalist.org/t/complete-observation-transfer-export-import/1678/4.

Thank you for the suggestions @pisum! We’ll definitely follow those leads if (as I suspect) a PDF export utility doesn’t seem to be on the horizon.

I’ve created ‘Places’ which are intertidal areas mapped using polygons. I export data from our project for baseline visualisation and creating locational checklists.

One big problem I currently face with exported data is with locations. Contributors use different names for them. For example, ‘Juhu beach’, ‘Juhu Koliwada’ and ‘Juhu chowpatty’ are a few of the names used for the same shore. Locations of observations with geotagged photos are sometimes totally different. Same for observations with ‘Obscured’ locations. It would help to have a ‘Place’ column in the CSV so all those observations can be sorted using that column.

Also, having a way to download a species checklist would be great!

[I apologise if this is the wrong place for this, but having an option to add a ‘Place’ as a location in observations would be very helpful too.]

When downloading a larger dataset (e.g. all observations from a certain place), it would be nice to also get information about which projects the downloaded observations belong to. The information about “collection” projects is currently not included in the metadata of the observations, as far as I’m aware. I’m aware that collection projects work differently than traditional projects and that adding this information to each observation is probably a resource-intensive process.

I did add a little more context in this post:
https://forum.inaturalist.org/t/find-out-to-which-projects-an-observation-has-been-added/3236

It would be awesome to integrate with Daymet or with NLCD to get the land cover of the polygon/place for the observations being downloaded, or the weather occurring on the date of observation.

A post was split to a new topic: How to download taxa

I’ve been thinking about this issue of how to more easily capture usernames of identifiers in a concise and useful way in an export. Currently we have fields for num_identification_agreements and num_identification_disagreements. Without getting too deep into it right now, I don’t think this is currently calculating as you might expect, but what if we fixed the logic and essentially replaced these counts with lists of usernames?

New fields proposal:
Obs_taxon_exact_match
Shows usernames associated with identifications exactly matching the observation taxon (i.e. the taxon_id exported in the csv and displayed at the top of the observation detail page).
Obs_taxon_mismatch
Shows usernames associated with identifications that do not exactly match the observation taxon. This includes identifications of the observation taxon ancestors as well as maverick IDs.

Would this mostly satisfy your desires to search for observations identified by specific users, @nathantaylor @cthawley @jdmore ?

(Editing to add clarification that the “observation taxon” is the taxon currently associated with the observation, which may be ahead of the community identification (CID). We recently updated the images and text on this topic in Getting Started.)
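
For anyone trying to picture the two proposed fields, here is a minimal sketch of how they might be derived from an observation's identifications. Everything beyond the two field names is an assumption made for illustration: the input layout, the `login`, `taxon_id` and `current` keys, the semicolon separator, and the choice to skip withdrawn IDs are not a description of how iNat computes anything.

```python
def split_identifiers(observation_taxon_id, identifications):
    """Sort identifier usernames into exact-match and mismatch lists.

    `identifications` is a hypothetical list of dicts with the keys
    "login", "taxon_id" and "current" (False for withdrawn IDs).
    """
    exact, mismatch = [], []
    for ident in identifications:
        if not ident["current"]:
            continue  # assumption: withdrawn identifications are ignored
        if ident["taxon_id"] == observation_taxon_id:
            target = exact
        else:
            target = mismatch  # ancestor IDs and maverick IDs both land here
        if ident["login"] not in target:
            target.append(ident["login"])
    return {
        "Obs_taxon_exact_match": ";".join(exact),
        "Obs_taxon_mismatch": ";".join(mismatch),
    }


# Made-up example: the observation taxon is 116461; 888888 is a placeholder ID.
sample_identifications = [
    {"login": "alice", "taxon_id": 116461, "current": True},
    {"login": "bob", "taxon_id": 116461, "current": False},
    {"login": "bob", "taxon_id": 888888, "current": True},
    {"login": "carol", "taxon_id": 116461, "current": True},
]
fields = split_identifiers(116461, sample_identifications)
print(fields)
```

Note that the mismatch list on its own does not say which taxon each of those users chose, so a separate lookup would still be needed for that.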

@carrieseltzer This would definitely be an improvement, as it would allow us the ability to find data that is labeled incorrectly and manually change it, but we still would have to manually change the data by going back and forth from the datasheet to the iNaturalist observation itself. This isn’t a big deal for small datasets, but for large datasets, I can see this being a challenge. I’d have to run some tests with the new system to see just how much of a time sink this is, but it would be an improvement.

If it is too difficult to retrieve the ID associated with the obs_taxon_mismatch data, can an option be put in place to add a category for the ID of a particular user? Perhaps adding the option to add project curator IDs as a workaround if necessary? Either would fix essentially all the problems I’ve run into, especially if in combination with what you’re proposing.

I don’t think it will alleviate all the future problems, as having the IDs associated with the obs_taxon_mismatch data alongside the ID of the obs_taxon_exact_match data is really the only way to have a complete dataset for making determinations on what the name should actually be without constantly returning to the iNat website or downloading the data piecemeal (i.e., downloading taxa individually by including expert identifications of that particular taxon).

@carrieseltzer Thanks for getting back to us!
I think that this would be useful in some cases, though the potential discrepancy between Community ID and observation taxon worries me. I actually didn’t realize that the ID currently downloaded was the observation taxon and not the community ID (oops! I need to read more closely) which would be more valuable for some of my work. (Presumably I would get the Community ID if I download from GBIF?)

I think that the most useful solution for me would be that proposed by @nathantaylor to have an optional field where one could input a User ID and have a column in the database return the taxon ID that that User gave to each observation (with NAs if they didn’t ID it).
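
As a sketch of what that optional column might contain, the snippet below looks up one user's taxon ID per observation and fills in NA where they made no identification. The record layout and key names are hypothetical, and it assumes identifications are listed oldest first, so a user's latest ID wins.

```python
# Hypothetical records; key names and layout are assumptions for illustration.
observations = [
    {"id": 1001, "identifications": [
        {"user_id": 11, "taxon_id": 116461},
        {"user_id": 22, "taxon_id": 999999},  # made-up taxon ID
        {"user_id": 22, "taxon_id": 116461},  # user 22 later revised their ID
    ]},
    {"id": 1002, "identifications": [
        {"user_id": 11, "taxon_id": 116461},
    ]},
]


def taxon_by_user(observations, user_id, missing="NA"):
    """Map each observation ID to the chosen user's most recent taxon ID."""
    column = {}
    for obs in observations:
        taxon = missing
        for ident in obs["identifications"]:
            if ident["user_id"] == user_id:
                taxon = ident["taxon_id"]  # later entries overwrite earlier ones
        column[obs["id"]] = taxon
    return column


print(taxon_by_user(observations, 22))
```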

Another nice feature of this might be that, if accessible from the API, and in combination with the data from the proposed fields for Obs_taxon_exact_match and Obs_taxon_mismatch, one might be able to use the API to query the taxon of each user’s identification.

Thanks for looking into this!

See also this relevant feature request: https://forum.inaturalist.org/t/search-by-observation-id-or-community-id/3620

i’ve been experimenting with mapping/GIS stuff a bit (https://forum.inaturalist.org/t/which-gis-tools-do-you-use-and-why/3519/9), and i think i’m learning that if data could be exported to a (Geo?)JSON format, that might actually be nicer than a CSV in many cases. i’m thinking in particular that that would be a better structure for capturing things that could have M:1 relationships (like many identifications for a given observation). it looks like most GIS applications will load these without issue, and it might be an easier starting point for creating interactive maps from scratch than a CSV.
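
here's a rough sketch of the kind of structure i mean, with each observation as a GeoJSON Feature and its identifications nested in the properties. the input records, key names, and coordinates are invented for illustration (not an actual iNat export), and some GIS tools may flatten or stringify nested property values when loading them.

```python
import json

# Invented sample record; key names and values are assumptions for illustration.
observations = [
    {
        "id": 1001,
        "latitude": 19.1,
        "longitude": 72.8,
        "taxon_id": 116461,
        "identifications": [
            {"user_id": 11, "taxon_id": 116461},
            {"user_id": 22, "taxon_id": 116461},
        ],
    },
]


def to_feature(obs):
    """Build a GeoJSON Feature; coordinates are ordered [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [obs["longitude"], obs["latitude"]],
        },
        "properties": {
            "observation_id": obs["id"],
            "taxon_id": obs["taxon_id"],
            # many identifications per observation, kept as a nested list
            "identifications": obs["identifications"],
        },
    }


def to_feature_collection(observations):
    """Wrap all observations in a single GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [to_feature(obs) for obs in observations],
    }


collection = to_feature_collection(observations)
print(json.dumps(collection, indent=2))
```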

It would be great if we could export all the observations in a given area, such as the map search which I screenshotted. This would be super useful for mapping out biodiversity in a given area!

This is already possible. In your image above, you just need to click “Redo search in map”, then “Filters”, then “Download”

Hi,

I am developing a desktop database system written in LibreOffice Basic, aiming to distribute it freely among some public schools here in my town. The objective is to attract kids to science through the use of iNat as a platform that will hold their observations within projects, which will be imported back into that database. It is already functional and I’m just doing some final testing and debugging. I’ve read a lot in this thread and searched through the API carefully for a parameter or function that would let me import, through the iNat CSV export tool, only new observations or older ones that had any change (IDs, comments, faves, etc.) after a certain date. I’ve found ‘created_after’ (don’t know where) and ‘observation_created_d1’ (in the API), but couldn’t find an ‘updated_after’ or something similar. That could be very helpful for heavy users, even though that is not exactly my own case (almost 5K observations among projects and personal). For me, instead, it would reduce processing time by discarding the unaltered records. Any help?
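
In the meantime, one possible stopgap is to filter the exported file locally before importing it. The sketch below is in Python rather than LibreOffice Basic, and it assumes the export has a last-updated timestamp column, here called `updated_at`, holding ISO 8601 values with a UTC offset. Both the column name and the timestamp format are assumptions that would need checking against a real export.

```python
import csv
import io
from datetime import datetime, timezone

# Made-up export contents; the column name and timestamp format are assumptions.
SAMPLE_CSV = """id,updated_at
1,2019-05-01T12:00:00+00:00
2,2019-06-15T08:30:00+00:00
3,2019-07-02T00:00:00+00:00
"""


def changed_since(csv_text, cutoff, column="updated_at"):
    """Return only the rows whose timestamp is at or after the cutoff."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        updated = datetime.fromisoformat(row[column])
        if updated >= cutoff:
            kept.append(row)
    return kept


cutoff = datetime(2019, 6, 1, tzinfo=timezone.utc)
recent = changed_since(SAMPLE_CSV, cutoff)
print([row["id"] for row in recent])
```

This still downloads every record, so it only saves processing time on the import side, not download time.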
