Data users: what are your use cases and requests for exporting data?

i think it would be nice to be able to see multiple file names if there are multiple photos or sounds associated with the observation. i’m not a scientist. so this would just be mostly for personal record-keeping purposes.

it might be nice to be able to download identifiers (also showing multiple, if there are multiple).

i’d also like to be able to group / aggregate and just get summaries (ex. counts by species / month, max / min observation date by species, etc.). i’d probably use something like that if, say, i wanted to travel somewhere and wanted to get a sense of what i might be able to see at different times of the year. (off topic, but you guys should partner with a travel site to show what kind of interesting things have been observed at various destinations.)
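For what it's worth, this kind of summary can already be roughed out locally from an exported CSV. The sketch below is only an illustration: it assumes the export has `scientific_name` and `observed_on` columns with ISO dates, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from datetime import date

# Made-up rows standing in for an exported CSV; the column names are assumptions.
SAMPLE_CSV = """scientific_name,observed_on
Anolis sagrei,2019-03-14
Anolis sagrei,2019-07-02
Pinus longaeva,2018-07-21
Anolis sagrei,2019-07-19
"""

def summarize(csv_file):
    """Return (counts keyed by (species, month), (first, last) date keyed by species)."""
    counts = Counter()
    date_range = {}
    for row in csv.DictReader(csv_file):
        species = row["scientific_name"]
        observed = date.fromisoformat(row["observed_on"])
        counts[(species, observed.month)] += 1
        first, last = date_range.get(species, (observed, observed))
        date_range[species] = (min(first, observed), max(last, observed))
    return counts, date_range

counts, date_range = summarize(io.StringIO(SAMPLE_CSV))
for (species, month), n in sorted(counts.items()):
    print(f"{species}\tmonth {month:02d}\t{n} observation(s)")
for species, (first, last) in sorted(date_range.items()):
    print(f"{species}\tfirst {first}\tlast {last}")
```

To run it on a real export, pass an opened file instead of the `io.StringIO` sample.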

8 Likes

I have used iNaturalist for an ArcGIS project for school.

2 Likes

really, i’d love to be able to run SQL queries against the tables (or a relatively recent archive). i saw that you guys tried to put a dataset out to data.world, but it looks like there may be technical limitations to that platform.
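Until something like that exists, one workaround is to load an exported CSV into a local SQLite database and query that. This is a minimal sketch, not anything iNaturalist provides: the table layout, the column names (`id`, `scientific_name`, `observed_on`), and the sample rows are all assumptions.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for an exported CSV; the column names are assumptions.
SAMPLE_CSV = """id,scientific_name,observed_on
1,Anolis sagrei,2019-03-14
2,Anolis sagrei,2019-07-02
3,Pinus longaeva,2018-07-21
"""

def load_observations(csv_file):
    """Load exported rows into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations ("
        "id INTEGER PRIMARY KEY, scientific_name TEXT, observed_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO observations VALUES (:id, :scientific_name, :observed_on)",
        csv.DictReader(csv_file),
    )
    return conn

conn = load_observations(io.StringIO(SAMPLE_CSV))
results = conn.execute(
    "SELECT scientific_name, COUNT(*), MIN(observed_on), MAX(observed_on) "
    "FROM observations GROUP BY scientific_name ORDER BY scientific_name"
).fetchall()
for row in results:
    print(row)
```

ISO-formatted date strings sort correctly as text, which is why `MIN` and `MAX` work here without a date type.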

1 Like

On the site, you’re able to see the number of observations that are classified as ‘introduced’ but the export table of observations does not have that column (or not that I’ve found).

I’ve been looking at more meta questions (what makes a good observer, how much does the diversity of observations increase with the additions of collectors, days, area… so on)

7 Likes

I don’t want to suggest that GBIF has any interest in poaching iNat users away from the iNat site to access iNat data, but GBIF just implemented this feature as a third download option. See this discussion item: https://discourse.gbif.org/t/new-feature-download-lists-of-distinct-species-contained-in-occurrence-searches/687

One could easily define search parameters for geography—as well as dataset and dates—and just download the species list from something like this (quick-and-dirty, mind you): https://www.gbif.org/occurrence/search?dataset_key=50c9509d-22c7-4a22-a47d-8c48425ef4a7&has_geospatial_issue=false&geometry=POLYGON((-122.64313%2037.9182,-122.51129%2037.40507,-122.25861%2037.37889,-121.97296%2037.26531,-121.90704%2037.44434,-122.22015%2038.06539,-122.47833%2038.19502,-122.64313%2037.9182)).

You can also easily go back to the download DOI page and update the query results as needed.
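If you want to generate search links like the one above for different areas, a small helper can assemble them. This sketch only rebuilds the website search URL from the same three parameters that appear in that link (`dataset_key`, `has_geospatial_issue`, `geometry`); the function name is made up, and it does not call any GBIF API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

GBIF_SEARCH = "https://www.gbif.org/occurrence/search"

def build_search_url(dataset_key, polygon_wkt, has_geospatial_issue=False):
    """Assemble an occurrence-search link from a dataset key and a WKT polygon."""
    params = {
        "dataset_key": dataset_key,
        "has_geospatial_issue": str(has_geospatial_issue).lower(),
        "geometry": polygon_wkt,
    }
    # urlencode percent-escapes the spaces, commas and parentheses in the polygon.
    return f"{GBIF_SEARCH}?{urlencode(params)}"

# Values copied from the example link above.
dataset_key = "50c9509d-22c7-4a22-a47d-8c48425ef4a7"
polygon = (
    "POLYGON((-122.64313 37.9182,-122.51129 37.40507,-122.25861 37.37889,"
    "-121.97296 37.26531,-121.90704 37.44434,-122.22015 38.06539,"
    "-122.47833 38.19502,-122.64313 37.9182))"
)
url = build_search_url(dataset_key, polygon)
print(url)
```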

3 Likes

One thing I really, really would like is the ability to download the identifications of particular users (i.e., if there is an expert in a certain group that I know is credible, I would probably take their ID over the community ID in any project that I wanted to use the data for).

In my mind, this would essentially solve the ID quality problem for me or at least give an important workaround. For example, anytime I needed Euphorbia data, I could use my ID or the ID of one of the other Euphorbia experts on iNaturalist and be confident that the data I was using was expert reviewed and actually research grade. This doesn’t solve the ID quality problem for GBIF, but would be a major step towards making the data useful.

7 Likes

Can you describe the ideal format in which you want to download that data?

1 Like

Thanks for sharing GBIF’s species list download functionality, @kcopas. That’s great to know and very useful as an example for us and a possible option for some iNat data users. Please don’t be concerned about referring iNat users to GBIF for data. I’ve been thinking we should explicitly encourage it, since the DOIs and subsequent citation tracking are superior to our haphazard list of data uses that aren’t captured by GBIF. I should add it to the FAQs.

3 Likes

Metadata would be nice. The mouse-over explanations on the export page are nice, but they don’t give a full definition of the fields or of the values in the fields. I think I can figure out most of them, but what if I’m wrong? Geoprivacy="" means “Open”, I guess. License=“CC-BY-NC-SA” means “Attribution-NonCommercial-ShareAlike”; I had to dig into my Account Settings to find that one. I guessed wrong on Position_Accuracy until I checked the website.
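In the meantime, a home-made lookup table can stand in for the missing metadata. The sketch below is only that: the lowercase column names are assumptions, and reading a blank geoprivacy value as "open" is the guess described above, not a documented definition. The license names are the standard Creative Commons ones.

```python
# Reading a blank geoprivacy as "open" is a guess, flagged as such in the label.
GEOPRIVACY_LABELS = {
    "": "open (assumed)",
    "open": "open",
    "obscured": "obscured",
    "private": "private",
}

# Standard Creative Commons license names keyed by their short codes.
LICENSE_NAMES = {
    "CC0": "No Rights Reserved (public domain dedication)",
    "CC-BY": "Attribution",
    "CC-BY-SA": "Attribution-ShareAlike",
    "CC-BY-ND": "Attribution-NoDerivs",
    "CC-BY-NC": "Attribution-NonCommercial",
    "CC-BY-NC-SA": "Attribution-NonCommercial-ShareAlike",
    "CC-BY-NC-ND": "Attribution-NonCommercial-NoDerivs",
}

def describe(row):
    """Translate coded values in one exported row; unknown codes pass through unchanged."""
    return {
        "geoprivacy": GEOPRIVACY_LABELS.get(row["geoprivacy"], row["geoprivacy"]),
        "license": LICENSE_NAMES.get(row["license"], row["license"]),
    }

print(describe({"geoprivacy": "", "license": "CC-BY-NC-SA"}))
```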

Other than that, export seems to work well enough.

4 Likes

Probably in the same format as the names already provided, though I probably wouldn’t need much more than the scientific name. Perhaps under the heading “[user name]_id_scientific_name”? Higher level taxa fields like family would be great but can be worked around as long as the scientific name is given. I imagine it could get pretty complicated to add any more than a few fields for the specific user ID, but any of the fields under “taxon extras” would be useful and could ultimately save time.

While on the subject, subgenus and section are taxa I use a lot. If those could be added, it would also be very helpful but is a much lower priority for me than what I describe above, which actually adds functionality.

1 Like

@nathantaylor do you imagine specifying a user (or small number of users) and downloading only observations that they have identified (with additional relevant filters) with their IDs as the scientific name? Or a different approach?

2 Likes

Thanks for these suggestions @tallastro and @alexis18. I think/hope they’re pretty straightforward additions.

2 Likes

A couple more things - it would be great to be able to choose coordinate formats (at least add UTM as an option), and to include an option to add DEM elevation data to records.

@carrieseltzer that would be very useful, and if that were a feature, I probably wouldn’t use it any other way. Also, I doubt anyone who curates the observations in their area of interest would use anything else either.

This is a bit beyond the initial request, but there are probably many circumstances where there are projects with datasets so large that the observations can’t be curated by one or even a few people. Under these circumstances you could use fields showing the community ID, list the users supporting a community ID, the maverick ID (there’s rarely more than one, or maybe the most recent maverick ID if it becomes an issue?), and list the users supporting the maverick ID. From that data, you could use conditional formatting to highlight cells based on fields that contain the usernames of known experts. You could probably even sort with formulas and delete data that wasn’t curated. You could even do some quality control if there are known bad actors in the dataset. Again, a bit beyond the point, but could be a useful way of managing the data.
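To make the idea concrete, here is a rough sketch of the same filtering done in a script instead of a spreadsheet. None of these columns exist in the current export: `community_supporters`, `maverick_taxon`, and `maverick_supporters` are hypothetical names for the fields proposed above, the usernames are invented, and taking the expert-backed maverick ID when no expert supports the community ID is just one possible rule.

```python
import csv
import io

KNOWN_EXPERTS = {"expert_a", "expert_b"}  # hypothetical usernames

# Made-up rows using hypothetical columns; supporters are separated by semicolons.
SAMPLE_CSV = """id,community_taxon,community_supporters,maverick_taxon,maverick_supporters
1,Euphorbia maculata,expert_a;user1,,
2,Euphorbia prostrata,user2;user3,Euphorbia serpens,expert_b
3,Euphorbia nutans,user4,,
"""

def split_users(cell):
    """Turn 'a;b;c' into a set of usernames, ignoring empty entries."""
    return {name for name in cell.split(";") if name}

def curate(rows, experts):
    """Keep only rows an expert has weighed in on, with the taxon that expert backs."""
    curated = []
    for row in rows:
        if split_users(row["community_supporters"]) & experts:
            curated.append((row["id"], row["community_taxon"], "community"))
        elif split_users(row["maverick_supporters"]) & experts:
            curated.append((row["id"], row["maverick_taxon"], "maverick"))
    return curated

curated = curate(csv.DictReader(io.StringIO(SAMPLE_CSV)), KNOWN_EXPERTS)
for record in curated:
    print(record)
```

The same set logic could drop rows supported only by known bad actors by checking against a second set of usernames.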

5 Likes

Thanks, @nathantaylor. This is not at all beyond the initial request and is a helpful suggestion for approaching the request to offer more insight about individual identifiers in the downloads at scale. Definitely interested to hear if others would find this useful or suggest other approaches to this problem.

2 Likes

I sometimes use iNaturalist as a way to collect better data more efficiently for herbarium specimens. When I download records, they almost always are part of a traditional project for which I’ve set up observation fields that map to the typical fields we have in an herbarium database. [Aside: I’ve been sad not to see custom observation fields integrated anywhere in the new projects model, although I understand why doing so wouldn’t work under the current design for collection projects.]

One problem I have encountered (and that I think still exists, although I’m not 100% sure) is that you can only download custom observation fields that you have used. This pops up when I work with botany instructors who want to use the workflow I set up to have their students collect specimens that will later be deposited into their herbarium; the instructors are often familiar with iNaturalist but may not make an observation within our course project and then when they go to download their students’ specimen data they do not have the option to download all of the observation fields. The solution, of course, is to have the instructors actually use the project and custom fields themselves. Still, though, it would be nice if observation fields that are set up to use within a project could always be available to download from that project.

Otherwise data exporting from iNat is great and does what I need! I like most of the other improvements proposed here, but probably wouldn’t take advantage of them myself.

2 Likes

@nathantaylor and @carrieseltzer This was my thought for a request as well!

Some of my uses line up fairly well with Nathan’s. For instance, in taxa that I work with a lot (like anoles), it would be really useful to have a field that has my ID for the observation. When I get undergrads to work with anole data from iNat, I just have them add a field for my ID to their CSV from iNat and then they add that info manually, but this does take some time. This is nice to have for projects because we can say that all IDs were verified by one person.

I also agree that it would be really interesting from a more sociological perspective to see the IDs that individual users add. You can get a little of this now by seeing the total agreeing and disagreeing IDs on an observation, but a finer grain of detail would be nice. For instance, we’ve often thought about trying to quantify which anoles are harder to ID if we had ID history for observations, but we can’t do that with the data we can currently access (other than to see average correct IDs using the num_identification_agreements and num_identification_disagreements fields). As a specific example, we thought it might be cool to try to quantify whether participants in one of our projects (Lizards on the Loose: https://www.inaturalist.org/projects/lizards-on-the-loose-2018) improve their IDing accuracy over time, but we can’t figure out an easy way to do this with the current data export. I did do this manually for a subset of observations, but coding it by looking at each observation individually was a lot of effort!

One way I could think to do this is to just give fields with each identifier’s user_id (already included for uploader) and their taxon ID (like 116461 for Anolis sagrei). This would make it easy to incorporate user IDs as effects in statistical models and the taxon ID would be interpretable as well. Anyways, this info would be cool to have!
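As a rough illustration of what that would enable, here is a sketch that computes how often each identifier's ID matches the community taxon. The long-format layout (one row per identification) is hypothetical, the user IDs are invented, and 999999 is a placeholder for a second taxon; only 116461 comes from the example above.

```python
from collections import defaultdict

# Hypothetical long-format rows:
# (observation_id, identifier_user_id, identified_taxon_id, community_taxon_id)
identifications = [
    (101, 7, 116461, 116461),
    (101, 8, 999999, 116461),
    (102, 8, 116461, 116461),
    (103, 7, 116461, 999999),
    (103, 8, 999999, 999999),
]

def agreement_rates(rows):
    """Return, per identifier, the share of their IDs that match the community taxon."""
    agreed = defaultdict(int)
    total = defaultdict(int)
    for _obs_id, user_id, identified, community in rows:
        total[user_id] += 1
        if identified == community:
            agreed[user_id] += 1
    return {user_id: agreed[user_id] / total[user_id] for user_id in total}

rates = agreement_rates(identifications)
for user_id, rate in sorted(rates.items()):
    print(f"user {user_id}: {rate:.2f}")
```

Adding a date column to each row would let the same calculation be split by time period to look at whether accuracy improves.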

Are there any plans to make the observation fields easier to select prior to downloading? Perhaps searchable fields? Maybe project downloads listing all project observation fields as talked about here: https://groups.google.com/forum/#!msg/inaturalist/wvhhKG1vTZ8/NchsXLWw73MJ

They really are difficult to deal with as they are now.

This is really minor, but does the observations export CSV have to be zipped?

I would like to be able to easily export the taxonomy, i.e. all taxa in iNat, or in a certain clade, whether or not the taxa have observations.

On lists, the CSV and Taxonomic CSV are still missing taxon_id. The taxonomic CSV and the observations export CSV are still missing several ranks.

3 Likes

Thank you for asking! So many potential uses, but currently three active or anticipated for me:

  1. (personal) Exporting my own observation data, so I can join selected information to a master photo data table. In particular, I capture the observation number, so it is easy to generate an observation URL for any photo I upload, and see which ones have not been uploaded yet. (I include unique photo IDs in the tags of each observation at upload.)

For this the current CSV export works fine, I just have to do a little manipulation with the exported data.

  2. (work related) Exporting documentation of rare species observations in my jurisdiction (Nevada, USA). For this, in addition to the existing CSV functionality, it would be really nice to be able to “export as PDF” a single observation record (or small set), and have all of the related observation data and metadata presented in a compact but readable format. Option to include only the first photo, or append any additional photos at the end of the PDF. Existing screenshot or browser printing options just aren’t cutting it for our needs.

As a work-around we could build a “mail merge” from exported CSV data, but this wouldn’t capture the evidence (photos).

  3. (near future) As part of a long-term project to revise the vascular flora of the White Mountains of California and Nevada (home of 5000+ year old bristlecone pine trees, and about 1100 other vascular plant species), I have been monitoring and curating the plant observations from that area. Eventually I’ll want to export all the relevant observations for further data analysis, and for this I will echo the previous suggestions in this thread that the ability to include a specific user’s identification taxon for each observation (my own IDs, in this case) would be extremely important, and is likely needed for many other use cases.

Since I am also querying specimen databases, one could argue that GBIF might be a better data source for this use case. But I will also want to capture observations that might not have made it into GBIF for whatever reason.
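For anyone curious about the "little manipulation" in the first (personal) use case, it looks roughly like the sketch below. It assumes the export has `id` and `tag_list` columns with comma-separated tags; the photo IDs and observation numbers are made up.

```python
import csv
import io

OBSERVATION_URL = "https://www.inaturalist.org/observations/{}"

# Made-up export rows; the id and tag_list column names are assumptions.
EXPORT_CSV = """id,tag_list
1001,"IMG_0001, desert"
1002,IMG_0003
"""

# Photo IDs from a hypothetical master photo table.
MASTER_PHOTOS = ["IMG_0001", "IMG_0002", "IMG_0003"]

def map_photos_to_urls(export_file, photo_ids):
    """Match photo IDs stored in observation tags to observation URLs."""
    wanted = set(photo_ids)
    urls = {}
    for row in csv.DictReader(export_file):
        for tag in (t.strip() for t in row["tag_list"].split(",")):
            if tag in wanted:
                urls[tag] = OBSERVATION_URL.format(row["id"])
    return urls

urls = map_photos_to_urls(io.StringIO(EXPORT_CSV), MASTER_PHOTOS)
not_uploaded = [photo for photo in MASTER_PHOTOS if photo not in urls]
print(urls)
print("not uploaded yet:", not_uploaded)
```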

1 Like