Data users: what are your use cases and requests for exporting data?

#1

For those of you who use data from iNaturalist, what shortcomings do you find in the current csv (spreadsheet) export functionality and what is it preventing you from doing? Or phrased optimistically, what would you like to be able to do with exports from iNaturalist that you can’t currently?

If you are requesting additional data in the csv export that isn’t currently included (e.g. annotations), it will be most helpful if you can elaborate on the specific format in which you’d like to see it in a spreadsheet. It’s not clear-cut how annotations should be represented (particularly those that can have more than one response, such as plant phenology), so we’d like to hear how you’d like to use the data and what format is most useful.

Not for this thread:

  • Although we know DOIs for iNat downloads are desirable, this is unlikely to be a feature we can implement soon (mentioned here), and we recommend that people download from GBIF to get a DOI.
  • Changes to annotations themselves. There’s a separate thread about annotations.
4 Likes
#2

So I’m not a regular data downloader, but when I do download it’s often for communication and visualization needs.

I’ll often download the data collected during an event like Snapshot Cal Coast so I can do a timelapse in Corto. So I’m usually downloading id, time_observed_at, latitude, longitude, and sometimes scientific_name and some subset of the taxon extras if I want to visualize by groups of taxa. In this case I do need all the observations as I want to visualize them each as points.

The other main reason I download data is because I’m trying to compare species lists - for example, after this year’s City Nature Challenge, I’m interested to know what new species we recorded here in the SF Bay Area. So usually I go about this by downloading all the observations made before the CNC (usually bare minimum of info: id, quality_grade, scientific_name, and taxon_id), and then all the observations made during the CNC, remove species duplicates from each list, and then compare the two lists.

It’s pretty simple, but it’s a lot of data, whereas if I could just download a species list, it would be far less. There are 1,000,000+ observations made in the Bay Area before this year’s CNC, but those observations represent only 11k-ish species. I currently have to download all 1,000,000 observations to figure out what those 11k species are. So it would be useful in this case to be able to download just a species list for a given place and timeframe.
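The comparison described above can be sketched in a few lines of Python. This is only an illustration: the two CSV strings are made-up stand-ins for the "before" and "during" exports, the taxon_id values are placeholders, and only the column names (id, quality_grade, scientific_name, taxon_id) come from the export fields mentioned in this post.

```python
import csv
import io

# Made-up sample rows standing in for two exports; the taxon_id values are placeholders.
BEFORE_CSV = """id,quality_grade,scientific_name,taxon_id
1,research,Calypte anna,101
2,research,Calypte anna,101
3,needs_id,Sceloporus occidentalis,102
"""

DURING_CSV = """id,quality_grade,scientific_name,taxon_id
10,research,Calypte anna,101
11,research,Pisaster ochraceus,103
12,needs_id,Pisaster ochraceus,103
"""


def species_in(csv_text):
    """Map taxon_id -> scientific_name, collapsing duplicate observations."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["taxon_id"]: row["scientific_name"] for row in reader}


before = species_in(BEFORE_CSV)
during = species_in(DURING_CSV)

# Taxa seen during the event that never appear in the earlier export.
new_species = sorted(name for tid, name in during.items() if tid not in before)
print(new_species)
```

Note that this treats every distinct taxon_id as a "species"; a real export may also contain observations identified only to genus or higher, which would need to be filtered out first.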

Hope that’s helpful?

6 Likes
#3

Thanks, Alison. That is helpful. I hadn’t considered that particular use case. It is inefficient to download all that data just to derive a species list.

3 Likes
#4

The main reason I have downloaded data was to get coordinates to add to an ArcGIS project. So if there were a way to generate an ArcGIS project, that would save a few steps, but it’s not a big deal since I only do it about once a year. I’ve also downloaded bioblitz data before.

I feel like the download process is a little buggy: the email doesn’t always generate, or it isn’t clear if it went through. Other than that, I can’t think of anything else.

1 Like
#5

Adding an option to download a species list would definitely be a big time saver for communication/visualization, as stated above. Other summary data would also be useful (number of observers? Breakdown of data quality?).

For geo-referenced data, I like the current csv format - it makes for an easy import to ArcGIS online. It would be great to download annotations (and make it more worthwhile to utilize those fields). I can see where the plant phenology will be tricky. Perhaps each option (flowering, budding, etc) can be included as a separate column. As a side note, it would be good to include vegetative, dormant, and dead options for phenology, or have some way to indicate if the observation was reviewed but did not exhibit any of the current options.
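As a rough sketch of the "one column per option" idea, the snippet below writes a true/false column for each phenology value. The option list, the `phenology_` column prefix, and the sample observations are all assumptions made for illustration, not the actual export format.

```python
import csv
import io

# Hypothetical phenology options; the real annotation values may differ.
PHENOLOGY_OPTIONS = ["flowering", "budding", "fruiting"]

# Made-up observations: id -> set of phenology values annotated on it.
observations = {
    1: {"flowering", "budding"},
    2: {"fruiting"},
    3: set(),  # no phenology annotation at all
}


def to_rows(obs):
    """Yield one dict per observation with a true/false column per option."""
    for obs_id, values in sorted(obs.items()):
        row = {"id": obs_id}
        for option in PHENOLOGY_OPTIONS:
            row[f"phenology_{option}"] = "true" if option in values else "false"
        yield row


rows = list(to_rows(observations))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

One limitation of this layout: "false" cannot distinguish "reviewed and not flowering" from "never annotated", which is the gap raised in the side note above.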

One more thing I sometimes struggle with, where accuracy is important, is “cleaning” the data after a download. Some more intuitive filters to only export data with a certain accuracy, private/obscured status, etc. would be great!
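Until such filters exist, the cleaning step can be done after download. The sketch below is one possible approach; the column names `positional_accuracy` and `geoprivacy`, the assumption that a blank geoprivacy value means open, and the 100 m threshold are all assumptions to check against a real export.

```python
import csv
import io

# Made-up sample export; column names are assumed, not confirmed.
SAMPLE = """id,latitude,longitude,positional_accuracy,geoprivacy
1,37.80,-122.40,5,
2,37.70,-122.50,2500,
3,37.90,-122.30,,obscured
4,37.60,-122.10,30,
"""


def keep(row, max_accuracy_m=100):
    """Keep rows with open geoprivacy and a known accuracy within the limit."""
    if row["geoprivacy"]:        # any non-blank value is treated as not open
        return False
    accuracy = row["positional_accuracy"]
    if not accuracy:             # unknown accuracy is dropped, a conservative choice
        return False
    return float(accuracy) <= max_accuracy_m


rows = [r for r in csv.DictReader(io.StringIO(SAMPLE)) if keep(r)]
print([r["id"] for r in rows])
```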

3 Likes
#6

My needs are simple, I use the data to track my personal observations - Birds/Odonata by year, month, tag, etc… or specific taxon within a state. It meets my needs.

1 Like
#7

i think it would be nice to be able to see multiple file names if there are multiple photos or sounds associated with the observation. i’m not a scientist. so this would just be mostly for personal record-keeping purposes.

it might be nice to be able to download identifiers (also showing multiple, if there are multiple).

i’d also like to be able to group / aggregate and just get summaries (ex. counts by species / month, max / min observation date by species, etc.). i’d probably use something like that if, say, i wanted to travel somewhere and wanted to get a sense of what i might be able to see at different times of the year. (off topic, but you guys should partner with a travel site to show what kind of interesting things have been observed at various destinations.)
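for what it's worth, a summary like that can be approximated from the current export. this is only a sketch: the sample rows are made up, and the `observed_on` column name and its YYYY-MM-DD format are assumptions.

```python
import csv
import io
from collections import Counter

# Made-up sample export; observed_on is assumed to be an ISO date (YYYY-MM-DD).
SAMPLE = """id,observed_on,scientific_name
1,2018-04-27,Calypte anna
2,2018-04-28,Calypte anna
3,2018-06-02,Calypte anna
4,2018-06-15,Pisaster ochraceus
"""

counts = Counter()   # (species, month) -> number of observations
date_range = {}      # species -> (earliest, latest) date string

for row in csv.DictReader(io.StringIO(SAMPLE)):
    name, day = row["scientific_name"], row["observed_on"]
    counts[(name, day[5:7])] += 1
    # ISO date strings sort the same way as the dates they represent.
    earliest, latest = date_range.get(name, (day, day))
    date_range[name] = (min(earliest, day), max(latest, day))

for (name, month), n in sorted(counts.items()):
    print(name, month, n)
print(date_range)
```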

6 Likes
#8

I have used iNaturalist for an ArcGIS project for school.

2 Likes
#9

really, i’d love to be able to run SQL queries against the tables (or a relatively recent archive). i saw that you guys tried to put a dataset out to data.world, but it looks like there may be technical limitations to that platform.
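in the meantime, one workaround is to load a csv export into a local SQLite database and query that. a minimal sketch, assuming a made-up export with only three columns; the table name `observations` is just something chosen for the example, not an iNaturalist schema.

```python
import csv
import io
import sqlite3

# Made-up sample export with a subset of columns.
SAMPLE = """id,quality_grade,scientific_name
1,research,Calypte anna
2,research,Calypte anna
3,needs_id,Calypte anna
4,research,Pisaster ochraceus
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (id INTEGER, quality_grade TEXT, scientific_name TEXT)"
)
# Each csv row is a dict, so named placeholders pick out the matching columns.
conn.executemany(
    "INSERT INTO observations VALUES (:id, :quality_grade, :scientific_name)",
    list(csv.DictReader(io.StringIO(SAMPLE))),
)

result = conn.execute(
    """
    SELECT scientific_name, quality_grade, COUNT(*) AS n
    FROM observations
    GROUP BY scientific_name, quality_grade
    ORDER BY scientific_name, quality_grade
    """
).fetchall()
print(result)
```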

1 Like
#10

On the site, you’re able to see the number of observations that are classified as ‘introduced’ but the export table of observations does not have that column (or not that I’ve found).

I’ve been looking at more meta questions (what makes a good observer, how much the diversity of observations increases with the addition of collectors, days, area, and so on).

6 Likes
#11

I don’t want to suggest that GBIF has any interest in poaching iNat users away from the iNat site to access iNat data, but GBIF just implemented this feature as a third download option. See this discussion item: https://discourse.gbif.org/t/new-feature-download-lists-of-distinct-species-contained-in-occurrence-searches/687

One could easily define search parameters for geography—as well as dataset and dates—and just download the species list from something like this (quick-and-dirty, mind you): https://www.gbif.org/occurrence/search?dataset_key=50c9509d-22c7-4a22-a47d-8c48425ef4a7&has_geospatial_issue=false&geometry=POLYGON((-122.64313%2037.9182,-122.51129%2037.40507,-122.25861%2037.37889,-121.97296%2037.26531,-121.90704%2037.44434,-122.22015%2038.06539,-122.47833%2038.19502,-122.64313%2037.9182)).

You can also easily go back to the download DOI page and update the query results as needed.

3 Likes
#12

One thing I really, really would like is the ability to download the identifications of particular users (i.e., if there is an expert in a certain group that I know is credible, I would probably take their ID over the community ID in any project that I wanted to use the data for).

In my mind, this would essentially solve the ID quality problem for me or at least give an important workaround. For example, anytime I needed Euphorbia data, I could use my ID or the ID of one of the other Euphorbia experts on iNaturalist and be confident that the data I was using was expert reviewed and actually research grade. This doesn’t solve the ID quality problem for GBIF, but would be a major step towards making the data useful.

5 Likes
#13

Can you describe the ideal format in which you want to download that data?

1 Like
#14

Thanks for sharing GBIF’s species list download functionality, @kcopas. That’s great to know and very useful as an example for us and a possible option for some iNat data users. Please don’t be concerned about referring iNat users to GBIF for data. I’ve been thinking we should explicitly encourage it for the benefit of the DOIs and subsequent citation tracking, which are superior to our haphazard list of data uses that aren’t captured by GBIF. I should add it to the FAQs.

1 Like
#15

Metadata would be nice. The mouse-over explanations on the export page are nice, but they don’t give full definitions of the fields or the values in the fields. I think I can figure out most, but what if I’m wrong? Geoprivacy="" means “Open”, I guess. License=“CC-BY-NC-SA” means “Attribution-NonCommercial-ShareAlike”; I had to dig into my Account Settings to find that one. I guessed wrong on Position_Accuracy until I checked the website.

Other than that, export seems to work well enough.

3 Likes
#16

Probably in the same format as the names already provided, though I probably wouldn’t need much more than the scientific name. Perhaps under the heading “[user name]_id_scientific_name”? Higher level taxa fields like family would be great but can be worked around as long as the scientific name is given. I imagine it could get pretty complicated to add any more than a few fields for the specific user ID, but any of the fields under “taxon extras” would be useful and could ultimately save time.

While on the subject, subgenus and section are taxa I use a lot. If those could be added, it would also be very helpful, but it is a much lower priority for me than what I describe above, which actually adds functionality.

1 Like
#17

@nathantaylor do you imagine specifying a user (or small number of users) and downloading only observations that they have identified (with additional relevant filters) with their IDs as the scientific name? Or a different approach?

2 Likes
#18

Thanks for these suggestions @tallastro and @alexis18. I think/hope they’re pretty straightforward additions.

2 Likes
#19

A couple more things - it would be great to be able to choose coordinate formats (at least add UTM as an option), and to include an option to add DEM elevation data to records.

#20

@carrieseltzer that would be very useful, and if that were a feature, I probably wouldn’t use it any other way. Also, I doubt anyone who curates the observations in their area of interest would use anything else either.

This is a bit beyond the initial request, but there are probably many circumstances where there are projects with datasets so large that the observations can’t be curated by one or even a few people. Under these circumstances you could use fields showing the community ID, list the users supporting a community ID, the maverick ID (there’s rarely more than one, or maybe the most recent maverick ID if it becomes an issue?), and list the users supporting the maverick ID. From that data, you could use conditional formatting to highlight cells based on fields that contain the usernames of known experts. You could probably even sort with formulas and delete data that wasn’t curated. You could even do some quality control if there are known bad actors in the dataset. Again, a bit beyond the point, but could be a useful way of managing the data.
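To make the idea concrete, here is a rough sketch of how such fields could be used outside a spreadsheet. Everything in it is hypothetical: the four ID columns do not exist in the current export, the pipe-separated username format is just one possible layout, and the usernames and rows are invented.

```python
import csv
import io

# Hypothetical usernames of experts whose IDs are trusted.
KNOWN_EXPERTS = {"expert_a", "expert_b"}

# Invented export using the proposed (not currently available) columns.
SAMPLE = """id,community_id,community_id_users,maverick_id,maverick_id_users
1,Euphorbia maculata,expert_a|user_x,,
2,Euphorbia prostrata,user_x|user_y,Euphorbia serpens,expert_b
3,Euphorbia serpens,user_y,,
"""

ID_COLUMNS = (
    ("community_id", "community_id_users"),
    ("maverick_id", "maverick_id_users"),
)


def expert_backed_name(row):
    """Return the name a known expert supports (community ID first), else None."""
    for name_col, users_col in ID_COLUMNS:
        users = set(filter(None, row[users_col].split("|")))
        if users & KNOWN_EXPERTS:
            return row[name_col]
    return None


curated = {
    row["id"]: expert_backed_name(row)
    for row in csv.DictReader(io.StringIO(SAMPLE))
}
print(curated)
```

Rows that map to None are the ones no listed expert has reviewed, so they could be dropped or set aside for curation.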

3 Likes