Provide a list of things researchers should think about when downloading iNaturalist data

Platform(s), such as mobile, website, API, other: all platforms from which data can be downloaded

URLs (aka web addresses) of any pages, if relevant: [not sure I understand; any pages could be relevant]

Description of need:
Some people seem to be anxiously clutching their pearls as they try to protect researchers from data the researchers may want to see by shifting that data to Casual. I think that’s bad. Shouldn’t researchers be able to make their own choices? Also, people argue on the Forum about what to protect the researchers from. I’m tired of seeing that. Finally, iNaturalist data does actually have some problems that researchers should consider, and there’s evidence that researchers sometimes don’t deal with these problems, at least the first time they use the data.

Feature request details:
Need: iNaturalist data are great and useful for many kinds of research. They’re not perfect. I think that warning researchers about the problems would be useful for researchers. It would also be useful as an alternative to the actions of some people who shift observations to the cesspit that is Casual just because the data don’t meet their own standards, even though they would meet some researchers’ needs. Additionally, presenting such a list would allow us to short-circuit some endless, rancorous discussions on the Forum. We could just point people to this and ask them to improve it.

Proposed solution: Have a pop-up that shows up when someone downloads data saying something like, “Data from iNaturalist are great but not always perfect. Would you like to learn about problems you should watch for when using this data?” Include on the pop-up a place to click on “Don’t show me this again.” (But maybe make it show up once a year anyway.)

What should such a list include? I recommend the following, but no doubt you could improve this list. (Maybe the items could include links to iNaturalist descriptions or explanations of the problems.)

“Even RG observations may not be correctly identified; we recommend that you sample the observations to check the error rate, and especially check geographic and temporal outliers. The percentage of correct identifications is often high (over 90%, and in many cases over 95%), but it is dismal for some species.

Captive/cultivated organisms may not be marked as such. (Please mark those you notice as “captive/cultivated” or “not wild.”)

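Observations with geoprivacy set to “obscured” are assigned by iNaturalist a random location within a larger grid cell (0.2° by 0.2° of latitude and longitude, roughly 22 km or 14 miles from north to south); their locations as presented are not accurate and may be several miles from the true location.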

Some observations have huge error circles around the reported location. The largest circles usually result from errors in data entry; in these cases, the true location is often reasonably close to the reported point. Other large circles result from recording a location at the center of a lake when the observation was made on the shore, or from using park headquarters as the location for anything in a park. The post office location may be used for anything in a town. A large circle may also mean “seen somewhere along this trail.”

Fraud is rare and we try to keep it out, but we can’t entirely. It seems most common in observations posted by students using iNaturalist for a graded assignment, or in a few problem projects where the CNC and GSB are treated as contests to be “won.” Fraud using AI-generated pictures also exists. (If you find fraudulent observations, please flag them.)

iNaturalist data are inevitably biased by the interests, skills, and distribution of the volunteer citizen scientists who post these data and who identify them.”
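Several items in the proposed list translate into checks a researcher can run on an exported CSV. What follows is a minimal sketch in Python, not an official iNaturalist tool. The column names (`captive_cultivated`, `coordinates_obscured`, `positional_accuracy`) are assumptions based on a typical export and should be checked against the metadata definitions bundled with the export; the helper names and the 100 m threshold are hypothetical. It drops records marked captive/cultivated, records with obscured coordinates, and records with a missing or large error circle, then draws a reproducible random sample whose identifications can be re-checked by hand.

```python
import csv
import io
import random

# Assumed column names, modelled on a typical iNaturalist CSV export.
# Check them against the metadata definitions bundled with your export.
TRUE_VALUES = {"true", "t", "1", "yes"}


def is_true(value):
    """Interpret the boolean-like strings found in CSV exports."""
    return (value or "").strip().lower() in TRUE_VALUES


def usable_rows(rows, max_accuracy_m):
    """Keep records not marked captive, not obscured, with a known, small error circle."""
    kept = []
    for row in rows:
        if is_true(row.get("captive_cultivated")):
            continue  # marked captive/cultivated; unmarked ones still slip through
        if is_true(row.get("coordinates_obscured")):
            continue  # public location is randomized within a larger cell
        accuracy = (row.get("positional_accuracy") or "").strip()
        if not accuracy or float(accuracy) > max_accuracy_m:
            continue  # error circle missing or larger than the analysis allows
        kept.append(row)
    return kept


def review_sample(rows, k, seed=0):
    """Reproducible random sample of records whose IDs will be re-checked by hand."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


# Tiny made-up export for illustration only.
EXPORT = """id,quality_grade,captive_cultivated,coordinates_obscured,positional_accuracy
1,research,false,false,15
2,research,true,false,10
3,research,false,true,30
4,needs_id,false,false,
5,research,false,false,5000
6,research,false,false,40
"""

rows = list(csv.DictReader(io.StringIO(EXPORT)))
clean = usable_rows(rows, max_accuracy_m=100)
print([r["id"] for r in clean])  # ['1', '6']
sample = review_sample(clean, k=5)
print(len(sample))  # 2
```

Filtering can only remove what has been marked: unmarked captive organisms and misidentified RG observations pass straight through, which is why the hand-checked sample matters. The error rate is then the number found to be wrong divided by the number checked. Whether to drop obscured records or keep them at a coarser resolution depends on the analysis.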

So . . . Do you think presenting such a list on download would be useful? What do you think such a list should include?

Rather than a pop-up, another possibility would be to include it as some sort of Read Me file bundled with an export. We already include a list of metadata definitions with each export.

There are pages on the site specifically for (code) developer guidance. It seems like there should be similar guidance specifically for researchers. Then, whenever and wherever you want to provide guidance to researchers, you can point to such a page.

Goodness, thank you for writing this up much better than I could have.

Perhaps a useful way to get the ball rolling w/r/t the feature request, and a lower bar than a pop-up on download, would be an easily findable page (maybe in the Help or FAQ section) where researchers who have used iNat data can share a brief summary of what limitations they faced and how they addressed them. After some people have shared, one could look for truly common issues that (1) could be summarized on data download, and (2), perhaps more importantly, could be pointed to when people get anxious about what a hypothetical researcher may hypothetically need.

If there were an important “read me” file, I think it would be important to tell people what it contains when they download it. I almost never look at “read me” files, as they rarely have anything important to me in them, at least the ones I’ve looked at.

I think a researcher guidance page could be very useful. The biggest issue I’ve seen is that some people don’t understand that many locations are false due to geoprivacy and treat them as true. So, details on what location fields are in a download and how to deal with them would be good.

The second big issue is researchers may not understand the reality of the data for “research grade” vs. “needs ID” vs. casual. Many “research grade” observations are misidentified. Many “needs ID” observations are solidly IDed. Many observations should be marked casual and aren’t. Some very important observations are marked casual for a variety of reasons and shouldn’t be, often automarked by iNat due to an initial misID. Generally, a researcher should work their way through IDs on iNat and improve them before downloading. The same applies to annotations and observation fields. Researchers should be made aware that it might be good to put a lot of work in before downloading the data and a list of some things to possibly QC/annotate on the site itself would be useful for some. It just depends on what they are trying to do and what limitations in the data they are willing to accept.

Don’t forget to vote for your own feature request!

This feature request as originally posed is excellent. For what it’s worth, I added my vote. I don’t believe relegating this to documentation without a pop-up (as described) will have the desired effect. For the most part, people don’t read documentation.

I think the combination of a page (for those actively looking for that information) and a pop-up (for those who don’t think to look) would be ideal. If the only place the information lives is the download pop-up, a lot of folks won’t know where to find it. Plus, ideally researchers would help verify the observations (ID correctness, wild vs. cultivated, etc.) before they even try to download the data. I also agree that having a pop-up is critical, though; plenty of researchers may not think to learn about pitfalls and best practices ahead of time. It can’t hurt to have the information in multiple places, as long as it is consistent.

And the immortal “iNaturalist is designed to record individual people’s observations of individual organisms, which makes it very poorly suited to analyzing the abundance of organisms. If you as a researcher want to do something related to abundance or presence/absence, the onus is on you to filter your data to account for multiple observations of the same individual by different people, etc.”

I realize this is (hopefully) obvious to most researchers using iNaturalist these days. But it is one of the things people most frequently clutch their pearls about (including me when I first joined). And I once ran into another researcher running a state-level collection project who was trying to tell an observer that they should only upload conspecific individuals in a way that was consistent with the researcher’s particular vision of what he wanted his collection project data to look like. (I think it was 1 individual per species per county per day. At any rate, definitely not what the actual iNaturalist guidance is.) That instance makes me think it might be good to have a researcher page or wiki that researchers would hopefully look at when starting to work on iNaturalist and not just when they get around to downloading data.
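One crude way to act on the abundance point is to collapse records that might be the same individual seen by different people. The sketch below is a hypothetical heuristic, not iNaturalist guidance: it treats records of the same taxon on the same date within the same grid cell (0.01°, about 1.1 km of latitude, an arbitrary choice) as possible repeats, and it assumes export columns named `id`, `taxon_id`, `observed_on`, `latitude`, and `longitude`. Records with obscured coordinates should be excluded before doing this, since their public locations are randomized.

```python
import math
from collections import defaultdict

# Hypothetical heuristic, not iNaturalist guidance. Assumed export columns:
# id, taxon_id, observed_on, latitude, longitude.
CELL_DEG = 0.01  # about 1.1 km of latitude; an arbitrary illustrative size


def group_key(row, cell_deg=CELL_DEG):
    """Taxon, date, and the grid cell that contains the record's coordinates."""
    lat_cell = math.floor(float(row["latitude"]) / cell_deg)
    lon_cell = math.floor(float(row["longitude"]) / cell_deg)
    return (row["taxon_id"], row["observed_on"], lat_cell, lon_cell)


def collapse_possible_repeats(rows, cell_deg=CELL_DEG):
    """Return one representative per group and the number of records in each group."""
    groups = defaultdict(list)  # keeps first-seen order (Python 3.7+)
    for row in rows:
        groups[group_key(row, cell_deg)].append(row)
    representatives = [members[0] for members in groups.values()]
    sizes = {members[0]["id"]: len(members) for members in groups.values()}
    return representatives, sizes


def obs(id_, taxon, date, lat, lon):
    return {"id": id_, "taxon_id": taxon, "observed_on": date,
            "latitude": lat, "longitude": lon}


# Made-up records: 1 and 2 share taxon, date and cell; 3, 4 and 5 each differ in one respect.
rows = [
    obs("1", "100", "2023-05-01", "45.1234", "-122.6781"),
    obs("2", "100", "2023-05-01", "45.1251", "-122.6755"),
    obs("3", "100", "2023-05-02", "45.1234", "-122.6781"),
    obs("4", "200", "2023-05-01", "45.1234", "-122.6781"),
    obs("5", "100", "2023-05-01", "45.2345", "-122.6781"),
]
reps, sizes = collapse_possible_repeats(rows)
print([r["id"] for r in reps])  # ['1', '3', '4', '5']
print(sizes)  # {'1': 2, '3': 1, '4': 1, '5': 1}
```

A fixed grid both over-merges (two genuinely different individuals in one cell) and under-merges (one individual photographed on either side of a cell boundary), so the cell size has to be matched to the organism and the question. Keeping the group sizes makes it possible to check how much the collapsing changed the counts.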

Is there a methods paper, in the peer-reviewed literature, on using iNat data? There should be. And in my experience, researchers would be much more likely to take seriously something like a paper in Methods in Ecology and Evolution than a pop-up on the website.

You’ll be glad to know that exactly such a paper is about to be submitted for review (I’m a co-author).

People tend to forget that iNat data often show more about how people interact with the landscape, and which organisms catch their eye, than about what is actually in an area and how it is distributed.

I’ve downloaded and used iNat data in quantity for a couple of publications now (with another one in press), so I can speak from personal experience that all of the concerns you list for a pop-up menu are already self-evident to this researcher. They needed no special emphasis or a reminder to me. Perhaps there will be newer/younger/less experienced researchers (read: I’m old and haggard…) who might need such reminders, but the limitations of iNat data seem pretty evident to me for anyone who has been on the platform for a time.

Despite the recited disadvantages of a “Read Me” file, I nonetheless think that @tiwane’s suggestion might be the way to go, providing it is put up front at the time data is downloaded for use.

Well, not all people wishing to use iNat data are necessarily going to be experienced iNat users.

I suppose people downloading data from iNat directly are more likely to be familiar with iNat than people downloading iNat data from other sources (e.g. GBIF), but there is no guarantee that they will have used iNat enough to understand its idiosyncrasies and limitations.

I would advocate putting this information in a variety of places – not just displaying it at the time of downloading data, but also somewhere on the website, maybe as part of the Help pages.

I think a big part of this proposal isn’t so much making sure the researchers know as making sure that the non-researchers know that the researchers know (or should), so that they don’t have to keep worrying about “protecting” us/the data from various known issues.

Actually, I did forget. Just did it now.

Excellent addition!

Good. Most of us who work with iNaturalist have encountered or read about most or all of these problems, but some researchers appear not to know and certainly many people concerned about data quality here don’t know that we know and don’t think other people might know. So I think something is needed.

I agree.

Having a page somewhere with a list of possible problems and explanations, including ways to deal with these problems, is a great idea. But as pointed out above, people don’t read such things. I think that at a minimum we need a pop-up (or something equally conspicuous/annoying) at the time of download. It could just say there are problems and point to the longer page, though I’d prefer a list of common known problems right there.

A pop-up, or some other readme/message, to warn iNat users of the limitations of iNat data, displayed when downloading iNat data on the iNat platform… is a nice-to-have addition.

However, as already mentioned, not every researcher will be a registered iNat user downloading iNat data directly on the iNat platform.

Some will access, browse and eventually download iNaturalist-sourced data on GBIF, sometimes mixed with data points from other sources good and bad.

GBIF does a decent job of warning about obscured coordinates (e.g. “Coordinate uncertainty increased to 26935m at the request of the observer”).

Still, it does not provide a definition for “iNaturalist quality grades: Research”!

If they are not iNat users or not familiar with iNat jargon, meticulous researchers would probably want to visit the iNaturalistOrg website to try to understand what “Research grade” means. If I remember correctly, at least one scientific paper managed to get it very wrong… so it may not be as straightforward to them as it is to us. 😄

The way I see things (but I’m neither staff nor a UI designer): the front page of the iNatOrg website should try to accommodate this (tiny) fraction of legitimate visitors directly. Believe it or not, they may not come here to ‘Donate’, ‘Learn About Nature’, ‘Record Observations’ or ‘Sign Up’ as new ‘citizen scientists’… but rather to understand how the platform works when generating data for researchers.

Why not a clear, explicit link – I don’t know, something like “iNat For Researchers” – featured prominently on the front page? (…and not, or not only, buried under ‘Help’ menus)

It would redirect to a concise page intended deliberately and only for data consumers and researchers. It would define key principles and terms (“Research grade only means that…”, “Obscured coordinates are…”, “Who are the Identifiers?”, “What is Computer Vision and when is it used?”, “How to give due credit for data and pics”, “How to report, or fix yourself, errors in the dataset”, “The taxonomic frameworks we adhere to are… except…”); showcase a few real-life use cases; list potential pitfalls, past misunderstandings, unsuitable uses, known shortcomings, etc.; and link to previous in-house quality assessments and to a few scientific articles reviewing iNaturalist’s impact and/or data.

While I think providing more information for researchers is good, I did want to address this part of the request:

I’m not sure that providing more information about the limitations and biases of iNat data would necessarily reduce the prevalence of this behavior. People who have strong feelings about what is “good” data and which data points should be shared with GBIF (particularly around questions like records of hitchhikers and escapees, duplicates, etc.) will continue to hold these feelings, regardless of whether iNat has documentation about what its data does or does not represent. They will likely continue to believe that they are doing other potential data users a service because they are keeping the data “clean” (according to their particular standards), because they cannot imagine other situations where people might want to see these data points.

I don’t know if there is a good solution to this. Having documentation to point people to might help somewhat, but I suspect to satisfy these people iNat might need to provide ways to mark and exclude certain types of observations via filtering – i.e., still give them a button they can click, even if that button does not make observations casual.

Unfortunately, you are right. I’m hoping that people will become aware that researchers are being warned and realize we don’t have to do more to limit the observations that researchers see. At least we could point this out when the topic comes up in the Forum and short circuit certain discussions. Sigh.

I’m really not understanding what the problem is here. I can’t say that I’ve spent any time looking at Casual observations to see if I’m missing anything, but considering the significant percentage of RG observations that are (IMO) ‘junk’, I can’t imagine that there is much in the Casual category that I need to be concerned about. I’d like many more observations to end up in the Casual category, but I only send them there in the more egregious cases. Maybe this is happening with taxa that I’m not looking at.

Yes, that might help. As discussed in another thread, something like an “ignore” button that works similar to the “reviewed” button would help my work flow a great deal.
