Snubbing iNat data

Hello iNatters!

I’m curious if other people have experienced colleagues and experts “snubbing” iNaturalist data? In my case what I mean is scientists who automatically think iNaturalist data is worthless, because it can be collected by anyone, not collected and ID-ed in a “traditional” way. Meaning: found in the field by an expert, collected, pressed [in case of a plant], keyed and then stored in a herbarium forever.

I use iNat data quite a lot in researching species distributions. Without going into too much detail, it’s remarkable to me how often colleagues reveal themselves as looking down on iNat, demanding extra verification or just discarding the data altogether. I am fully aware that there are some specific biases with iNat data and there are some really interesting research papers on the topic (for example this one). But researching it and figuring out biases seems to me the way to go, while dismissing the data seems distinctly unscientific.

Curious if others have similar experiences with this!

-Tammo

30 Likes

Yes, this bias is quite widespread. I worked for a community nonprofit where the Research Manager (a surly bureaucrat at heart) was very vocal about the idea that anything involving iNat was fake science. His house backed up onto a nature area I was organizing volunteers to take observations in, and when he saw us he would come out and yell through his fence, trying to invent reasons we were not allowed to observe along this public trail. Without really knowing anything about iNat he rejected it and all other community science as unreliable, unprofessional, and generally suspect.

It is true that for some taxa iNat data are very unreliable. For example there are taxa that the computer vision recommends all the time such that they are in the data in many places where they do not occur.

But there are many species for which iNat is the single best source of fine scale occurrence data. Want to know where the urban ground squirrels are? iNat data make that easy. Want to map the expanding range of Spotted Lanternfly in North America? iNat of course.

The species list for that nature area I mentioned above went from about 200 to 1500 confirmed species due to iNat.

33 Likes

What iNat lacks in perfection vs a traditional survey it makes up for in ridiculously large quantity, to the point where it would be impossible to collect it otherwise.

I was recently curious if the full moon affected how many snakes are active at night (because herpers always claim it does but I’ve never noticed it), so I pulled a bunch of iNat data and turns out, nope, it doesn’t affect it at all. Even with funding, doing that without iNat would take an absolutely enormous amount of time and effort.
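For anyone who wants to try a similar check, here is a minimal sketch of how observation timestamps could be binned by moon phase. It is not necessarily how the analysis above was done. It assumes you already have a list of timezone-aware observation datetimes, approximates the phase from the mean synodic month (so it can be off by several hours), and the function names and bin boundaries are made up for illustration. It also ignores observer effort, which a real analysis would need to account for.

```python
from collections import Counter
from datetime import datetime, timezone

SYNODIC_MONTH = 29.530588853  # mean length of one lunar cycle, in days
# Reference new moon used as the zero point of the cycle (6 Jan 2000, 18:14 UTC).
REF_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def moon_phase_fraction(when):
    """Approximate position in the lunar cycle: 0.0 = new moon, 0.5 = full moon."""
    days = (when - REF_NEW_MOON).total_seconds() / 86400.0
    return (days % SYNODIC_MONTH) / SYNODIC_MONTH


def phase_bin(when):
    """Assign a timestamp to a coarse phase bin (boundaries are an assumption)."""
    frac = moon_phase_fraction(when)
    if 0.375 <= frac < 0.625:
        return "full"
    if frac < 0.125 or frac >= 0.875:
        return "new"
    return "quarter"


def relative_rate_by_phase(observed_at):
    """Observation count per bin, divided by the share of the cycle the bin covers.

    Similar values across bins suggest moon phase makes little difference.
    """
    counts = Counter(phase_bin(t) for t in observed_at)
    widths = {"new": 0.25, "full": 0.25, "quarter": 0.5}
    return {name: counts.get(name, 0) / width for name, width in widths.items()}
```

Feeding it the `observed_on` times from an export of nocturnal snake records would give one number per bin to compare.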

28 Likes

In my opinion, iNat data is not perfect, and two IDs should not make “Research Grade”. The overall idea that the data created by iNat is not fit for studies is foolish, because iNat has many uses; the data should simply be vetted before use in studies. Interestingly enough, the idea that lichen obs can’t be complete without chemical tests of the substrate is something I have thought about several times; in my opinion, that point is valid, but poorly presented. Maybe an annotation for chemicals could be added? That would not assure people have access to the materials needed to perform a test, but it would be a step in the right direction.

Hope this sparks some thought,

Bryce

8 Likes

There was an old ad for Guinness - I’ve never tried it because I don’t like it.

Cannot help feeling that applies to some Ivory Tower scientists defending their Not Guinness / iNat.

https://forum.inaturalist.org/t/published-papers-that-use-inaturalist-data-wiki-4-2024-2025/47837/26

22 Likes

A blanket statement like “iNaturalist data is worthless” is small-minded and simply untrue. That said, there are certain types of questions (such as those involving abundance) for which iNaturalist data is not well-suited when used in the ways some researchers apply it. As with any dataset, it’s important to have a realistic understanding of both its value and its limitations.

30 Likes

Surface geological maps are available (for free in Australia). Mineralogical studies for the strata of any outcrop are also available together with dating. Unless trace elements of the substrate are needed, the chemical test is a giant waste of effort.
I only needed to confirm soil composition twice, and it took me about an hour each time.

A blanket rejection of the data is a problem. That said, I think that folks need to vet the data they do plan to use because of some challenges (like things being marked as wild when they aren’t), or check that things have been ID’d properly. In my opinion, any data set should be vetted before it is used.

9 Likes

There are certain patterns in iNat data that academic biologists can hear about and discount the rest as a result. Lots of cultivated plants, lots of CV misidentifications of their specialty taxon, out of date taxonomy with their specialty taxon, maybe concerns about geoprivacy, etc. If you have tunnel vision on those issues then the platform can look like a mess.

But iNat is extremely powerful, you just kind of get what you put into it. It’s like any piece of complicated technology though; to get the most out of it you need to learn what all the buttons do and how to use them together effectively. If the taxonomy or identifications on your taxon of interest are a mess, you can actually influence and improve that on iNat in a way that wouldn’t be possible in many other settings, and maybe find some notable records in the process. There are a lot of options for filtering observations if you want to be picky about them.

15 Likes

Some colleagues still make a big deal out of how “biased” iNaturalist data are. While it’s taken some time, most now use it for research but spend lots of time talking (and writing) about how biased it is (and still spend little to no time contributing to it). “Biased” strikes me as a strange, and often dismissive, description to apply to a dataset that was never designed to be unbiased. Of course there’s not an equal probability on iNat of observing any individual of any species anywhere on the planet. That’s just not what iNat is.

I much prefer to think of it as choice. Using iNat for ecological science requires acknowledging, and modelling, its observers’ choices. If we looked at all the wild mushroom species that people collect to eat, I’d say that people choose to collect large edible mushrooms, not that they’re biased against collecting poisonous mushrooms. Calling iNat “biased” seems like the wrong word to me when the core of what iNat data are is which species people choose to observe.

19 Likes

I’ve heard that kind of dismissal of iNaturalist data and it seriously annoys me. iNaturalist data is far from perfect, but it is great for many species in many areas, for answering many (but not all) questions. To find out if iNaturalist is good for your study, you must examine it. At least sample the data to see if misidentification or other problems are common, or not.
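As a rough illustration of what "sample the data" could look like in practice, here is a minimal sketch: draw a random subset of records to check by hand, then put a confidence interval on the misidentification rate you find. The function names are hypothetical, and the Wilson score interval is just one reasonable choice of interval, not something prescribed above.

```python
import math
import random


def sample_for_review(record_ids, n, seed=0):
    """Pick up to n record IDs at random (without replacement) for manual checking."""
    ids = list(record_ids)
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    return rng.sample(ids, min(n, len(ids)))


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for the error proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

For example, if you review 100 sampled observations and find 5 misidentified, `wilson_interval(5, 100)` gives a plausible range for the error rate in the full dataset, which is a better basis for keeping or dropping the data than a gut feeling.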

One issue some people have with the data is “it doesn’t have vouchers and we need vouchers.” Actually, photographs are vouchers – you can go back to them and see if they support the identification or not. It can take a while to get that idea into some people’s heads, but I have sometimes been persistent enough.

30 Likes

There are, but they have access to high-protocol data.
If there is nothing, you do not have the privilege to choose.
I still think it is a pity iNaturalist does not stimulate collecting data with a protocol.

https://ndff.nl/

But where abundant data is not present, or in remote areas, iNat makes a good contribution.

Ugh…I have run into that attitude more than a few times, and it’s incredibly frustrating. Almost invariably, it comes from someone who

  • has never used iNat and has only the vaguest understanding of how it works (and generally assumes that the CV provides all the IDs);
  • has tried posting one iNat observation and had it misidentified or got in an argument about it (often because necessary ID features weren’t visible in their photo); or
  • is a casual iNat user but never takes the time to help with identifications.

If the complaint centers around misidentified observations, I always respond with “Well, you’ve got a great knowledge of [taxon]–you could really help with IDs and improve the quality of the iNat records!”

15 Likes

Any data collection will have the possibility of bad data. That’s because people (flawed by nature) are gathering it. People who contribute to iNaturalist provide the opportunity for researchers to have a lot of free help. And, there are a lot of knowledgeable people out there gathering data who may not have a formal education in a certain area, but they know their stuff.

Most important: Even if a person doesn’t have the knowledge of what he/she is taking a photo of, a clear photo shows the presence of an organism and provides good data. There’s a location, date and time attached to that photo.

I think the same thing applies to eBird. There are people who do it for reasons other than gathering data for research. (Don’t get me started on people blindly trusting the Merlin app!) But, I’m sure that researchers appreciate the data even if they know some of it won’t be good. And, the more data that gets gathered, the greater the chance for a good picture of what is going on out there.

10 Likes

Most scientists I talk to are less worried about bias (herbarium records are also very biased). What I hear the most complaints about are records not annotated as cultivated. This is a pretty unique problem to iNaturalist data that can be hard to deal with in some datasets. You also get weird outliers caused by misidentifications, but these are rampant in herbarium records too and not unique to iNaturalist.

18 Likes

Libraries are valueless, there are whole shelves of romance and fantasy books, how could you possibly learn there?

8 Likes

I did - learn - about a gnome with a chip on her shoulder.

1 Like

And they have the advantage of being of a live and fresh organism in most cases, rather than a pressed or otherwise preserved specimen! Color, shape and other characteristics are often lost in preservation, so in such cases photographs can be more powerful, not less.

It all depends on what your data needs are, though: if you need DNA to sequence, the vast majority of iNat observations would be of no help compared to a physical voucher.

The interesting idea is whether we can start recruiting researchers to deposit their photographic vouchers on iNat as well.

I personally have gotten a few academics and former academics to get over their suspicions and they’re now great observers and adherents of the platform.

I think the main barriers are unfamiliarity, (misapplied) scientific skepticism, hearsay on the disadvantages and problems of iNat, and for some, the fear or avoidance of the “unwashed masses/general public” having access to their observations and (horror!) the possibility of being corrected or not having the last say on their observation.

How iNat counters that is another discussion, maybe their “ambassadors” program is aimed at this in part?

5 Likes

To be fair, it is certainly nowhere near the scale that you see on iNat, but you do get the same thing for herbarium voucher datasets too. For those who are unaware, within biodiversity databases there is a Darwin Core field for captive/cultivated records called degreeOfEstablishment. This is where you’re meant to fill in your cultivated value, and then in e.g. the Atlas of Living Australia (ALA), there is a data profile that automatically filters out these records from all default searches and maps, just like iNat does for Casual records. But of course this only works if the herbarium staff/volunteers digitising cultivated records actually fill in the correct Darwin Core field! Unfortunately this is not the case, and there are myriad other fields where the word/value ‘cultivated’ gets added instead, and so all of these records show up on maps exactly how unmarked cultivated iNat records do.

The following map from the ALA shows herbarium vouchers of the plant species Chamelaucium uncinatum, which occurs naturally only in Western Australia.

All of those records (except two naturalised plants) in the eastern half of the country are ‘unmarked’/incorrectly marked cultivated specimens. Some of them do not even explicitly mention the word ‘cultivated’ at all, simply stating the collection was made from a plant in a nursery or botanic gardens. Others do mention the word ‘cultivated’, but in the completely wrong field. For example, one record has filled out the ‘Locality’ field with the comment:

Cultivated by P. & A. Vaughn at Mt Cassel Plant Nursery, Pomonal.

Another one simply states, in the ‘Occurrence remarks’ field:

Large wide spreading shrub. Cultivated.

These are the equivalent of me uploading a cultivated plant to iNat and, instead of ticking not wild, just typing ‘cultivated’ into the location notes or the description.
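If you are cleaning such a dataset yourself, a simple text scan can catch many of these. The sketch below assumes each record is a plain dict keyed by Darwin Core term names (`degreeOfEstablishment`, `locality`, `occurrenceRemarks`). The keyword list and the three status labels are my own assumptions, not part of any ALA or iNat tooling, and anything flagged as suspect still needs a human look.

```python
import re

# Words that hint at cultivation when they turn up in free-text fields (assumed list).
CULTIVATION_HINT = re.compile(
    r"\b(cultivated|nursery|botanic(?:al)? gardens?)\b", re.IGNORECASE
)
FREE_TEXT_FIELDS = ("locality", "occurrenceRemarks")


def cultivation_status(record):
    """Classify one record.

    'marked'    -> degreeOfEstablishment is filled in as cultivated
    'suspect'   -> only a free-text field hints at cultivation
    'unflagged' -> no hint found in the fields checked
    """
    established = (record.get("degreeOfEstablishment") or "").strip().lower()
    if established == "cultivated":
        return "marked"
    for field in FREE_TEXT_FIELDS:
        if CULTIVATION_HINT.search(record.get(field) or ""):
            return "suspect"
    return "unflagged"
```

Both examples quoted above would come back as `suspect`, so they could be reviewed or excluded before mapping.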

Broadly, I am yet to find a single data issue, error, bias, etc that isn’t present in both iNat records and herbarium/museum records (again, the scale/magnitude is often quite different of course, but the point stands that these are all biodiversity data issues broadly, and not unique to a certain platform or data stream). Every data quality issue that iNat sceptics/critics ever mention has existed in collections data for decades or centuries before iNat was even conceived.

15 Likes

iNat data can be quite valuable. It just needs identifiers who taxonomically ‘clean up’ the observations.

I’ve found certain plants thanks to iNaturalist. Finding them would have been very exhausting. (Of course the plants weren’t harmed, it was more like a challenge to see them in real life for the first time).

Especially the distribution part is so useful and the most exciting one for me.

Yes, researchers also go on field trips and find new locations.

But they haven’t researched all corners of the earth and some species might remain unnoticed by them.

That’s when suddenly - in regions where you’re hoping a species to occur - a completely unexpected random observation outside the known range pops up and you feel like you won the biologist jackpot. These are the most pleasant moments here on iNat for me.

I have to say that I’ve learned so much about variability in morphology and have kinda developed a gut feeling about this (the learning process thanks to the observations here on iNat is incredible). And it is so convenient, e.g. identifying Mediterranean or Asian insects from my couch in Central Europe. It’s such a privilege to have this website & community and to live at this moment in time.

I really can’t understand people who criticise it or don’t acknowledge it. It might not replace the traditional specimen collections but hey, what about actually learning something about the morphology of the living organisms? What’s the reason for ignoring all the valuable data?

11 Likes