Taxa Differences in Time to ID?

Just a not-very-important question that popped into my head while doing some IDing this morning:

What are the differences among taxa in the average time it takes for their observations to get IDed? I’m sure these differences exist, just from my own anecdotal experience: plants and insects often take a while, while birds are often IDed almost immediately. I’m sure this is a function of both the number of IDers with expertise in different taxa and the ease with which those taxa can be IDed from photos, among many other things.

But, just out of curiosity, does anyone know of any data out there about how long it takes for observations to be IDed depending on what taxon they are in?

1 Like

I could’ve sworn that one of the blog posts in the last couple of months compared time to ID by taxon. I thought it was the 300,000 species post (https://www.inaturalist.org/blog/42626-we-passed-300-000-species-observed-on-inaturalist#description) or the 50 million observation post (https://www.inaturalist.org/blog/40699-50-million-observations-on-inaturalist) but it doesn’t seem to be on either of those. I’ll post it if I find it or maybe someone else can remember which one it was.

6 Likes

Fungi take even longer IMO

2 Likes

I agree, fungi seem to take the longest and birds are ID’d the fastest.

2 Likes

In lieu of something more exact, a quick and dirty way is just to see what percent of observations submitted in, say, October last year are RG versus Needs ID today (~1 year):

Birds: 151,342 RG of 160,228 Verifiable (94.5%)
Fungi: 33,782 RG of 119,517 Verifiable (28.3%)
Plants: 256,505 RG of 462,850 Verifiable (55.4%)
Insects: 176,300 RG of 316,962 Verifiable (55.6%)
Mammals: 26,372 RG of 30,086 Verifiable (87.7%)
Annelid worms: 286 RG of 1,750 Verifiable (16.3%)
Diatoms: 15 RG of 326 Verifiable (4.6%)
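
For anyone who wants to reproduce percentages like these, here is a minimal Python sketch. It assumes the public iNaturalist API endpoint `https://api.inaturalist.org/v1/observations` accepts `iconic_taxa`, `created_d1`, `created_d2`, `verifiable`, `quality_grade`, and `per_page` parameters and returns a `total_results` field; none of that is confirmed in this thread, so check the API documentation before relying on it. The helper names (`count_url`, `fetch_count`, `rg_share`) are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.inaturalist.org/v1/observations"  # assumed endpoint


def count_url(iconic_taxon, d1, d2, research_only=False):
    """Build a query URL for counting verifiable (or only RG) observations."""
    params = {
        "iconic_taxa": iconic_taxon,  # e.g. "Aves" or "Fungi" (assumed parameter)
        "created_d1": d1,             # start of the upload-date window
        "created_d2": d2,             # end of the upload-date window
        "verifiable": "true",         # Research Grade plus Needs ID
        "per_page": 0,                # assumption: no records, just the total
    }
    if research_only:
        params["quality_grade"] = "research"
    return f"{API_URL}?{urlencode(params)}"


def fetch_count(url):
    """Fetch one count. Makes a network request, so it is not called below."""
    with urlopen(url) as resp:
        return json.load(resp)["total_results"]


def rg_share(rg, verifiable):
    """Research Grade observations as a percentage of verifiable ones."""
    return round(100 * rg / verifiable, 1)


# Counts quoted in the post above, so no network access is needed here.
print(rg_share(151_342, 160_228))  # birds
print(rg_share(33_782, 119_517))   # fungi
```

Calling `fetch_count(count_url("Aves", "2019-10-01", "2019-10-31", research_only=True))` and the same query without `research_only` would give the two numbers to feed into `rg_share`, if the assumptions about the API hold.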

13 Likes

Yes, I took a look at this a little, and I think it’s a good first pass at addressing the issue!

But I also know some groups are really hard to ID at all from pics, so this method doesn’t separate the time it takes to ID observations that are actually IDable from those that will never be IDed because it isn’t possible. It would be cool to have something like a survival curve for IDs of different groups, but I’m not sure how to make that from the data that I know how to get.

Also, hadn’t looked at diatoms! Poor diatoms…

3 Likes

generally, there’s not really a good way to separate out “things that will never be IDed because it isn’t possible”. there is the “it’s as good as it can be” flag, but the flag is not often used.

if you try to calculate the time to ID by taxon, then you’ll exclude things that have not been IDed to the taxon yet because they’re stuck at unknown, at parent taxa, or at the wrong taxa. so there’s a little bit of a chicken / egg problem here.

if you don’t mind excluding the stuff that hasn’t been IDed to the taxon, then it’s possible to calculate time to ID (to a particular rank) using the API, but you’d have to get observation-level details to do this, which means that you’d run into a maximum limit of 10,000 observations in your chosen dataset, which means that there wouldn’t be an easy way to get figures for taxa with, say, millions of observations like all birds. (the staff might have a way to do this more efficiently though.)
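
as a rough illustration of the observation-level calculation, here is a minimal Python sketch. it uses one possible definition of "time to ID": hours from upload to the first identification at species rank or finer made by someone other than the observer. the record layout (`created_at`, `user.id`, `identifications[].taxon.rank_level`, with 10 meaning species) is an assumption about what the API returns, and the sample record is made up.

```python
from datetime import datetime

SPECIES_RANK_LEVEL = 10  # assumption: numeric rank_level used for species


def hours_to_id(obs, max_rank_level=SPECIES_RANK_LEVEL):
    """Hours from upload to the first qualifying ID, or None if there is none."""
    uploaded = datetime.fromisoformat(obs["created_at"])
    observer = obs["user"]["id"]
    id_times = [
        datetime.fromisoformat(ident["created_at"])
        for ident in obs.get("identifications", [])
        if ident["user"]["id"] != observer                 # skip the observer's own IDs
        and ident["taxon"]["rank_level"] <= max_rank_level  # species or finer
    ]
    if not id_times:
        return None
    return (min(id_times) - uploaded).total_seconds() / 3600


# Hypothetical record shaped like the assumed API response.
example = {
    "created_at": "2019-10-01T12:00:00+00:00",
    "user": {"id": 1},
    "identifications": [
        {"created_at": "2019-10-01T12:00:00+00:00", "user": {"id": 1},
         "taxon": {"rank_level": 10}},   # observer's own ID, ignored
        {"created_at": "2019-10-02T12:00:00+00:00", "user": {"id": 2},
         "taxon": {"rank_level": 20}},   # coarser than species, ignored
        {"created_at": "2019-10-03T12:00:00+00:00", "user": {"id": 3},
         "taxon": {"rank_level": 10}},   # first qualifying ID
    ],
}
print(hours_to_id(example))
```

observations that return `None` are exactly the ones that drop out of the average, which is the chicken / egg problem described above.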

so i think if you want to get a sense of how well taxa are getting IDed (especially for high-level taxa), then what hanly described is the general way to do it (unless the staff can run the numbers for you). i made something back in the day that gathers such data for each of the iconic taxa defined in the system, and it may help to quickly gather data for different sets of parameters. for example, this will get the figures for observations submitted in October 2019 (which is the set used in hanly’s figures above): https://jumear.github.io/stirfry/iNat_obs_counts_by_iconic_taxa.html?created_d1=2019-10-01&created_d2=2019-10-31.

you might be able to run several variants of the thing above and merge the results together to get some sort of time series. for example here are some figures for research grade vs verifiable for different sets of observations:

| Iconic taxon | All Obs | Obs Created 10/2018 | Obs Created 10/2019 | Obs Created 10/2020 |
|---|---|---|---|---|
| Mammals | 85.5 | 89.0 | 87.7 | 81.3 |
| Birds | 94.6 | 96.0 | 94.5 | 91.5 |
| Reptiles | 91.1 | 92.3 | 91.4 | 86.5 |
| Amphibians | 82.2 | 86.2 | 82.7 | 74.7 |
| Ray-Finned Fish | 79.0 | 79.8 | 78.5 | 67.9 |
| Mollusks | 57.7 | 56.8 | 56.0 | 51.5 |
| Insects | 54.2 | 59.3 | 55.6 | 47.9 |
| Arachnids | 33.8 | 39.5 | 35.6 | 30.2 |
| Other Animals | 44.3 | 45.1 | 40.3 | 37.6 |
| Plants | 55.9 | 59.3 | 55.4 | 43.1 |
| Fungi | 27.6 | 28.1 | 28.3 | 21.3 |
| Chromista | 50.3 | 41.6 | 43.3 | 50.6 |
| Protozoa | 26.3 | 38.2 | 39.1 | 17.2 |
| Unknown | 1.2 | 15.2 | 3.2 | 0.5 |
| All | 60.0 | 62.7 | 59.0 | 49.2 |

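
the monthly cohorts above can be requested by varying `created_d1` and `created_d2` in the link given earlier. a small Python sketch that builds those links (the `month_url` helper is made up for illustration; only the base URL and the two parameter names come from the link above):

```python
import calendar
from urllib.parse import urlencode

TOOL_URL = "https://jumear.github.io/stirfry/iNat_obs_counts_by_iconic_taxa.html"


def month_url(year, month):
    """Link to the counts page for observations created in one calendar month."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    params = {
        "created_d1": f"{year}-{month:02d}-01",
        "created_d2": f"{year}-{month:02d}-{last_day:02d}",
    }
    return f"{TOOL_URL}?{urlencode(params)}"


# One link per October cohort in the table.
for year in (2018, 2019, 2020):
    print(month_url(year, 10))
```
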
4 Likes

Yes, I agree that there’s definitely an issue with knowing what the “final” ID is for any given observation that isn’t at RG yet.

I guess one way you could do this would be to take a set of observations that have been on iNat for some arbitrarily long length of time which would allow most observations that are able to be IDed to be IDed. Let’s say, 3 years or so. At this point, you could say with reasonable certainty that most observations that haven’t been IDed won’t be (though of course there would be some exceptions, especially if there are observations with lots of conflicting IDs or something). You could then exclude those “unIDable” observations from your dataset and retain those that have been IDed.

For each taxon you could then make a survival/rarefaction curve or something like that showing the proportion of IDable observations that have been IDed at any point in time after they were posted.
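
A minimal Python sketch of that kind of curve, assuming you already have the number of days each IDed observation waited (the sample values below are made up). It computes a plain cumulative proportion over observations that did get IDed; it is not a full survival analysis, which would also account for observations still waiting.

```python
from bisect import bisect_right


def proportion_ided_by(days_to_id, checkpoints):
    """Share of observations IDed within each checkpoint (in days)."""
    ordered = sorted(days_to_id)
    n = len(ordered)
    # bisect_right counts how many waiting times are <= the checkpoint.
    return [bisect_right(ordered, t) / n for t in checkpoints]


# Made-up waiting times (days) for observations that eventually got IDed.
sample = [0, 1, 1, 10, 400]
checkpoints = [1, 30, 365]
for day, share in zip(checkpoints, proportion_ided_by(sample, checkpoints)):
    print(f"day {day:>3}: {share:.0%} IDed")
```

Running this per taxon and plotting the shares against the checkpoints would give one curve per group to compare.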

I think the time series that you posted gets at this idea pretty well…thanks! You can definitely see that for most taxa there are only small gains after a year…generally only 1-5 more percentage points getting IDed. There also seems to be a relationship between IDableness and the speed of IDing: observations in taxa in which a larger proportion of observations are ultimately IDed also get IDed more quickly. Interestingly, that pattern doesn’t really hold for the least IDed groups (like Fungi, Chromista, Protozoa), which still top out at about a year.

Thanks!

1 Like

Which averages out to:
Animals: 354,300 RG of 509,026 Verifiable (69.6%)
Plants: 256,505 RG of 462,850 Verifiable (55.4%)
Fungi: 33,782 RG of 119,517 Verifiable (28.3%)
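
The Animals line pools the counts for the four animal groups quoted earlier in the thread (birds, insects, mammals, annelid worms) rather than averaging their percentages. A short Python sketch of the difference, with made-up helper names:

```python
# (RG, verifiable) counts for the animal groups quoted earlier in the thread.
animals = {
    "Birds": (151_342, 160_228),
    "Insects": (176_300, 316_962),
    "Mammals": (26_372, 30_086),
    "Annelid worms": (286, 1_750),
}


def pooled_share(groups):
    """Sum the counts first, then take one percentage of the totals."""
    rg = sum(r for r, _ in groups.values())
    verifiable = sum(v for _, v in groups.values())
    return rg, verifiable, round(100 * rg / verifiable, 1)


def mean_of_shares(groups):
    """Unweighted mean of each group's own percentage, for comparison."""
    shares = [100 * r / v for r, v in groups.values()]
    return round(sum(shares) / len(shares), 1)


print(pooled_share(animals))    # large groups dominate this figure
print(mean_of_shares(animals))  # every group counts equally here
```

Pooling weights each group by its number of observations, so small groups like annelid worms barely move the result.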

Sorry, but it bothers me that people conceptualize plants as the whole kingdom, but animals as separate phyla or even classes. Kingdom Plantae has far more diversity than Phylum Chordata.

3 Likes
