Taxa Differences in Time to ID?

cthawley · November 15, 2020, 4:20pm

Just a not-very-important question that popped into my head while doing some IDing this morning:

What are the differences among taxa in the average time to ID their observations? I’m sure these differences do exist, just from my own anecdotal experience. Plants and insects often take a while, while birds are often IDed almost immediately. I’m sure this is a function of both the number of IDers with expertise in different taxa and the ease with which those taxa can be IDed from photos, among many other things.

But, just out of curiosity, does anyone know of any data out there about how long it takes for observations to be IDed depending on what taxon they are in?

okbirdman · November 15, 2020, 4:53pm

I could’ve sworn that one of the blog posts in the last couple of months compared time to ID by taxon. I thought it was the 300,000 species post (https://www.inaturalist.org/blog/42626-we-passed-300-000-species-observed-on-inaturalist#description) or the 50 million observation post (https://www.inaturalist.org/blog/40699-50-million-observations-on-inaturalist) but it doesn’t seem to be on either of those. I’ll post it if I find it or maybe someone else can remember which one it was.

lappelbaum · November 15, 2020, 5:03pm

Fungi take even longer IMO

cgmayers · November 15, 2020, 7:19pm

I agree, fungi seem to take the longest and birds are ID’d the fastest.

hanly · November 15, 2020, 7:44pm

In lieu of something more exact, a quick and dirty way is just to see what percent of observations submitted in, say, October last year are RG versus Needs ID today (~1 year):

Birds: 151,342 RG of 160,228 Verifiable (94.5%)
Fungi: 33,782 RG of 119,517 Verifiable (28.3%)
Plants: 256,505 RG of 462,850 Verifiable (55.4%)
Insects: 176,300 RG of 316,962 Verifiable (55.6%)
Mammals: 26,372 RG of 30,086 Verifiable (87.7%)
Annelid worms: 286 RG of 1,750 Verifiable (16.3%)
Diatoms: 15 RG of 326 Verifiable (4.6%)
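
For anyone who wants to redo the arithmetic on a different month, here is a minimal Python sketch of the calculation. The counts are copied from the list above; the names `counts` and `rg_share` are just illustrative.

```python
# (research grade, verifiable) counts copied from the list above.
counts = {
    "Birds": (151_342, 160_228),
    "Fungi": (33_782, 119_517),
    "Plants": (256_505, 462_850),
    "Insects": (176_300, 316_962),
    "Mammals": (26_372, 30_086),
    "Annelid worms": (286, 1_750),
    "Diatoms": (15, 326),
}


def rg_share(rg, verifiable):
    """Percent of verifiable observations that have reached research grade."""
    return 100.0 * rg / verifiable


shares = {taxon: round(rg_share(rg, v), 1) for taxon, (rg, v) in counts.items()}

# Print from most to least identified.
for taxon, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{taxon}: {pct}%")
```

Note that this is a snapshot of how much has been IDed after about a year, not a direct measure of time to ID.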

cthawley · November 15, 2020, 9:08pm

Yes, I took a look at this a little, and I think it’s a good first pass at addressing the issue!

But I also know some groups are really hard to ID at all from pics, so this method doesn’t separate the time to ID actually IDable observations from those that will never be IDed as it isn’t possible. It would be cool to have like a survival curve or something for IDs of different groups, but not sure how to make that from the data that I know how to get.

Also, hadn’t looked at diatoms! Poor diatoms…

pisum · November 15, 2020, 10:42pm

generally, there’s not really a good way to separate out “things that will never be IDed because it isn’t possible”. there is the “it’s as good as it can be” flag, but the flag is not often used.

if you try to calculate the time to ID by taxon, then you’ll exclude things that have not been IDed to the taxon yet because they’re stuck at unknown, at parent taxa, or at the wrong taxa. so there’s a little bit of a chicken / egg problem here.

if you don’t mind excluding the stuff that hasn’t been IDed to the taxon, then it’s possible to calculate time to ID (to a particular rank) using the API, but you’d have to get observation-level details to do this, which means that you’d run into a maximum limit of 10,000 observations in your chosen dataset, which means that there wouldn’t be an easy way to get figures for taxa with, say, millions of observations like all birds. (the staff might have a way to do this more efficiently though.)
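
a rough sketch of the per-observation calculation, assuming you already have observation-level records in hand. the field names (`created_at`, `identifications`, `taxon`, `rank`) are assumptions modeled on the general shape of the API's observation JSON, so check them against the API documentation before relying on this. the sample record is made up. it also counts any identification at the target rank, including the observer's own, which is a simplification.

```python
from datetime import datetime


def days_to_first_id_at_rank(obs, ranks=("species",)):
    """Days from observation creation to the first ID at one of `ranks`.

    Returns None if no identification at those ranks exists yet.
    Field names are assumptions, not a confirmed API contract.
    """
    created = datetime.fromisoformat(obs["created_at"])
    id_times = [
        datetime.fromisoformat(ident["created_at"])
        for ident in obs.get("identifications", [])
        if ident.get("taxon", {}).get("rank") in ranks
    ]
    if not id_times:
        return None
    return (min(id_times) - created).total_seconds() / 86400


# Hypothetical record, invented for illustration only.
sample_obs = {
    "created_at": "2019-10-01T12:00:00+00:00",
    "identifications": [
        {"created_at": "2019-10-03T12:00:00+00:00", "taxon": {"rank": "genus"}},
        {"created_at": "2019-10-04T00:00:00+00:00", "taxon": {"rank": "species"}},
    ],
}

sample_days = days_to_first_id_at_rank(sample_obs)
print(sample_days)
```

this does nothing about the 10,000-observation limit; it only shows what you would compute once you have the records.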

so i think if you want to get a sense of how well taxa are getting IDed (especially for high-level taxa), then what hanly described is generally the way to do it (unless the staff can run the numbers for you). i made something back in the day that gathers such data for each of the iconic taxa defined in the system, and it may help to quickly gather data for different sets of parameters. for example, this will get the figures for observations submitted in October 2019 (which is the set used in hanly’s figures above): https://jumear.github.io/stirfry/iNat_obs_counts_by_iconic_taxa.html?created_d1=2019-10-01&created_d2=2019-10-31.
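
if you would rather query counts yourself, here is a hedged sketch that only builds the request URLs. it assumes the public API endpoint and the parameter names `iconic_taxa`, `created_d1`, `created_d2`, `verifiable`, `quality_grade`, and `per_page`, and it assumes the count can be read from a `total_results` field in the response; verify all of that against the API documentation. the helper name `count_query_url` is made up, and this is not necessarily how the page above works internally.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the API documentation.
API_URL = "https://api.inaturalist.org/v1/observations"


def count_query_url(iconic_taxon, created_d1, created_d2, quality_grade=None):
    """Build a URL whose response should carry only a count of matching observations."""
    params = {
        "iconic_taxa": iconic_taxon,
        "created_d1": created_d1,
        "created_d2": created_d2,
        "verifiable": "true",
        "per_page": 0,  # assumption: ask for no records, just the total
    }
    if quality_grade is not None:
        params["quality_grade"] = quality_grade
    return f"{API_URL}?{urlencode(params)}"


# One URL for all verifiable bird observations, one for research grade only.
verifiable_url = count_query_url("Aves", "2019-10-01", "2019-10-31")
rg_url = count_query_url("Aves", "2019-10-01", "2019-10-31", quality_grade="research")
print(verifiable_url)
print(rg_url)
```

dividing the second count by the first gives the same kind of percentage as in hanly's list.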

you might be able to run several variants of the thing above and merge the results together to get some sort of time series. for example here are some figures for research grade vs verifiable for different sets of observations:

Taxon (% RG of verifiable)	All Obs	Obs Created 10/2018	Obs Created 10/2019	Obs Created 10/2020
Mammals	85.5	89.0	87.7	81.3
Birds	94.6	96.0	94.5	91.5
Reptiles	91.1	92.3	91.4	86.5
Amphibians	82.2	86.2	82.7	74.7
Ray-Finned Fish	79.0	79.8	78.5	67.9
Mollusks	57.7	56.8	56.0	51.5
Insects	54.2	59.3	55.6	47.9
Arachnids	33.8	39.5	35.6	30.2
Other Animals	44.3	45.1	40.3	37.6
Plants	55.9	59.3	55.4	43.1
Fungi	27.6	28.1	28.3	21.3
Chromista	50.3	41.6	43.3	50.6
Protozoa	26.3	38.2	39.1	17.2
Unknown	1.2	15.2	3.2	0.5
All	60.0	62.7	59.0	49.2

cthawley · November 16, 2020, 12:26am

Yes, I agree that there’s definitely an issue with knowing what the “final” ID will be for any given observation that isn’t at RG yet.

I guess one way you could do this would be to take a set of observations that have been on iNat for some arbitrarily long length of time which would allow most observations that are able to be IDed to be IDed. Let’s say, 3 years or so. At this point, you could say with reasonable certainty that most observations that haven’t been IDed won’t be (though of course there would be some exceptions, especially if there are observations with lots of conflicting IDs or something). You could then exclude those “unIDable” observations from your dataset and retain those that have been IDed.

For each taxon you could then make a survival/rarefaction curve or something like that showing the proportion of IDable observations that have been IDed at any point in time after they were posted.
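
A minimal sketch of what that curve could look like in Python, assuming you already have a days-to-ID value for each IDable observation. The numbers below are invented purely for illustration, and `cumulative_id_curve` is a hypothetical helper name. This is a plain cumulative proportion rather than a formal survival analysis; a Kaplan-Meier style estimate would instead keep the not-yet-IDed observations and treat them as censored.

```python
def cumulative_id_curve(days_to_id, checkpoints):
    """Return {day: proportion of observations IDed within that many days}."""
    if not days_to_id:
        raise ValueError("need at least one IDed observation")
    total = len(days_to_id)
    return {
        day: sum(1 for d in days_to_id if d <= day) / total
        for day in checkpoints
    }


# Hypothetical days-to-ID values, made up for illustration only.
example = {
    "fast taxon": [0.1, 0.5, 1, 2, 30],
    "slow taxon": [3, 20, 90, 200, 400],
}

curves = {
    taxon: cumulative_id_curve(days, checkpoints=[1, 30, 365])
    for taxon, days in example.items()
}

for taxon, curve in curves.items():
    print(taxon, curve)
```

Plotting each taxon's curve against days since posting would show both how fast IDs arrive and where they level off.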

I think the time series that you posted gets at this idea pretty well…thanks! You can definitely see that for most taxa there are small gains after a year…generally only 1-5 more % getting IDed. There also seems to be a relationship between the IDableness and the speed of IDing. So observations in taxa in which a larger proportion of observations are ultimately IDed also get IDed more quickly. Interestingly, that pattern doesn’t really hold for the least IDed groups (like Fungi, Chromista, Protozoa) which still do top out at about a year.

Thanks!

jasonhernandez74 · November 18, 2020, 5:12am

Which averages out to:
Animals: 354,300 RG of 509,026 Verifiable (69.6%)
Plants: 256,505 RG of 462,850 Verifiable (55.4%)
Fungi: 33,782 RG of 119,517 Verifiable (28.3%)
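
For reference, the Animals line matches the pooled counts of the four animal groups in hanly's list (birds, insects, mammals, annelid worms), which add up to exactly these totals. A short Python sketch of that pooling, with illustrative variable names, also shows that it differs from a plain mean of the four percentages:

```python
# (research grade, verifiable) counts for the animal groups in hanly's list.
animal_groups = {
    "Birds": (151_342, 160_228),
    "Insects": (176_300, 316_962),
    "Mammals": (26_372, 30_086),
    "Annelid worms": (286, 1_750),
}

# Pooling sums the counts first, so large groups weigh more.
pooled_rg = sum(rg for rg, _ in animal_groups.values())
pooled_verifiable = sum(v for _, v in animal_groups.values())
pooled_pct = round(100 * pooled_rg / pooled_verifiable, 1)

# An unweighted mean treats every group equally, whatever its size.
mean_of_pcts = sum(100 * rg / v for rg, v in animal_groups.values()) / len(animal_groups)

print(pooled_rg, pooled_verifiable, pooled_pct)
print(round(mean_of_pcts, 1))
```

The pooled figure is dominated by birds and insects because they account for most of the observations.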

Sorry, but it bothers me that people conceptualize plants as the whole kingdom, but animals as separate phyla or even classes. Kingdom Plantae has far more diversity than Phylum Chordata.

system · January 17, 2021, 5:12am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
