An Idea To Promote Explosive Growth for iNaturalist and its Data

gerrit_oehm · December 29, 2023, 1:54pm

I did something similar for hoverflies (with a focus on Germany/Europe for now): https://www.inaturalist.org/journal/gerrit_oehm/83256-syrphidae-identification-resources

gerrit_oehm · December 29, 2023, 1:57pm

It would be great if one could “save” other people’s journal entries somewhere on iNaturalist, to have quick access to the growing number of resources that other users share. That might also help more people find gems like this one on Drosophila: https://www.inaturalist.org/journal/carnifex/43576

That_Bug_Guy · December 29, 2023, 2:03pm

That would be nice. For now, I just keep bookmarks of them.

petezani · December 29, 2023, 6:46pm

I’ve done the same for my own notes on the nearly 300 species of lizards from the Amazon. I had to organize taxonomically for my own sanity, but I use my own notes extensively.

whaichi · December 30, 2023, 4:35am

I was adding links to identification resources (for Korean fauna) to my iNat profile but eventually moved them to a Word document when the list started feeling large. I had the thought though that it might be helpful to make an iNat journal entry dedicated just to those types of resources. Maybe something to consider?

DianaStuder · December 30, 2023, 9:00am

Journal entries are good - because they can be linked to, found and used by others.

gerrit_oehm · December 31, 2023, 1:50pm

It is definitely a team effort - and in general I have to say that it helps when people try to make as many identifications as possible. I often go through piles on a higher taxon level, just to get them closer to the respective identifiers of the group… that is pretty fast work… I love all the suggestions here on how to identify more observations for others, and have a look from time to time: https://forum.inaturalist.org/t/identifiers-bingo-new-card-now-available/27528 … A favorite of mine is to go through the oldest unidentified records of a certain group first, because there is often a mixture of fun little puzzles (tricky species, some that were pre-identified as the wrong taxon, disagreements and just bad photo quality…). More or less regularly, I go through a particular species or genus of one of the insect groups I work with - but depending on the group, this can be very fast or very slow. Both of the latter options to go through species that need an ID can be quite slow, but in my opinion are just as important as reducing the overall pile of unidentified observations. In the last few days I have been working on getting my identification stats a bit higher (https://jumear.github.io/stirfry/iNat_identifier_stats.html), but now I am again back to thinking that in the end all stats and numbers are relative, depending on what your focus is… Nevertheless, I will keep my identifications up, don’t worry, as long as my time allows. Part of my self-employment is also the identification of (mainly collected) insects, and I volunteer as an expert to identify species for another (regional) species observation website.

DianaStuder · January 5, 2024, 9:54am

(But botanists on iNat focus on wild species, not horticultural cultivars)

An example from the Sahara

gijsroaming · January 16, 2024, 7:59am

Well, if all “own IDs” were correct, you would indeed need only 40 million to get all to RG, so the real number is probably somewhere in between.

arboretum_amy · January 16, 2024, 2:22pm

It’s a complex question; I don’t know if staff (or anyone else) could answer. They may be able to tell us the average number of existing IDs per RG observation, but I don’t know if they could avoid including “agrees” added after the observation was already research grade.

DianaStuder · January 16, 2024, 6:09pm

Perhaps it could be a dedicated iNat blog post?
Someone (who is allowed) needs to access the data, then crunch the numbers for us.

@loarie states this year’s 56 million IDs is “more than enough” for this year’s 40 million observations

lynnharper · January 16, 2024, 7:47pm

That sounds like it’s a job for the Mighty API Wrangler, @pisum.

pisum · January 17, 2024, 12:04am

“identifications needed” is hard to define precisely. theoretically, if folks were to apply the “as good as it can be” option very liberally, then you could separate all non-spam/flagged observations into either research grade or casual. in that case, the “needed” would probably be IDs by others to achieve RG (not including subsequent IDs, unless a taxon correction occurs after RG) + IDs to set to casual (although setting to casual doesn’t always coincide with an ID in the system).

but then in practice, even if you set aside the fact that it takes some unknown amount of time for a needs ID observation to become RG, we as a community don’t tend to apply the “as good as it can be” flag very liberally. so then we have a bunch of “needs ID” observations that theoretically will never get an ID and in practice don’t actually need an ID. but the size of this set of observations is unknown.

and then there’s the impact of taxon changes that come along and create change…

probably the easiest way to estimate whether we have “enough” identifications is to compare the RG to verifiable ratio at the end of the year to the ratio at the beginning of the year, and if the RG to V ratio goes down, then we don’t have “enough”.

but if we have the same RG to V ratio at the end of the year vs the beginning of the year, then i think it’s reasonable to say we have “enough”, and then i think we could have a reasonable estimate for “needed identifications” (per observation) by taking the number of identifications by others made that year and dividing that by all the observations submitted (including casuals).

i don’t know if anyone made a particular effort to capture RG to V ratio at the beginning of the year and again at the end of the year. i have some snapshots of this at random times in the forum though. i have enough snapshots out there that maybe you could see a trend in the numbers. last i checked, the number was holding steady. so if you assume that it continues to hold steady, then you could probably just guess that, for a given period, “needed identifications” (per observation) = identifications by others / all observations submitted.
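As a rough illustration of the snapshot comparison described above, here is a minimal Python sketch. The `Snapshot` class, the `identifiers_kept_up` helper, the tolerance value, and the counts are all hypothetical, made up for illustration; they are not real iNaturalist figures or part of any iNaturalist tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Counts over ALL observations at one point in time (hypothetical structure)."""
    research_grade: int
    verifiable: int

    @property
    def rg_to_v(self) -> float:
        # Share of verifiable observations that have reached research grade.
        return self.research_grade / self.verifiable


def identifiers_kept_up(start: Snapshot, end: Snapshot, tolerance: float = 0.005) -> bool:
    """Return True if the RG to V ratio did not drop by more than `tolerance`.

    The tolerance is an assumption: it only exists so that tiny
    fluctuations are not read as "not enough" identifications.
    """
    return end.rg_to_v >= start.rg_to_v - tolerance


# Made-up counts, for illustration only.
start_of_year = Snapshot(research_grade=600, verifiable=1000)
end_of_year = Snapshot(research_grade=780, verifiable=1300)

print(identifiers_kept_up(start_of_year, end_of_year))
```

If the ratio holds steady between the two snapshots, the per-observation estimate (identifications by others / all observations submitted) is a reasonable stand-in for “needed identifications”.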

right now:

identifications by others in 2023 = 60,726,399
observations submitted in 2023 = 48,616,471

so then that’s a ratio of about 1.25 (1.249 before rounding).
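For anyone who wants to redo the arithmetic, a small Python sketch using the two 2023 counts quoted above. The function name `ids_per_observation` is just an illustrative choice.

```python
def ids_per_observation(ids_by_others: int, observations: int) -> float:
    """Identifications made by others divided by observations submitted."""
    if observations <= 0:
        raise ValueError("observations must be a positive count")
    return ids_by_others / observations


# Counts quoted in the post above for 2023.
ids_by_others_2023 = 60_726_399
observations_2023 = 48_616_471

ratio = ids_per_observation(ids_by_others_2023, observations_2023)
print(f"{ratio:.3f}")  # prints 1.249
```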

lynnharper · January 17, 2024, 1:03am

I can see how the change in RG/V ratio could be useful, but is that measuring whether the identifiers are keeping up with the observers over the years, or is it getting at the question of how many IDs are needed, on average, to turn a Verifiable observation to Research Grade? (Or to Casual, for that matter, although you might need to measure annotations for that.)

Or is my brain simply fried from too many IDs today to understand your logic?

pisum · January 17, 2024, 1:40am

it’s a simplified way to estimate how many IDs by others are needed to both determine what is actually verifiable and to get to RG. it includes additional IDs beyond RG and beyond Casual, but in practice, it shouldn’t matter that much.

lynnharper · January 17, 2024, 12:24pm

Yes, I think you’re right. Let’s try to remember to revisit this in January of 2025 and see how the ratio changes. Thank you!

Edit: We can go back further by substituting 2022 for 2023 in your API request, yes? If that’s correct, the ratio for 2022 was 1.23. That’s pretty consistent with the ratio for 2023.

Edited again because I’m fascinated: The ratio of IDs to observations submitted in New England for 2023 is 1.11. And I would have thought that New England would have sufficient identifiers to keep up with the flood of observations!
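The exact API request is not shown in this thread, so the following Python sketch is only a guess at how the year substitution might look. It assumes the public iNaturalist API v1 endpoints `/observations` and `/identifications`, and the parameters `created_d1`/`created_d2`, `d1`/`d2`, `own_observation`, and `per_page=0` (to get only the total count); check the API documentation before relying on any of them. The helper name `count_urls` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the public iNaturalist API v1.
API_BASE = "https://api.inaturalist.org/v1"


def count_urls(year: int) -> dict:
    """Build the two count requests for a given year (assumed parameters)."""
    first_day = f"{year}-01-01"
    last_day = f"{year}-12-31"

    # Observations submitted (created) during the year, casuals included.
    observations = urlencode(
        {"created_d1": first_day, "created_d2": last_day, "per_page": 0}
    )
    # Identifications made during the year on other people's observations.
    identifications = urlencode(
        {"own_observation": "false", "d1": first_day, "d2": last_day, "per_page": 0}
    )
    return {
        "observations": f"{API_BASE}/observations?{observations}",
        "identifications": f"{API_BASE}/identifications?{identifications}",
    }


# Swapping the year is then a one-argument change.
for name, url in count_urls(2022).items():
    print(name, url)
```

The sketch only builds the URLs; under the assumptions above, each response would carry the count in its `total_results` field, and dividing the identifications count by the observations count gives the ratio for that year.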

pisum · January 17, 2024, 2:27pm

i guess that’s fine just as a rough number, but really the snapshots would be better for this, i think

just looking at the map of RG to V, it looks like the Boston area is low, and when i do the calculation of IDs by others in 2023 to observations submitted in 2023, i get 1.07. there are other areas that look low, but Boston is probably the most significant area in terms of number of observations.

lynnharper · January 17, 2024, 2:45pm

By snapshots, do you mean what iNat provides at the end of the year?

The Boston area has many iNat observers and a very strong City Nature Challenge, with lots of enthusiastic but here-today-gone-tomorrow students. So perhaps there are more observations than one might expect given the overall population?

pisum · January 17, 2024, 3:06pm

the snapshots provided in the year in review (ex. https://www.inaturalist.org/stats/2023/) seem to include only observations created during that year.

i’m talking more about a snapshot at a point in time that includes all observations at that time. here’s a post where i compared a couple of snapshots i captured at different points in time: https://forum.inaturalist.org/t/needs-id-pile-and-identifications/26904/4.

i’m sure there’s more to it, but this could explain part of the issue. if you zoom in on the RG to V map, the Boston core (on the right in the screenshot below) is very red, indicating a high RG to V ratio, but some of the blue spots to the west (on the left of the screenshot) with low ratio seem to be college campuses:

lynnharper · January 17, 2024, 3:24pm

Ah, that kind of snapshot - thanks for clarifying that for me.

It’s a bit discouraging to me as an active identifier that the area where I do most of my IDing needs so many more IDs than I could ever provide.
