Are genus-level RG observations used for CV training?

Except that for some taxa, the CV is not helping iNat or observers, because the way it is trained means that it regularly makes wrong suggestions. Not just the occasional wildly incorrect suggestion, or wrong species-level suggestions with a correct higher-level first choice, but systematically suggesting a wrong taxon.

If most observers “provided the best IDs they could” and did not rely on the CV, or were able to recognize when the CV is wrong, this would not matter. But many observers do rely on the CV, either because they believe it has to be right (providing IDs is what it is there for, so why would they know more than a specially trained program?) or because they do not have the knowledge to assess the suggestions and simply think they look plausible. Without the CV, observers would be making their own guesses about the ID. Sometimes these guesses would be just as wrong as the CV, though likely in other ways; much of the time they would probably be broad IDs that are generally correct.

I don’t expect the CV to replace human reviewers; I would be satisfied if it supported the efforts of human reviewers rather than working against them.

To be clear, I don’t think that not suggesting species IDs is a good response to the problem. I think the only chance for palpable improvements for difficult taxa would be changes to the way the algorithm is trained.

However, it would probably make more sense to brainstorm possible implementations in this thread: https://forum.inaturalist.org/t/recommendations-on-improving-the-ai-algorithm/63027/

The CV is not irrelevant to human reviewing when it actively creates more work and worse starting conditions for human reviewers.

Going forward, at least some of the Geomodel anomalies should be prevented.
For example, a pop-up could warn: “The species you have chosen is from CA. Are you sure that is what you saw there?”

https://www.inaturalist.org/blog/99727-using-the-geomodel-to-highlight-unusual-observations

https://www.inaturalist.org/blog/84677-introducing-the-inaturalist-geomodel (September 2023)

Apart from the monthly updates, there have been bigger updates too.

I agree that the thread linked there is a good one. The original topic here is focused on a specific aspect of the CV (whether it is trained on genus-level observations and, really, whether it will suggest genus-level IDs). I think the discussion of that specific aspect of the CV (and potential solutions) here is constructive and productive. But any other suggestions for improving the CV that aren’t focused on the main topic here are probably a better fit for that thread.

I’ve made a comment in that thread linking to this one so that readers can more easily find this thread about the specific topic at hand.

First part yeah, but the second part does not make sense. We haven’t talked about replacing human reviewers.

What you are essentially saying is that we shouldn’t prioritize making identifiers’ job easier. The more accurate the CV is initially, the fewer misidentifications there are and the more correct observations you can simply click “Agree” on rather than typing out full names; and if something is correct at tribe, you can just leave it if it’s not IDable further. This gives identifiers more time to actually identify rather than just correcting wrong observations, which can take large amounts of time.

To put it another way, from my own experience: I am currently the only main identifier on the site for chironomids, and that’s 175k observations I manage. Every misidentified chironomid uploaded each day is an observation I have to ID, even if only to family or subfamily, when I would prefer to spend that time on other observations that are much more identifiable to species. I can’t ID every observation by myself. The more time I spend correcting, the less actual IDing I do.

Really, what I think I’m describing here is more the issue of identifiers not being able to keep up with observations as the site grows. How many other taxa are in a similar situation to chironomids? How many have no large, active IDers, so that misidentifications don’t get corrected?

But this doesn’t mean the CV is the most important thing. It certainly isn’t. But it really should still have high priority, as it is the overwhelming source of misidentifications on the site and even risks creating misidentification loops that can spiral out of control.

While I agree that wrong CV IDs are a huge issue for IDers, this part is not really a CV problem. Reliance on a single active IDer or even a few IDers to handle an ever-growing volume of observations is not sustainable, either for you or for iNat.

Have you tried recruiting other users who might be interested in learning to ID chironomids? I know this is not easy, but it seems like some of the fly people have been successful at building up IDer networks through projects, journal posts, and mentoring.

This has come up before.
If you can, write a journal post to link to,
and bring the URL to the IdentiFriday thread.
Or send a PM to an active identifier to ask for help (I am currently working through proteas, and since yesterday Moraea tripetala). What I can clear takes the load off the taxon specialists, so they can focus on the interesting observations instead of ploughing through the grunt work. It benefits both sides if you mentor your support team.

25% of IDs are made by 130 users.
That is demoralising for active identifiers.

Two different issues can be closely linked and affect one another.

Chironomids are difficult enough that I don’t even recommend that most people try to learn them. It has taken me hundreds of hours, if not already a thousand, and hundreds of scientific reports and books to learn them to the degree I have.

If somebody has the passion and time, I would actually suggest trying to learn a group iNaturalist doesn’t have an IDer for, like biting midges or any of the many other IDer-less taxa.

@DianaStuder

I never knew about this. I’ve already plowed through most of the important grunt work, like the 3k incorrect Diamesinae identifications. This is interesting, though, and I think it could definitely be useful for some problem taxa.

I do wonder how much the situation described in the linked paper has changed since. I’ve become vastly more active in identifying observations since their study period, and though I’m by no means one of the top 1000 identifiers (not that I know how to check the 1000th-to-500th band), I’ve at least pulled a few identifiers along with me.
The fact is, the elimination of genus-level observations from the CV whenever a daughter species enters it is terrible news for “mixed situations”, where you might have an undescribed species that is known to belong to a particular genus, but where the CV will never make a genus-level suggestion (at least outside of the old app, as a “higher segment” of a species suggestion) because there is a lower species it has been trained to suggest instead. For those of us identifiers going through such groups, it means far more inspection time and far more disagreeing IDs…

I would love to see updated figures.
Are we still at 130?

It looks like both of us filed feature requests to address this problem:
https://forum.inaturalist.org/t/allow-some-non-leaf-taxa-to-be-added-to-the-cv-model/63937
https://forum.inaturalist.org/t/allow-for-genus-level-cv-training-sets-irrespective-of-species-level-participation/63938
Maybe if people vote for both, it will be twice as likely to be implemented :)

Ha! Great minds think alike. Maybe a moderator can somehow combine the two feature requests?

You don’t need someone who is willing to spend hundreds or thousands of hours learning all chironomids worldwide.

To start with you merely need people who are interested in spending a few dozen hours learning to ID some of the more distinctive chironomids in their region. If there are other people you can rely on to look at some subset of the total observations, this frees up capacity for you to look at the rest. Even if you still feel you need to review everything, having someone else working on the taxon means that there are others who can share the burden of correcting wrong IDs.

I have a reasonable idea of the difficulty of identifying chironomids from photos and also the amount of effort that it probably required to develop ways to apply the information in keys (based on microscopic identification of a specimen) to photo identification. I am not trying to diminish your accomplishments or question your dedication.

But IDer burnout is real. I’ve seen it happen to others; I’ve been close to it myself more than once.

And it is not solely the result of bad CV suggestions. They certainly do not help, and I absolutely am not arguing that the algorithm doesn’t need to be changed (it does), but improving the CV will not solve the problem of there being too few skilled IDers for a difficult taxon, or the stress that goes with this. It will also not prevent wrong IDs, though the mistakes will likely be somewhat different and – perhaps – somewhat fewer.

(In my review of European Xylocopas, where all 3 lookalike species are included in the CV model and it correctly recognizes the genus 99% of the time, I have found that observers often choose wrong species-level IDs instead of the genus. They may do so even when the top species-level suggestion is the correct one. Not infrequently, they choose wrong IDs even though readily available information on popular websites would have allowed them to ID it correctly. For example, they choose species A when the photos clearly show a well-known and fairly conspicuous feature that is unique to males of species B, or they choose a species that has never been documented for their region. There is an important difference in that I can educate observers, who may then make more informed IDs in the future, whereas I cannot directly influence the CV the same way. But correcting misidentifications will probably always require the active involvement of skilled IDers: some taxa simply aren’t intuitive to most people.)

It is not sustainable or desirable to have all of iNat’s collective knowledge about a taxon concentrated in a single IDer, or even 2 or 3 IDers. Because inevitably all of us will, sooner or later, cease to be active on iNat, whether due to lack of time, changing interests, unresolved frustrations, illness, or death. And I think most of us would like to feel like our efforts have not been in vain – that a taxon we have put considerable effort into trying to clean up will not descend back into chaos once we are no longer there to work on it. The only way to make sure this does not happen is if there are others who share our knowledge, whether they acquire it independently or with the help of guidance that we provide.

What you say resonates with me. You are correct. Perhaps I really should spend time developing ways to teach others what I have learned, so that the community can better understand these organisms: maybe a guide, journal posts, or a consolidation of resources so people know where to look for materials. Maybe when I have more time over the summer I can start work on that.

Thank you for your comment. Maybe I could even ask for help from somebody who is good at “translating” information in a way others can understand and learn from. I feel I sometimes struggle to explain certain details. But this is a side tangent to this topic. Thank you again for your comment.

Remember the 90-9-1 rule. One person observes. Another 9 engage: they ID, annotate, comment, Like. But a silent 90 are reading along.
So a brief comment can be a ripple in the pond and pay dividends going forward. A text expander is useful. And comments become a way to find the people who are interested and who will ask you questions in turn.

What makes you think I don’t leave a comment?

My point was that even though there is already fairly widespread knowledge – on iNat and on the internet more generally – about how to ID X. violacea males, this does not guarantee that people will choose the right ID. Because there will always be new users who aren’t familiar with this. The process of educating observers will never end. It would be nice not to additionally have to fight wildly wrong CV suggestions while doing this, but IDers of taxa that are non-intuitive to most laypeople will probably always have to actively check and correct IDs to an extent that is not necessary for “easier” taxa.

@cthawley Chris, this discussion has strayed into unintended territory and seems unlikely to return to the original topic in the first post. Please close this thread.

Topic closed at request of OP