The role of CV in the new app

whitneybrook · September 14, 2024, 11:18pm

I found what I think is a good example of a problem with the interface over-promoting the selection of CV options.

In this case, CV clearly doesn’t know what the organism is, but it took me a while to figure out how to make my own selection for the ID, so I could choose an ID coarser than species, but narrower than an iconic taxon.

I think for situations like this, the option to enter the name of a taxon and search for it should be more prominent. I can see how this increases the chance of placeholders still being entered (unless iNat doesn’t accept entries that don’t match the database). However, having people just pick the first species in the list when it is very low confidence doesn’t seem great either.

Another option would be for the software to realize that it doesn’t have a high confidence recommendation and encourage the selection of an iconic taxon by showing those buttons at this point.
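
The fallback suggested here can be sketched roughly as follows. This is only an illustration, not how the app works: the `Suggestion` type, the `choices_to_show` function, and the 0.7 cutoff are all hypothetical, and the iconic taxa listed are just a partial set used as example button labels.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the thread does not say what score counts as "high confidence".
CONFIDENCE_THRESHOLD = 0.7

# A partial list of iconic taxa, used here only as example button labels.
ICONIC_TAXA = ["Plantae", "Insecta", "Aves", "Fungi", "Mammalia"]


@dataclass
class Suggestion:
    name: str
    score: float  # assumed to lie between 0.0 and 1.0


def choices_to_show(suggestions, threshold=CONFIDENCE_THRESHOLD):
    """Return (kind, labels).

    If the best score reaches the threshold, show the ranked CV suggestions.
    Otherwise fall back to the iconic-taxon buttons.
    """
    top_score = max((s.score for s in suggestions), default=0.0)
    if top_score >= threshold:
        ranked = sorted(suggestions, key=lambda s: s.score, reverse=True)
        return "suggestions", [s.name for s in ranked]
    return "iconic_taxa", list(ICONIC_TAXA)


# Two weak guesses: the app would show iconic-taxon buttons instead.
kind, labels = choices_to_show(
    [Suggestion("Axarus", 0.12), Suggestion("Glyptotendipes", 0.08)]
)
```

A manual search field could sit alongside either result, so the fallback would not replace the ability to type in a taxon.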

For the observations I’ve tested, when CV has high confidence, I think it is generally reliable. Where CV is overconfident or makes incorrect suggestions, I feel like the solution is to improve CV through continued technology improvements and continued effort of identifiers (much easier said than done). But increasingly nudging users toward the CV selection will magnify any errors. New runaway CV problems could emerge from people starting to accept a CV recommendation in the above situation.

DianaStuder · September 15, 2024, 7:58am

I can see how this increases the chance of placeholders still being entered (unless iNat doesn’t accept entries that don’t match with the database).

natemarchessault · September 15, 2024, 11:03am

I can’t use the app since I’m on Android, but would just echo the concerns here about functionality in areas with no cell signal. I’m in southwestern New Hampshire and most places I go outside do not have cell signal, and even in the city of Keene signal is spotty. Just using this as an example since this is a reasonably populated area less than two hours from a big city.

With adding IDs it sounds like one still can’t just add a species manually with no cell signal, is that correct? If so, why? I had always assumed that it was because those data were not stored in the app, but if it is suggesting species without cell signal the information must be stored on the phone, right? The suggestions are awesome but full functionality would be fantastic.

You probably know this, but you could always turn off automatic upload then add IDs in the app before syncing. Not as efficient as the placeholder method, but would save you from this situation.

bouteloua · September 15, 2024, 11:36am

Adding IDs in the app afterwards would take me about 10 times the amount of time so yes I do know about the option and choose not to. I always have automatic upload turned off.

charlie · September 15, 2024, 1:43pm

yeah hard agree and agree that it’s worse with the new app. i did make a separate category for issues with poor cell connectivity, before i saw these additional posts here, so sorry to be redundant but i think it may deserve its own post. Some of these are broader issues with the iNat workflow but it seems the new app will make them worse.

I like the idea of offering the iconic taxa on the top when there’s poor CV confidence. However i’d personally be unlikely to use them unless there was also some other way to flag them to take another look at them, because i feel like i might forget what i was looking at or forget to do so otherwise

pisum · September 15, 2024, 2:12pm

@tiwane – is this because Next is using a simplified local model for suggestions (sort of how Seek works)? if Next is using a simplified model, is it for all suggestions anywhere in the app, or only for the AR camera and when offline?

zoology123 · September 15, 2024, 5:37pm

The worst part is that the CV unlearns non-leaf taxa. Take Chironomus, with 500 species, most looking very similar. A few species are distinct while the majority aren’t, so if the CV learns the distinct ones, it will unlearn how to ID the other species to genus. Thus they will likely never be able to have an accurate ID suggested by the CV.

What happens to taxa that can never be learned by the CV because the CV unlearned how to ID subgenus, genus, etc.? The CV does what it does and suggests similar-looking species that it knows. Sometimes it is a Chironomus species, many times not though. You will see most of the un-IDable Chironomus species get IDed as Axarus, Glyptotendipes, Goeldichironomus, or not get a suggestion at all.

The big issue here is that it’s impossible to teach the CV how to ID many of these species, and the option to teach the CV to suggest genus is not completely there either. There is a way to get it to suggest genus, but only in a roundabout way. This new app removes even that ability.

saylorj · September 15, 2024, 5:50pm

I’m dreaming here, but in this case I would like to see a little pop up diagram of the taxonomic tree for these disparate CV suggestions, so that if I didn’t know any better or was in the field and couldn’t research it further I could at least click the top node as a starting ID.

zoology123 · September 15, 2024, 6:00pm

That’s why I call this a battle. The CV overall for the whole site is honestly good. It is an amazing piece of technology that helps tens of thousands of people. But it can also sometimes create what I can only describe as battles with IDers. Take Diamesa, or Tanytarsini. With these taxa, the CV has gone from being a helpful tool to creating an endless battle. When the accuracy rate is as low as 1 correctly IDed observation out of every 20-25, that’s not helpful; it’s a practically endless amount of work that needs to be corrected by IDers. It’s a battle that never stops without massive work to retrain the model. But this is a slightly different topic, related to this discussion only through the CV.

saylorj · September 15, 2024, 6:18pm

May I ask a few dumb questions, because I have been thinking about these issues, but don’t understand:

Does the CV rely on Research Grade observations to learn?

If so, could there be a way to override that and curate the observation set the CV learns from if it’s going off track with particular species?

Or could there be a checkbox that pops up for species that need microscopy or dissection or DNA etc. to ID to species level? Then an observer has the option to pick the CV-suggested species, but only if they check that box (yes to dissection evidence); otherwise (no) they choose genus.

zoology123 · September 15, 2024, 6:18pm

What I can undoubtedly say is that when the CV recommends the top taxa suggestion, as it does now on the page of an observation, when uploading through Android, or on the website, the CV is actually usually right. Even with taxa it doesn’t exactly know, it can usually suggest Chironomidae, Tanypodinae, Polypedilum Group, Chironomus Group, etc. It’s amazing. But if you strip that away, leaving only the leaf taxa that the CV is actually trained on, I would say the accuracy for Chironomids will easily drop by half, if not more.

To all the other IDers, think about how many times the CV is correct in the top taxa ID, but is wrong with the species suggestions. This will destroy the CV suggestion ability in other parts of the world for a number of taxa. Take India, for example. There are no Indian Chironomids in the CV, but by using similarity to species in the CV from other parts of the world, it can actually ID family, tribes, or even sometimes genus groups.

What happens when the CV only suggests the species / leaf taxa it has learned?
For Chironomids, the number of CV taxa known by region:
South America: 1-3
Europe: 4-7
North America: 30-35
Oceania: 1
Asia: 1-3
Africa: 1-2

Why should other parts of the world, where the fauna are either not IDable, not well researched, or just haven’t been IDed yet, suffer because the CV interface is coded in a certain way?

Ideally a good solution to all of this would be to include higher taxa in the CV to begin with. That would mean the few IDable US Chironomus species can be suggested in the US, while the genus can be suggested in Africa, Asia, and everywhere else.

zoology123 · September 15, 2024, 6:37pm

The CV requires RG observations to train species-level taxa. For anything else (tribes, genera, subfamilies, subtribes, etc.), it requires that the “Community Taxon is precise”.

This means an observation that the uploader IDs as Chironomidae will not provide training data to the CV. But if I or anybody else IDs it as Tanytarsini, it will be included in the training model for Tanytarsini. This is one big reason why one should not ID other people’s observations without adequate knowledge of how to do so.

The only current option to fix erroneous learned taxa in the CV is by IDing them yourself, and going back through all the observations of the taxa looking for where people agreed to an ID of the taxa and either confirming or disagreeing.

Having extra features to control what taxa can be learned by the CV would be wonderful, like permanently removing species that require DNA or total dissection to ID from being suggested by the CV. Currently, if a species that requires DNA to ID gets enough observations confirmed from uploaded DNA, the CV will indeed learn how to ID the species just from images. This is of course false. It’s just training on the images uploaded. But nothing can stop it besides either not IDing observations, or not uploading observations of that taxon.

Other info.
The CV only uses the first 5 images of an observation for training.

It can use casual observations for training non species taxa if “Community Taxon is precise”.

A minimum of 60 observations and 100 photos are needed to be included in training.

The CV only learns leaf taxa. If the CV has learned how to ID a genus, IDing enough observations of a species in that genus to get the species into the CV will make it unlearn the genus. This is one way IDing (training the CV) can actually make the CV more inaccurate.
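
To make the leaf-taxa effect described in this post concrete, here is a minimal sketch. It assumes the 60-observation and 100-photo thresholds quoted above and models "leaf" as "a qualifying taxon with no qualifying descendant". The function names, data layout, and counts are hypothetical and made up for illustration; this is not iNaturalist's actual training pipeline.

```python
# Thresholds as quoted in the thread; everything else is a hypothetical sketch.
MIN_OBSERVATIONS = 60
MIN_PHOTOS = 100


def qualifies(counts):
    """True if a taxon has enough observations and photos to be trained on."""
    return (counts["observations"] >= MIN_OBSERVATIONS
            and counts["photos"] >= MIN_PHOTOS)


def trained_taxa(parent_of, counts):
    """Return the qualifying taxa that have no qualifying descendant."""
    qualifying = {taxon for taxon, c in counts.items() if qualifies(c)}
    has_qualifying_descendant = set()
    for taxon in qualifying:
        # Walk up the tree and mark every ancestor of a qualifying taxon.
        ancestor = parent_of.get(taxon)
        while ancestor is not None:
            has_qualifying_descendant.add(ancestor)
            ancestor = parent_of.get(ancestor)
    return qualifying - has_qualifying_descendant


# Made-up counts: the genus qualifies, the species does not yet.
parent_of = {"Chironomus plumosus": "Chironomus"}
before = {
    "Chironomus": {"observations": 500, "photos": 900},
    "Chironomus plumosus": {"observations": 40, "photos": 70},
}
# Once the species crosses both thresholds, the genus stops being a leaf.
after = dict(before)
after["Chironomus plumosus"] = {"observations": 60, "photos": 100}

print(trained_taxa(parent_of, before))
print(trained_taxa(parent_of, after))
```

Under these assumptions the first call returns only the genus and the second only the species, which is the "unlearning" the post describes: the other species in the genus no longer have a genus-level class to fall back on.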

saylorj · September 15, 2024, 6:39pm

@wildwestnature already suggested something like this, so I’m just upvoting this. But also maybe there could just be a simple button at the top for the common taxon of all the suggested IDs.
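
A "common taxon" button like this amounts to finding the deepest ancestor shared by all suggestions. A minimal sketch, assuming each suggestion comes with its lineage as a list ordered from coarsest to finest rank and aligned from the same root (the `common_taxon` function and the hand-written lineages below are illustrative, not taken from the app):

```python
def common_taxon(lineages):
    """Return the deepest taxon shared by every lineage, or None if none is shared.

    Each lineage is a list of taxon names ordered from coarsest to finest.
    """
    if not lineages:
        return None
    shared = None
    # Compare the lineages rank by rank until they diverge.
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            shared = ranks[0]
        else:
            break
    return shared


# Hand-written lineages for three genera mentioned earlier in the thread.
suggestions = [
    ["Insecta", "Diptera", "Chironomidae", "Axarus"],
    ["Insecta", "Diptera", "Chironomidae", "Glyptotendipes"],
    ["Insecta", "Diptera", "Chironomidae", "Goeldichironomus"],
]
print(common_taxon(suggestions))
```

With these inputs the shared taxon is Chironomidae, which is the kind of coarse but safe starting ID the button would offer.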

saylorj · September 15, 2024, 6:46pm

Does that mean that I should not include a photo of a spore print or other microscopy in those five images?

zoology123 · September 15, 2024, 6:53pm

It is not really my place to say what you should or should not upload to the site, unless it breaks guidelines. What I will say though is that any taxon with fewer than 1000 images in total will use all the photos that qualify for training. Images absolutely can have an effect on the CV, but it’s very hard to quantify the impact without actual tests of some kind.

fluffyinca · September 15, 2024, 6:53pm

I don’t think it’s true that it only uses the first 5 photos of an observation. To the best of my knowledge, all photos from eligible observations go into some kind of pool, where they are then randomly selected to train the CV. iNat staff could tell you for sure, though.

saylorj · September 15, 2024, 6:54pm

So if species that are visually indistinguishable were grouped into a multi-species group that represents the limit of what the CV can learn, and that group could manually be set as a leaf in the CV learning set (maybe even by region), would that help?

zoology123 · September 15, 2024, 6:56pm

Mmm, perhaps you’re right. Re-looking at this: "There must be at least 100 photos of the species and 60 observations of the species, and we don’t choose more than 5 photos from an observation to train the model." Though it’s not exactly explicit about what it means, besides that no more than 5 are chosen.

https://help.inaturalist.org/en/support/solutions/articles/151000170368-which-taxa-are-included-in-the-computer-vision-suggestions-

zoology123 · September 15, 2024, 7:00pm

Only in some cases, like the ones where the CV really shouldn’t learn any of the children taxa. I believe allowing higher taxa to be learned and what you said are both solutions to different CV issues.

klodonnell · September 22, 2024, 9:33pm

I really like this suggestion. As a few have said here, it took me a little while to figure out how to add my own ID. At first I thought I was being forced to use the CV suggestions. I think having the “add your own ID” field or something similar always visible at the top would make the different routes you could take to adding IDs to your own observations far more clear.