Recommendations on improving the AI algorithm?

With more details on this specific case, I can likely recommend actions you can take to potentially help the situation. I know nothing about these fungi.
But info like:
How many taxa are involved currently with this issue?
How many taxa look very similar/identical worldwide?
How identifiable are these species?
What are the ranges of the related taxa currently in the model and how well is the geomodel lining up?
How many observations are involved?
How many identifiers?
How misidentified are the related taxa in the model?
Is there potential for this to run away into an uncontrollable number of misidentifications for the current IDers of this taxon?
Etc.

Ultimately the goal is to create a plan to deal with (ideally) the root of the issue. Many times the root issue is that the CV does not know similar-looking species exist. If it is not possible to address the root issue, then you have to try other things to improve the situation.

If anything, I believe this post should be primarily organized around actions individuals can take to help improve the CV, including all sorts of tips, tricks and the strategy involved, rather than feature requests. But honestly, there's a point where it's a losing battle because many of these things are very much linked, either directly or indirectly. The disagreement/drama thing was moved. I removed my short comment myself.

@someplant The case you bring up, Closterium, is a more difficult one with the current system configuration in place. Assuming it doesn't change, the best thing to do is to get any other species of the genus or of similar genera into the CV. In a way, you purposely confuse the CV by teaching it there are many other similar species.

I will preface this: ideally the CV wouldn’t be allowed to recommend such difficult-to-ID species, since it is far more likely to suggest the wrong species the vast majority of the time. But we are only users and don't have the power to change how it works.

Now, getting other species eligible is often easier said than done. But there are many things one can do to try to get a taxon eligible.

One is going through the entire genus worldwide trying to ID a specific species, or a couple of specific species, strategically focusing on them. If going through all those obs does not provide enough, you can look for some stuck at higher taxon levels or stuck because of disagreements, or ask certain users to try to observe more to provide the necessary training data, etc.

But needless to say, if these are anything like Orthoclad chironomids, the current CV setup is not at all adept at dealing with them. Dealing with Orthoclads has been incredibly difficult and I am still working on it. The best thing to do in my position is actually to ask those who can observe with enough detail, or who find the rare IDable ones, to observe more of them.

Yeah this is a really common issue that causes a lot of the frustration with the CV from the identifier side, with any taxa with obscure difficult-to-ID species. Some comments on it here: Computer Vision should take into account fraction identified to species

I don’t know whether the CV has better clues or takes the inaccuracy rate of its past suggestions into account. I think new data should be weighted too, instead of just training with the same loss on a new set (or the entire set). A weighted loss would give more value to what the previous data failed to capture; maybe the new observation set identified something new in a rare species, or in one of the cases discussed above where there is not enough diversity. It would be a tight balance between code complexity and trusting newly ID’ed data without hampering the accuracy of previous model suggestions. I don’t know if the code for all this is on GitHub. Can someone confirm and direct me? Thanks.
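
A minimal sketch of the weighted-loss idea above, assuming a plain cross-entropy classifier. Nothing here reflects how iNaturalist actually trains its model; the function name `weighted_cross_entropy`, the class names and the weights are all hypothetical.

```python
import math

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean cross-entropy, one weight per sample.

    probs   -- list of dicts mapping class name -> predicted probability
    labels  -- list of true class names
    weights -- list of per-sample weights (e.g. larger for newly corrected IDs)
    """
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total += -w * math.log(p[y])
    return total / sum(weights)

# Hypothetical example: the second sample is a recently corrected ID of a
# rare species, so it is weighted three times as heavily as the first.
probs = [{"common": 0.9, "rare": 0.1}, {"common": 0.8, "rare": 0.2}]
labels = ["common", "rare"]
uniform = weighted_cross_entropy(probs, labels, [1.0, 1.0])
upweighted = weighted_cross_entropy(probs, labels, [1.0, 3.0])
```

With the extra weight, the badly predicted rare sample dominates the loss (`upweighted` is larger than `uniform`), so a training step would push harder to fix it. How large the weights can get before they hurt accuracy on older data is exactly the balance mentioned above.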

Another metric is the misidentification rate for that species with others (which we see on the taxon page). I think that should be considered too when giving the final decision.

I often see a mismatch between the Geomodel and the CV model. I have seen the CV confidently suggest species from a completely different continent, and new users directly agreeing with it. Of course I am not saying the CV should ignore cases such as migration, but when a species has never been observed in that area or country, a marker could be shown below the CV suggestion. For example, the mobile apps currently show “expected nearby”; a “not seen before in xyz” note could be added there too, where xyz can be an area, country or continent.
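
A rough sketch of how such a “not seen before in xyz” marker could be picked, assuming we already have a lookup of which taxa have been recorded in each place. The function `novelty_marker`, the place names and the taxon labels are made up for illustration; this is not an iNaturalist API.

```python
def novelty_marker(taxon, place_path, observed):
    """Return a marker naming the broadest place with no prior records.

    place_path -- places from narrowest to broadest, e.g. [area, country, continent]
    observed   -- dict mapping place name -> set of taxa already recorded there
    Assumes a taxon recorded in a place is also recorded in every broader place.
    """
    marker = None
    for place in place_path:
        if taxon in observed.get(place, set()):
            break  # recorded here, so broader places have records too
        marker = f"not seen before in {place}"
    return marker

# Hypothetical records for three nested places.
path = ["Cape Town", "South Africa", "Africa"]
observed = {
    "Cape Town": {"taxon a"},
    "South Africa": {"taxon a", "taxon b"},
    "Africa": {"taxon a", "taxon b", "taxon c"},
}
```

Here `novelty_marker("taxon c", path, observed)` gives `"not seen before in South Africa"`, while a taxon already recorded in the narrowest place gets no marker at all.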

Especially with the lack of explainable decisions in the current CV (I really hope we can get SHAP or similar explanations of CV decisions for power users), there should at least be a transparent confidence indicator for its suggestions, even if they are already sorted by confidence (maybe enabled via a profile setting, so it does not overwhelm everyone all the time?). On mobile I see a green “Visually similar/expected nearby” label for what are probably the better suggestions. Still, there is no such clue on the desktop; maybe a colour-intensity indicator to distinguish the species titles in the CV suggestions would work better there.

Also, on desktop the AI suggestions are triggered only for the first image (on mobile one can slide to a new image and new CV suggestions appear for that image). Sometimes the first image is a zoomed-out or bad shot, so the ability to retrigger on desktop would be very helpful too.

When the CV is making a suggestion, maybe we can list the species that are not in the CV right below the suggestions, as in the algae example above. I feel it is the responsibility of the CV to do this to reduce mis-IDs, which will only flywheel if not controlled (with each new CV version learning from such mis-IDs).
For example, this woodpecker has very few observations and is not included in the CV: https://www.inaturalist.org/taxa/17964-Dendrocopos-assimilis. Still, for woodpeckers in that range, it would be really helpful if it popped up below the CV suggestions (something like “species not added but probable woodpeckers”), just to make users consider it too when using the CV tool. Of course this is hard when there is no ID, or when completely different things can look similar (though restricting suggestions, as in the next point, may help there). But at least if it is a red bug, we can show an indicator that there are a lot of red bugs possible in that area that are not in the CV. Similarly with the woodpecker above: when the CV has already zoned in confidently on woodpecker suggestions, the non-included taxon is better shown right alongside the CV recs.

Finally, for a new user (judged by their global count, or by their count for that taxon?), it would be better to show restricted, higher-level suggestions by default (maybe they can expand them with another press or option), more so when the CV is not confident or when that species has a higher chance of past mis-IDs.
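
One way to picture the restricted, higher-level suggestion: if no single species score is high enough, add the scores up by genus and suggest the best genus instead. This is only a sketch under my own assumptions; `restricted_suggestion`, the 0.7 cutoff and the scores are invented, and I don't know how the real CV combines scores.

```python
from collections import defaultdict

def restricted_suggestion(species_scores, genus_of, min_confidence=0.7):
    """Suggest a species only when its score clears min_confidence;
    otherwise fall back to the genus with the highest summed score."""
    best_species = max(species_scores, key=species_scores.get)
    if species_scores[best_species] >= min_confidence:
        return ("species", best_species)
    genus_scores = defaultdict(float)
    for species, score in species_scores.items():
        genus_scores[genus_of[species]] += score
    best_genus = max(genus_scores, key=genus_scores.get)
    return ("genus", best_genus)

# Made-up scores: no species is convincing, but the genus clearly is.
scores = {"Photinus pyralis": 0.45, "Photinus sp. A": 0.35, "Pyractomena sp. B": 0.20}
genus_of = {
    "Photinus pyralis": "Photinus",
    "Photinus sp. A": "Photinus",
    "Pyractomena sp. B": "Pyractomena",
}
default_pick = restricted_suggestion(scores, genus_of)
expert_pick = restricted_suggestion(scores, genus_of, min_confidence=0.4)
```

A new user would get `("genus", "Photinus")` by default, while a lower cutoff (for example for experienced users, or after an explicit "expand") would surface `("species", "Photinus pyralis")`.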

We have an app made by an iNatter for this
https://forum.inaturalist.org/t/inaturalist-enhancement-suite-chrome-extension-v0-7-0-identifier-stats/44002

If you have the taxon knowledge, and you are on the distribution map and can see outliers? I follow them up (I have learnt to allow for - sigh - our wildflower is your invasive alien and vice versa). But if iNat tells us - this is seen nearby - it may - be a mistake to follow up.

You can search for Geomodel Anomalies about 1 million obs … I clear the Cape Peninsula ones daily.

Search link came from
https://www.inaturalist.org/blog/99727-using-the-geomodel-to-highlight-unusual-observations

That one is closed. I agree with it, but I don't know how best it could be implemented.

What I do know is that, whether that gets implemented or not, it should be of some importance for most identifiers to spend some time explicitly improving the CV. Many taxa could reach eligibility if they were just strategically focused on. For example, I'm confident a couple of North American Photinus (firefly) taxa could get in if they were just focused on. The common firefly is terrible and over-suggested for so many fireflies. But a few other species have like 60 obs. Just a few more, and the CV knows another similar critter exists; maybe common firefly isn’t everything.

In this case, it should reduce how often common firefly is the top suggestion, thus reducing some misidentifications. It also shows uploaders that other species exist without them having to look it up themselves.
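
To make the "just a few more obs" strategy concrete, here is a small sketch that ranks taxa by how close they are to an eligibility cutoff. The `threshold=100` default is purely an assumption for illustration (the real inclusion criteria may differ), and `closest_to_eligible` and the counts are hypothetical.

```python
def closest_to_eligible(obs_counts, threshold=100, max_gap=50):
    """List taxa below an assumed eligibility threshold, nearest first.

    obs_counts -- dict mapping taxon name -> number of usable observations
    threshold  -- ASSUMED number of observations needed for CV inclusion
    max_gap    -- ignore taxa that are further than this from the threshold
    """
    gaps = {taxon: threshold - n for taxon, n in obs_counts.items() if n < threshold}
    near = [(taxon, gap) for taxon, gap in gaps.items() if gap <= max_gap]
    return sorted(near, key=lambda pair: pair[1])

# Hypothetical counts for four species in one genus.
counts = {"sp. A": 60, "sp. B": 95, "sp. C": 12, "sp. D": 4000}
targets = closest_to_eligible(counts)
```

`targets` comes out as `[("sp. B", 5), ("sp. A", 40)]`: sp. B needs only five more observations, so that is where focused IDing pays off first, while sp. C is too far away and sp. D is already in.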

Oh, thanks; I was going to search on forums for this later. I do clear up anomalies randomly, as I see from the suggestions page. This would be helpful and I only noticed those filter options from your link.

Yes, but I still think the CV model should be responsible for it too, especially considering misIDs from new users. I read that the CV considers Geo suggestions first and, if that fails, still goes on to recommend visually similar ones. We notice this failure via the “Not confident” indicator within the suggestion popup, but when people check the visually similar suggestions tab for an observation during compare, there is no such indicator there.

Ooh, I will check. This would be helpful for now.

That comes back to - do newbies care (yet)?
And onboarding. Please check distribution - if that is the first obs in Africa - it is less wow than flat-out wrong. I leave a comment

You have chosen a sp from …

Definitely worth following up as many as possible of the Geomodel Anomalies. It only takes one obs. With one ID. For CV to say happily Seen Nearby. So that wrong one needs to be extinguished before it becomes a wildfire.

I work in IT.
Sometimes the user-recommended solutions are not feasible and we find more effective solutions to issues and requests. The same might apply here: the iNat team might find an effective way to produce a CV recommendation that would not cause identification problems if accepted. I am not fussed how it is done, whether it is any of the recommendations in this thread, an existing request or something completely different.

We all know and accept that CV needs a minimum number of photos to learn a taxon.
There are probably conflicting priorities and other constraints as well.

These are the issues

  1. When a genus has only one or two taxa with enough observations for CV to learn, CV will recommend these taxa only. CV should revert to genus unless it can be verified that it would not have misidentified observations of other taxa in the same genus, above an acceptable percentage level. The current workaround is for identifiers to correct the IDs and add enough observations for the other taxa in the genus. This is not always possible.
  2. CV recommends taxa from a different continent. This incorrect ID in turn skews the geomodel. Do not recommend taxa without location, even better, do not recommend anything without a location.
  3. Hybrids are excluded and genera are sometimes excluded.
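
For issue 1, the "revert to genus unless verified" rule could look something like the sketch below. It assumes a set of held-out, expert-identified observations for the genus is available; `should_fall_back_to_genus`, the 5% cutoff and the data are all hypothetical, not anything the iNat team has described.

```python
def should_fall_back_to_genus(holdout, in_model, max_error_rate=0.05):
    """Decide whether the CV should suggest the genus instead of a species.

    holdout  -- list of (true_species, predicted_species) pairs for one genus
    in_model -- set of species in this genus that the CV has learned
    Returns True when too large a share of the genus's observations are
    out-of-model species that would be labelled as an in-model species.
    """
    outside = [(true, pred) for true, pred in holdout if true not in in_model]
    if not outside:
        return False  # no out-of-model species in the held-out data
    wrong = sum(1 for _, pred in outside if pred in in_model)
    return wrong / len(holdout) > max_error_rate

# Hypothetical genus with one learned species ("a") and two unlearned ones.
holdout = [("a", "a"), ("a", "a"), ("b", "a"), ("c", "a"), ("a", "a")]
fallback = should_fall_back_to_genus(holdout, in_model={"a"})
```

In this made-up case 2 of 5 held-out observations are other species that would be called "a", far above the cutoff, so `fallback` is `True`. Treating a genus with no held-out data for other species as safe is a design choice; a stricter version would fall back to genus in that case too.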

(Edited for typos)

A good, related thread about the specific issue of how the CV model is currently trained and its ability/inability to provide genus (or other non-leaf taxon) level suggestions under certain conditions is here: https://forum.inaturalist.org/t/are-genus-level-rg-observations-used-for-cv-training/63859
