How influential are incorrect Research Grade observations for CV learning?

In a few species, e.g. Usnea longissima, the bad IDs so overwhelmingly outnumber the good ones that the computer can’t learn the correct ID. Low numbers of errors probably aren’t a serious problem.

There are a few taxa that used to get incorrect CV suggestions constantly (California bay, several of our local strawberry species, a few others) until I went through and corrected every single observation. Now there are almost no incorrect CV identifications coming through on those.

So I don’t know what the mathematical stats are, but I can confirm that it absolutely makes a noticeable difference. I highly recommend that people go through all the RG and casual observations of taxa they know well.

Just an anecdote. I originally started identifying pokeweeds because so, so many Phytolacca acinosa observations in Europe were incorrectly identified (as Phytolacca americana, Phytolacca icosandra, Phytolacca octandra, etc.). The latter two are basically nonexistent in Europe. (I can understand the confusion with P. icosandra, as it might share the common name “Asian Pokeweed” in some languages, never mind that it’s native to the Americas.)

I still remember how infuriating it was that the CV suggestions were anything but Phytolacca acinosa. Since then I think I have corrected all or most of the wrong IDs (at least I hope so). Nowadays the CV suggestions are correct more often than not. So yes, in my experience wrong IDs can have a negative impact on CV, at least in taxa that are basically unsupervised by experienced users.

My anecdote is similar. A few years ago the computer vision would suggest Ageratina altissima for a wide variety of white flowers. I would disagree with those IDs (especially the ones that ended up in “Needs ID”), and it is much better now (due to more than just my efforts, I presume, although I suppose my IDs must have helped).

I used to go through those and put a lot of IDs on Ageratina altissima, and now that A. roanensis has been split out as a separate species, there’s the next cleanup challenge… Most A. roanensis observations on iNat are probably currently RG A. altissima. I have yet to find a good key for telling them apart from the kinds of photographs typically posted on iNat. If anyone has insights to share and wants to help sort through the RG A. altissima observations to find these before the next CV model is trained, that would be great.

There have been “clean up” efforts for commonly misIDed taxa on the forum (see https://forum.inaturalist.org/t/computer-vision-clean-up-archive/7281 for instance) that have been quite successful in some cases in correcting common misIDs and improving future CV models.

Let me guess, its incorrect suggestion was Laurus nobilis? Because a lot of people refer to California bay as bay laurel and chose that suggestion?

Hopefully one of them is to stop suggesting Blow Wives for every Uropappus seed head.

That was one of them! It also tended to suggest willow and oleander for bay, because there were a ton of mixups with those as well.

I’d also be interested in the effect of observations containing multiple very different organisms that reach RG status based on the first photo. Do they spoil the learning for that taxon?

If you come across those, leave a comment and bump the ID back up to the taxon they have in common. RG ends up with some very off-taxon pictures, and I follow up and bump those back too.

More often, I encounter those relatively early, but they are later raised to RG by others, sometimes with a comment asking for the observation to be split, but that is rarely followed.

How can they get to RG? Do you add a disagreeing ID, leave a comment, and mark it “as good as can be”? They should be casual.

It happens when enough people ID the organism in the first picture, ignoring all the other pictures in the observation. I’ve seen it happen plenty of times myself and routinely find some when I look through RG observations for my area for errors. I usually add a comment to let everyone know that there are more pictures of a variety of organisms and often someone will change or withdraw their ID in response to that. Occasionally, though a lot less frequently, the observer will fix it.

I think some people tend to ID from just first pictures alone and may not even look at the additional ones. It might be a time-saving thing again, or new identifiers lacking experience and putting an ID on something they recognize in the first picture. I always look at all pictures on an observation before adding or confirming IDs and I think it’s important to leave comments to alert others if there is an issue like multiple organisms.

If you mark it, it won’t go to RG and won’t be seen. If it’s already RG when you find it, tag the other identifiers, and if there are too many species-level IDs, first mark it as “can be improved” so that it goes back to Needs ID.

I am definitely guilty of IDing from the first picture only in many cases… I often ID species that I can confirm from the thumbnail alone and don’t even open the observation (e.g. most recently I went through a lot of unmistakable Argiope observations), and yes, it saves me a heck of a lot of time during IDing.
I am always happy to go back on my IDs if I get tagged for some reason, though.

I have learnt to be very wary of any obs with multiple photos. Even long-term users sometimes miss that one photo which should have been a standalone observation.

I also miss those observations with multiple species, I’m sorry. It is a problem, but it’s easy to do when IDing fast.

This is a huge problem for microscopic organisms; I would advise against trusting the iNaturalist suggested ID for anything microscopic. This is an area where even experts sometimes have difficulty placing an organism in the correct kingdom (yes, kingdom!). Take “algae”, for example: a term applied to organisms in the completely separate kingdoms of plants, bacteria, and chromists, yet often difficult to place visually! Research Grade IDs are a bit of a rarity among us microscopists on the site, and they are as often made by the ignorant (then confirmed by the naïve who think “agree” is a “like” button) as by the experts. And there are so few of us rigorous identifiers in that sector. So the algorithm has a very small set of poor-quality data to work with in making its suggestions.

In short, to more directly answer your question: the smaller the dataset, the more detrimental the poor identifications.

As for the Ageratina split, according to http://www.efloras.org/florataxon.aspx?flora_id=1&taxon_id=250066013 it is “Phyllaries 3–5 mm, apices not cuspidate” versus “Phyllaries 4–7 mm, apices cuspidate to acuminate”. Browsing through some of the observations, I find that a lot of them look acuminate to me whether or not they are in the right geography, so I guess this is a case where I’m not sure how good field ID can get (and/or someone else can do the ID, just not me).

I don’t know how it works on the app, but in my browser, simply moving my cursor onto the observation will cause it to divide into thumbnails of the first four photos. That way, I can see at a glance when there are multiple species involved.