Computer vision proposes the same name for many records of orchids

After identifying more than 2,500 records of Spathoglottis, besides many approximate identifications, I have noticed that many orchid records of quite different species are given the same taxon name proposed by the IA: Spathoglottis plicata. For example, many records of Phalaenopsis and Dendrobium are wrongly identified as Spathoglottis plicata. The flowers of this widespread terrestrial species, found throughout the tropics (I mean wild or naturalized plants), are highly homogeneous. So it seems strange that epiphytic genera as different as Phalaenopsis or Dendrobium could be confused with it. I wonder whether the IA may have been trained on mislabeled pictures of mixed horticultural varieties (many interspecific Spathoglottis hybrids are produced and sold for cultivation), including the commonly cultivated genera mentioned above. Would it be possible to check this?

@chacled,

I think that by “IA” you are referring to artificial intelligence (intelligence artificielle in French). In English the acronym is AI, which stands for “Artificial Intelligence”.

Your post is valuable, so I don’t want English readers to be confused!

Can you please provide some specific examples of observations where this has happened?

When a name is applied to the wrong photos, the AI (or CV) learns that those species are examples of that name. It then suggests that name for photos that resemble any of the species it has learned under it. People pick that name for those other species too, so the problem gets worse.

There’s good news! When you clean up the identifications of the observations labeled as that species, the AI begins to learn what the species really looks like and it improves. Then people pick that name mainly for the correct species and the AI learns more. The situation improves.

This takes time, though, and it takes a person watching observations with that name for a while.

Here on iNat the software that helps ID things is not called an AI – it is called CV – Computer Vision.

To be honest, I’m starting to find these ‘corrections’ (not from you specifically @susanhewitt, I know it’s a position that iNat staff has taken) a bit tedious. Computer vision is just a field within AI, so calling the tools AI is entirely correct, just not as precise as it could be.

I’ll take responsibility for that; it was me who really emphasized it and corrected posts. Susan and others are just trying to follow the example I set. We’re OK with “AI”, especially since it’s in such broad use now.

This is an issue that’s widespread across many taxa, especially large genera where only a few species are commonly photographed. The CV might learn one taxon, then apply that suggestion to anything that looks like it without having “learned” that there are dozens of look-alikes. It’s slowly getting better as new species are added to the set of species the CV can identify. The problem is that there’s a large legacy of misidentified observations in the system that need to be corrected, and not enough people to correct them as fast as they pile up.

The CV also doesn’t identify things in the same way that people do. Sometimes it focuses on unexpected parts of the organism, or even the background. For instance, sometimes a photograph of a plant will return an identification of an insect (like Monarch Butterfly) because many observations of that insect have a plant taking up most or all of the background.

This can be ‘overruled’ by first adding a coarse ID to the observation; the CV will then make suggestions that match that ID. So adding an ID such as ‘Plants’ or ‘Dicots’ to that photo will force the suggestions to be plant species.

I’ve been poring over Graphocephala recently, a genus of leafhoppers. It has ~50 species, and many look like at least 1-3 other species in the genus, plus a few outside of it. Only 11 species have more than 100 observations, and most have only a handful. As a result, the CV misidentifies these organisms very frequently. It doesn’t help when people confidently agree with an incorrect identification instead of just supporting it at genus/subgenus.

It’s tedious to be sure, but necessary. And hey, I’m really learning a lot about IDing this group as I go.

I have spent a good amount of time over the past 4 months trying to fix Chironomidae on the site. There are quite a few steps one can take to correct the CV. One I highly suggest as an identifier is targeting species that are not in the CV but could be: go out of your way to ID them, and encourage observers to observe more of them if they can. At least for Chironomidae, this should be very effective, since most of the CV’s misidentifications happen because it knows only a tiny number of taxa out of a few thousand (maybe around 10 before the midge project started). In the last CV update, 6 chironomid species and a handful of genera were added.

Sorry for mixing up IA and AI.
Below are links to a few recent identifications. They include Phalaenopsis, Dendrobium and Epidendrum:
https://www.inaturalist.org/observations/227849879

https://www.inaturalist.org/observations/227684844

https://www.inaturalist.org/observations/226509983

https://www.inaturalist.org/observations/226689670

https://www.inaturalist.org/observations/226977757

https://www.inaturalist.org/observations/227002482

https://www.inaturalist.org/observations/225333789

One of my dormant projects is to help Cardiff Bristol Museum get their HUGE historic insect and arthropod collections digitised and added to the training dataset for the iNat CV.

Does anyone know if it is possible to add weightings to observations so things like type specimens can help tip the probabilities, despite there only being a few images?

hi @cromlyn, iNaturalist observations are meant to represent your own nature observations, and collections by multiple different people such as those in a natural history museum aren’t really a fit for iNat. To answer your question, it is not possible to add weight to type specimens in the algorithm. Also, since computer vision is trained on photos, not organisms, a bunch of images of dead insects in collections might not be very helpful for identifying those same species in situ in nature.

You may want to look at SCAN or https://ecdysis.org/ for the museum’s records.

Welcome to the forum!

I agree with @bouteloua that iNat isn’t a good fit for institutional collections, and staff have made clear that collections shouldn’t be uploaded wholesale to iNat or iNat used as a replacement for collections management software.

There are some previous threads that touch on this like:
https://forum.inaturalist.org/t/institutional-use-of-inaturalist/49050
https://forum.inaturalist.org/t/inat-development-for-museums-or-research-centres-improve-the-value-of-research-grade-data/7743
which also cover some of the issues with attempting to use iNat in this way as well as a couple of rarer use cases (like incidental observations from collections) where iNat usage might be appropriate.

Since institutional use of the CV isn’t the main focus of this thread, if it’s desirable to continue this conversation, we can split it into its own thread, or reopen one of those older threads (or another one) and move it there.