Insect-plant associations undermining computer vision recommendations

While doing identifications, I have noticed for some time that the computer vision will sometimes recommend host plants for commonly observed butterflies even when the host plant is not present in the observation. I never saw this as much of a problem, as the butterfly the observer was focusing on is usually the top recommendation. I have also seen this happen with galls and leaf mines, and have read a few threads discussing this on the forum.

Then today I came across this observation: https://www.inaturalist.org/observations/98294472
It seems to really mess with the computer vision recommendations. The observation features a very common butterfly. The host plants for this butterfly are milkweeds, but there are no milkweeds present in the photo. The original observer put an identification of Plantae, and when using Plantae as the ID, the computer vision recommends several different milkweed species as the top recommendations. The correct plant species is not recommended at all, even though the plant in question is typically correctly identified by the computer vision.

3 Likes

That situation is a little more complicated. The observer has duplicated the photo and already has a RG obs for the butterfly. This obs IS for the host plant. PS perhaps not host for eating - but intention is for the plant, whatever it is.

Since she did not link the 2 posts, I have done so via comments.

1 Like

Yes. I do agree that the observer intended the observation to be for the plant. My issue is with the computer vision giving recommendations that are clearly not in the photo.


This is what the computer vision recommends for this observation if the initial id is set to Magnoliopsida. These are mostly plants closely associated with the Monarch, but the actual plant in the photo is not recommended at all even if you scroll to the end of the recommended list.

2 Likes

I wonder how CV would cope if the picture had been cropped to the single plant (without the butterfly and extra noise).
I tried cropping it and running it through Google Lens, and that goes to the right sp.

CV also likes to ID plants as leaf mining flies:

https://www.inaturalist.org/observations/252868441

https://www.inaturalist.org/observations/252775944

https://www.inaturalist.org/observations/252511588

4 Likes

Yes, that seems to be an issue. It appears the CV has learned to suggest milkweed with too much confidence whenever a picture containing a monarch is being ID’d for the plants in it.

That hints that the plant ID suggested on iNat is indeed based on the presence of the butterfly and not the plants in the picture.

3 Likes

The CV doesn’t really know what is in any photo. It just knows that the average photo that looks like (this butterfly and this plant) tends to be a given species. That trend of association is not going to change. The question is, maybe, how do we either teach the CV to ignore certain aspects based on our input – or to identify multiple items within an image. I’m not sure it is built to do that.

3 Likes

The converse is true as well; any insect on an ironweed flower is likely to bring up a suggestion of Melissodes denticulatus since that bee specializes on that flower, even though many other insects also visit it. I can’t think of any easy fix other than encouraging observers to be thoughtful about taking the suggestions.

5 Likes

Charley Eiseman wrote a post about this issue: https://forum.inaturalist.org/t/leafminers-have-broken-the-cv/54378

2 Likes

One reason why this happens is that a high percentage of the observations of certain charismatic species are poor or distant cell phone photos that can be identified mainly because the focal organism is somewhat distinctive. These observations become research grade, so the CV learns (for example) that a photo with a tiny orange blur on a plant has a decent chance of having a Milkweed species in the photo.
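A toy sketch of how that kind of bias can arise, in Python. This is not how the iNaturalist model works internally: the counts below are invented purely for illustration, and `label_shares` is a hypothetical helper. It only tallies which label past photos with a given visual cue ended up with, and then restricts the tally to plants, the way an initial ID of Plantae would.

```python
from collections import Counter

# Invented toy "training set": (dominant visual cue, research-grade label).
# These numbers are made up for illustration, not real iNaturalist counts.
TRAINING = (
    [("orange butterfly on plant", "Monarch")] * 80
    + [("orange butterfly on plant", "Milkweed")] * 15
    + [("orange butterfly on plant", "Other plant")] * 5
)

# Labels that count as plants in this toy example.
PLANTS = {"Milkweed", "Other plant"}


def label_shares(cue, allowed=None):
    """Share of each label among training photos showing `cue`,
    optionally restricted to an allowed set of labels."""
    counts = Counter(
        label
        for c, label in TRAINING
        if c == cue and (allowed is None or label in allowed)
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Restricting to plants leaves milkweed as the dominant guess,
# even though nothing here looks at the actual plant in the photo.
shares = label_shares("orange butterfly on plant", allowed=PLANTS)
print(shares)
```

With these made-up counts, milkweed takes most of the plant share simply because it is the plant label most often attached to photos with that cue.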

When I raised this over the summer, it was pointed out to me that once someone has made the first ID on an observation, the CV suggestions are constrained to a corresponding top-level taxon. So if someone picks an insect, all subsequent CV suggestions will be insects. Once someone else disagrees and bumps the higher level taxon back up, the CV returns a wider range of suggestions. So @edanko, all of your examples now return more inclusive/correct CV results, after people have added conflicting IDs.
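Roughly, the constraint as it was described to me could be sketched like this in Python. Everything here is an assumption for illustration: the `Suggestion` class, the `constrain` function, the group names and the scores are all hypothetical, and the real implementation may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    name: str
    group: str    # hypothetical top-level grouping, e.g. "Plantae" or "Insecta"
    score: float  # invented score, higher means more likely


def constrain(suggestions, first_id_group=None):
    """Rank suggestions by score; if an initial ID exists, keep only
    suggestions that fall in the same top-level group."""
    ranked = sorted(suggestions, key=lambda s: s.score, reverse=True)
    if first_id_group is None:
        return ranked
    return [s for s in ranked if s.group == first_id_group]


# Invented raw output for a burdock photo.
RAW = [
    Suggestion("Arctium minus", "Plantae", 0.7),
    Suggestion("Liriomyza arctii", "Insecta", 0.2),
    Suggestion("Arctium lappa", "Plantae", 0.1),
]

print([s.name for s in constrain(RAW)])             # no initial ID yet
print([s.name for s in constrain(RAW, "Insecta")])  # initial ID was an insect
```

In this sketch the plant is the top result until someone picks an insect as the initial ID, after which only the leafminer is left to suggest.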

Therefore, I have reluctantly concluded that we see Arctium spp. identified as Liriomyza arctii and so on … often because people are making incorrect choices when they select an initial ID, not because the CV is returning an incorrect suggestion as the top initial result.

I’ve tested several problematic plant taxa in my region, for example:

Liriomyza arctii appears on the list, but it’s not the “pretty sure” suggestion nor one of the first two “top suggestions” (on the traditional iPhone app). I’ve tested this with a variety of closeup and wide shots, and several other problematic plant taxa with similar results.

However, a spot where the CV does genuinely seem to struggle is cultivated plants that don’t get ID’ed to the species level, e.g., the hybrid roses in gardens and landscapes. Here are some examples of what happens with rose pictures, which is probably why I see a lot of cultivated roses ID’ed as fungi in my region.

The first two screenshots are from the traditional app on iPhone, and the third screenshot is from the Next app with the same photos.

3 Likes

I would expect exactly this kind of behaviour from CV, at a certain stage of the learning process. I would also expect that such cases will become more and more difficult to find, as iNat is growing and more observations of monarchs on non-host plants become available…

1 Like

In addition, by initially telling iNat that it’s a plant it means that future CV recommendations will stick with plants rather than defaulting to the butterfly.

Nothing at all wrong with this observation and CV is working as it should.

I’m not sure I’m confident in this type of expectation. For this particular example, monarchs are one of the most commonly observed taxa on iNaturalist, with well over 300,000 records, far more than the average taxon. If this type of suggestion/detection error is still happening with some frequency, and if the expectation is that it will be “improved out” by way of future model retraining alone: 1) how many more hundreds of thousands of observations would be needed to overcome it, and 2) can we really expect that the proportion of (still in the same example) monarchs on non-host plants relative to the total number of monarch observations will grow faster than the proportion of monarchs photographed on milkweed?
I guess I just don’t feel that the CV, with regard to commonly observed taxa, is at any kind of intermediate stage where frequent mistakes of this sort should perhaps be forgiven as “early weirdness”. Of course, if the problem is compounded by human mis-selection (I am also unfortunately very familiar with the rose under-recognition problem @djringer mentions), that’s another layer of difficulty – but that tends to get exaggerated by CV behavior to begin with, where novice observers are presented with a menu of dubious choices because of (for example) the vagaries of training data selection (such as the computer vision not being given over to suggesting genera if not supported by Research Grade observations either at species level or marked “as good as can be” at a higher rank).
I don’t know what the solution to all this would best be. I just don’t think that more data and more training will inherently make these errors come out in the wash with time.
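To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Only the 300,000 figure comes from the record count mentioned above; the 20% off-host share and the `off_host_share` helper are assumptions made up for illustration.

```python
def off_host_share(old_total, old_off_host, new_total, new_off_host):
    """Fraction of all observations showing the species away from its
    host plant, after new observations are added to the old ones."""
    return (old_off_host + new_off_host) / (old_total + new_total)


OLD_TOTAL = 300_000    # roughly the monarch record count mentioned above
OLD_OFF_HOST = 60_000  # assumed 20% off-host share, invented for illustration

# Doubling the data at the same 20% off-host rate changes nothing.
same_rate = off_host_share(OLD_TOTAL, OLD_OFF_HOST, 300_000, 60_000)

# Even if 30% of the new photos are off-host, the overall share only
# moves from 20% to 25% after doubling the dataset.
higher_rate = off_host_share(OLD_TOTAL, OLD_OFF_HOST, 300_000, 90_000)

print(same_rate, higher_rate)
```

Under these assumptions, more data only helps if people start photographing the species off its host plant at a noticeably higher rate than they do now, and even then the overall mix shifts slowly.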

1 Like

The CV a) does not (currently) include hybrids, which means that it does not know many ornamental cultivars because these are often hybrids and b) is not separately trained on the genus if at least one species in that genus is included in the CV, so it struggles with photos that do not resemble those members of the genus that are difficult to ID to species (these may be the “typical” ones because the atypical ones are the only ones that are easy to ID).

I struggle with people who cannot say - that’s a rose - without needing CV to help.

How do we make it clearer to newbies? Black spot and powdery mildew are diseases - which are NOT visible here.

2 Likes