Computer vision suggested IDs getting worse for a spider species with a large number of research grade IDs?

I have a project focused on a particular kind of spider (Enoplognatha ovata) that has resulted in a large number of research grade IDs (>5000). In the last month I have noticed what seems like an increased number of organisms that are not E. ovata being identified as this species, I assume because the computer vision algorithm is suggesting E. ovata at the top of its list. It is a widespread species, so it’s probably “nearby” for a large majority of observations, but it seems to me the visual similarity of some of these new observations is pretty low. In the past most of the mis-IDs seemed much more reasonable (a spider with very similar colouration), whereas some of the recent ones are way off, including even non-spiders. Is it possible that an increase in false positives for this species is because it is so common and has so many research grade IDs? Or is there some other reason the suggested IDs could be getting worse?

It could be that with more observations, the photos are more diverse (color, lighting, angle, background, etc.) and thus more like photos of other species. More photos are generally helpful, but if they are all grainy and uncropped, that would actually make it more difficult for the CV. I didn’t look at any of these photos so I don’t know if that’s actually the case, but it’s possible.

It could also be an effect of the City Nature Challenge (CNC). If E. ovata is a widespread suggestion, many new or inexperienced users could just be selecting a CV option without taking much care and uploading. If there’s a decrease in the next month, that would probably indicate CNC played a role.
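
One rough way to check this would be to compare, month by month, the share of observations first identified as E. ovata that were later corrected to something else. Here is a minimal sketch of that comparison. The function name, the record format, and the counts are all made up for illustration; they are placeholders, not real iNaturalist data or an iNaturalist API.

```python
from collections import Counter


def monthly_misid_rate(records):
    """Return {month: fraction of records whose initial ID was later corrected}.

    `records` is an iterable of (month, was_corrected) pairs, e.g. ("2025-04", True).
    This record format is a hypothetical one chosen for the example.
    """
    totals = Counter()
    corrected = Counter()
    for month, was_corrected in records:
        totals[month] += 1
        if was_corrected:
            corrected[month] += 1
    return {month: corrected[month] / totals[month] for month in sorted(totals)}


# Made-up placeholder data, NOT real observation counts.
example = (
    [("2025-03", False)] * 9 + [("2025-03", True)] * 1
    + [("2025-04", False)] * 6 + [("2025-04", True)] * 4
)
rates = monthly_misid_rate(example)
print(rates)
```

If the rate spikes in the CNC month and drops back afterwards, that would point to CNC; if it stays high, something else is probably going on.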

It’s definitely possible. Spiders are tricky though. There are things CV just doesn’t really pick up on. For example, large, hairy spiders are often labelled “tarantulas” by CV even though the eye arrangement is off. Or it will label similar mygalomorph spiders like Diplura and Idiommata as tarantulas. I try to give folks grace because everyone has their own expertise, and most of the people who post the observations don’t really know that much about spiders. In my mind, it’s on the ID-ers to course-correct. Granted, at such a large scale, that’s pretty difficult to do.

I have the impression (no data to back it up) that the rate of correct CV IDs actually goes down sometimes as the number of plant species covered by the CV goes up. My guess is that it gets confused by similar species.

I’ve been noticing some issues with the CV accuracy in the past few days, and I’m wondering if CNC has something to do with it. I uploaded a pretty clear trail cam shot of a raccoon yesterday and CV suggested a possum, a brown rat, or a mountain lion…

If your ID work on this species has caused the number of “Research Grade” observations to increase significantly, it will probably affect the Computer Vision. I see this frequently with spiders: as soon as there are enough RG observations for the CV to “learn” a new species, it will start suggesting that species for any number of mostly unrelated observations. There have been a few discussions about this recently, which may shed some light on what’s going on. This one might be most relevant:

https://forum.inaturalist.org/t/inat-misidentifies-xysticus-as-bassaniana/63896

Unfortunately, using image recognition to ID spiders (outside of a handful of large/colorful/distinctive species) is asking a lot, and arachnologists on iNat probably will just have to live with the limitations of what it can do.

If someone wrote a Spider 101 journal post, you could link to that in comments.
Something like “Is that spider a spider?” I tripped over a harvestman during CNC.
If you offer info, then you can mentor second-tier identifiers who are interested in spiders. Offer a URL along the lines of “we need help sorting out …” so the taxon specialists can concentrate on the interesting observations.
