iNat misidentifies Xysticus as Bassaniana

I’ve been IDing spiders from North America for several years now, and I have to spend much of my time fixing IDs of Xysticus sp. spiders that iNat suggests to observers are Bassaniana. It’s not just the occasional one; it can be dozens per day. Spiders from these genera are similar in shape but distinctive enough that iNat should be able to be trained to tell them apart at the genus level. Are there any plans to train the computer vision on these genera?

There are few or no Xysticus that the CV recognizes. The few Xysticus now in the CV are distinctive species that look little like the more common Xysticus species. Since the CV is only aware of leaf taxa, it cannot simply learn to recognize the genus Xysticus once species from that genus have been learned. This will continue to be a problem until someone dives into the Xysticus literature and works out how to ID them from photos.

Not technically a bug, unfortunately.

See the discussion here about a similar case: https://forum.inaturalist.org/t/are-genus-level-rg-observations-used-for-cv-training/63859

Yes, I’ve moved this from “Bug Reports” to “General” since it isn’t a bug. Bug Reports should have the template filled out regardless.

You can read more about the current requirements for which taxa are included in the computer vision model here: https://help.inaturalist.org/en/support/solutions/articles/151000170368-which-taxa-are-included-in-the-computer-vision-suggestions-

There are currently 11 Xysticus in the CV.

https://www.inaturalist.org/observations?expected_nearby=true&subview=table&taxon_id=61903&verifiable=any&view=species

Only 2 Bassaniana

https://www.inaturalist.org/observations?expected_nearby=true&subview=table&taxon_id=250040&verifiable=any&view=species

The bottleneck is Bassaniana, whether in the US or not. The best bet is getting more Bassaniana into the CV.

*The links above need the location set to global, unless you’re interested in which CV taxa occur in a specific region.

Is this a fixable situation?

No, it is inherent to the system. The only way to change it is to advocate to staff for a change to how the system works. Or, I suppose, suck it up and make do with the cards you’re dealt: do what you can to improve the CV within the way the system operates.

There is, though, a roundabout way for the genus to appear. The iNaturalist Next update (the latest version of how the CV UI works) limits the usefulness of this workaround, so if the site is eventually updated to match iNat Next, it will be even harder to get it to show just a genus suggestion.

It would help if iNat offered us the taxonomy and made it easier to find and choose the appropriate level.
If there are multiple IDs, I can use the Community Taxon link; otherwise I have to go to the taxon page. (I am a generalist identifier, and there is always a new-to-me slice, e.g. a first observation on iNat that made the taxon specialist happy.)

Would be good to have a taxonomy tab to click next to Compare.

Sometimes the ID has text for the Family. Other times it doesn’t. ??

Bassaniana isn’t the bottleneck. There are 5 species of Bassaniana, only two of which are common (and several of them may not be true Bassaniana, which is ironic because Bassaniana was incorrectly split from Coriarachne, but I digress). There are just under 300 Xysticus species, many of which are common. Bassaniana and Xysticus are extremely easy to differentiate with somatic characteristics. However, the Xysticus currently in the CV are overwhelmingly a light tan color, while the Bassaniana in the CV are dark. This has led to most dark-colored Xysticus in North America receiving a CV suggestion of Bassaniana.

Apologies, I’m not familiar with these. It sounds like dark Xysticus are not represented, so that is where improvement is most needed in this case.

How feasible is fixing this? Surely, with ~300 species, there is at least one dark species somewhere in the world that is IDable from photos. What about isolated places with few species, such as islands?

Appreciate all the work by @ljr2018 and @natev on all of the spider IDing!

Realistically, I don’t think this kind of problem is solvable. It may improve, but it will never really be “fixed.” Spider taxonomy is constantly in flux, there are too many similar-looking species and genera, many species are variable in appearance, and there are not enough people studying spiders in general. Most of those who do study spiders are not on iNat. Most people who work with specimens are (justifiably) unwilling to put speculative species-level IDs on photo-only observations, apart from a few distinctive species which mostly don’t need expert review anyway. The Computer Vision will always suggest the closest-looking species from the short list it has in its model. The 2 approaches usually recommended (which may work for some taxa) are:

  • Go through every observation and put a probable ID on every photo of every maybe-distinctive species, such that you train the CV to “recognize” those species
  • Make hundreds of your own observations of in-situ animals (preferably with a variety of cameras), which you then take back to the microscope and attach a proper determination to. Alternatively, volunteer to receive and examine specimens from other iNat users. Same goal, though: brute-force the CV into “recognizing” these species.

You still end up with the same problem: the CV will now use those new names for any vaguely similar-looking specimen, since it still covers only a small percentage of the known species. As soon as the CV learns about a new species, it starts attaching that ID to hundreds of observations that are roughly the same shape and color. The computer doesn’t know what it doesn’t know. I guess in an ideal world there would be hundreds of entomologists crawling through fields and forests with their cameras, with the express goal of helping train iNat’s Computer Vision. But I’m not sure how realistic that is.

There have been discussions about modifying the CV’s behavior on a per-taxon basis, but that seems to be fundamentally incompatible with the design of the image recognition process. It would be great if a curator could put a flag at the genus level (or subfamily, subgenus, or whatever) that stopped the CV from making more specific suggestions; that would be very helpful for people who curate/ID many groups of diverse/cryptic organisms. But it doesn’t seem to be possible, or even desired.

My personal takeaway has been that it’s ultimately not worth my time to trawl through the large buckets of unidentified spiders on iNat. I do casually browse observations in my own area, where I have more familiarity with the local/regional fauna and can sometimes make more informed guesses. I also try to encourage observers to a) collect specimens or b) learn how to make these determinations on their own. I have had some success with the former and have gotten maybe 10 people to send me material. The latter is rare since most people don’t actually care that much about invertebrate anatomy/taxonomy, but I think I have been able to encourage a few people to dig deeper and I hope that I can help spur some interest in these often-overlooked animals. Encouraging people to look closer at “bugs” seems to be more practical, and more broadly beneficial to nature/science, than trying to fight against the latest CV model.

In general the CV is pretty fantastic for many groups, especially things like plants and birds and butterflies. It just doesn’t work well with diverse/near-cryptic groups, especially small arthropods. And there’s no way to tell it “This genus actually has 38 species and only 2 of them are visually distinctive, so just leave the ID at genus” - it will always pick one of those 2 species. So I think there will always be some frustration when trying to curate those groups.
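To make the “it will always pick one of those 2 species” point concrete, here is a toy sketch in Python. It is not iNaturalist’s model or code: the taxon labels, the scores, and the `GENUS_ONLY` set are all hypothetical, the last one standing in for the curator-set genus-level flag wished for above. It only illustrates that a classifier whose label set contains nothing but leaf taxa can only ever answer with one of those leaves, however poorly the photo matches them.

```python
import math

# Toy illustration only: hypothetical labels and scores, NOT iNaturalist's model.
# Each label is a leaf taxon the model was trained on, mapped to its genus.
LEAF_TO_GENUS = {
    "Xysticus sp. A (light tan)": "Xysticus",
    "Xysticus sp. B (light tan)": "Xysticus",
    "Bassaniana sp. C (dark)": "Bassaniana",
}

# Hypothetical curator flag: for these genera, never suggest below genus.
GENUS_ONLY = {"Xysticus"}


def softmax(scores):
    """Turn raw per-label scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def suggest(scores, genus_only=GENUS_ONLY):
    """Return the best-scoring leaf taxon, or its genus if that genus is flagged.

    There is no "none of these" label: the answer is always drawn from
    the known leaf taxa (or their genus), whatever the photo shows.
    """
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    genus = LEAF_TO_GENUS[best]
    label = genus if genus in genus_only else best
    return label, probs[best]


# A dark Xysticus of a species the model has never seen. Assume (made-up
# numbers) that it scores highest on the only dark label the model knows.
dark_unknown_xysticus = {
    "Xysticus sp. A (light tan)": 0.4,
    "Xysticus sp. B (light tan)": 0.2,
    "Bassaniana sp. C (dark)": 1.5,
}

# A light tan Xysticus: the hypothetical flag truncates the suggestion to genus.
light_xysticus = {
    "Xysticus sp. A (light tan)": 2.0,
    "Xysticus sp. B (light tan)": 1.8,
    "Bassaniana sp. C (dark)": 0.1,
}

for name, scores in [("dark", dark_unknown_xysticus), ("light", light_xysticus)]:
    label, p = suggest(scores)
    print(f"{name}: {label} (p={p:.2f})")
```

Note that in this sketch the flag only helps once the top guess is already in the right genus: the dark Xysticus still comes back as Bassaniana, which fits the point made earlier in the thread that the real gap is dark Xysticus missing from the training data.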

This only contributes to the shortage of identifiers, which in turn exacerbates the problem with the CV.

iNaturalist needs to change. This issue will only compound as the user base grows. No amount of identification is going to change the biology of the organisms; some are just not IDable from photos.

One thing I have thought of that shouldn’t require redesigning everything is community involvement with the system: the ability to flag and limit which taxa the CV can learn, after community discussion. Nothing is going to be perfect, but there absolutely are some taxa the CV just shouldn’t learn.

Ideally, though, a system that allows higher taxa to be learned would be a bigger improvement.

If there are more Xysticus than Bassaniana, then shouldn’t it be more likely for Bassaniana to be misidentified as Xysticus than for Xysticus to be misidentified as Bassaniana?

Even though arthropods are my personal area of interest (and, IMO, some of the more valuable observations), I can understand the staff’s motivations for not making changes just to cater to these specific groups. While the CV has many problems with spiders and the like, it is great at 90% of the things that people post on iNat. I have to keep in mind that the point of iNat is to get people outside and interested in nature, not to create valuable data sets, or really even accurate data at all. So while some type of taxon-level CV toggle might be a big wishlist item for me personally, none of this particularly matters to iNat’s mission. We are working within a relatively small corner of a large global biodiversity database, and I think we just have to accept the trade-offs. Or don’t accept them and ragequit the site, which unfortunately is the avenue a lot of experts seem to be taking (not to rehash that topic, but it just happened again, so it’s on my mind) :(

There are now two feature requests to address this problem:
https://forum.inaturalist.org/t/allow-some-non-leaf-taxa-to-be-added-to-the-cv-model/63937
https://forum.inaturalist.org/t/allow-for-genus-level-cv-training-sets-irrespective-of-species-level-participation/63938

One of my big thoughts is that maybe fewer experts/identifiers would ragequit the site if the community had more say in the CV. Also, changes that make the CV more accurate should benefit everyone, identifiers and observers alike.

Why would having a more refined and accurate CV make people less interested in nature?

I have contemplated making some feature requests for a while now. But sometimes you have so much to say that it’s hard to start.

I would question that. It is true that the CV is good at taxa such as birds, mammals, vascular plants (provided they are in flower and not hybrids), butterflies, ladybeetles, and odonates and these taxa do make up a pretty large percentage of observations on iNat.

But there are large swathes of arthropoda where it performs very poorly: myriapoda, much of arachnida, much of diptera, bees (with the exception of Apis and Bombus) and pretty much all of the parasitic hymenopterans, some of the more difficult beetle families, etc. Many fungi/lichens, mosses, and probably a lot of gastropods also fall into this category.

Bees, spiders, and flies are common subjects of iNat observations – people encounter them regularly in everyday life and have just enough familiarity with them to be curious about learning more. Bees in particular are reasonably charismatic and often thematized in the context of concerns about pollinator loss. It seems to be increasingly common for people to use iNat for pollinator monitoring projects, often without more than very rudimentary knowledge of the taxa they are observing and often without any more equipment than a cell phone, meaning that they are relying heavily on the CV.

This is not a niche problem, and the number of affected observations is not negligible. (Over 3 million observations of bees worldwide, only a little over half of them RG; diptera 5 million, with fewer than 2 million RG; arachnida nearly 7 million, with fewer than 3 million RG; etc.)
