Geomodel issues - Common Pill Woodlouse (Armadillidium vulgare)

https://www.inaturalist.org/geo_model/56083/explain

In the past few days I’ve noticed some new observations where the CV incorrectly suggests this species as “visually similar” and “expected nearby”. I’m quite used to similar-looking species being listed as visually similar, and I don’t see anything wrong with that. But these were in odd locations where I have never seen this species suggested before, and where it has never even been sighted.

I raised this as a flag on the taxon, and holyegg showed me the page with the geomodel predictions and suggested I make a post here to raise the issue.

After looking at the geomodel predictions for this species, I’ve noticed many more errors, not only in the area I was initially looking at. Here are a few screenshots compared to actual observations:

Not only is the model suggesting this species is present in Israel, Egypt and Jordan, where, as I noted above, it has never been sighted; it also covers large expanses of desert and sea where it cannot possibly be present (it is a terrestrial species that lives in humid areas). Especially confusing are the large areas of India, China, Nepal and Bhutan around the Himalayan mountains, where it has not even been sighted nearby. There are also some mountains next to Tehran where it should probably be present, since it has been sighted nearby, yet that area is not suggested while the surrounding desert and sea are?

This all seems very confusing to me, and I’m not sure how or why it would happen. I’m very grateful that we have this tool and for the hard work by the team, though this does seem to be a rather large flaw, and perhaps an opportunity to refine the current system. It may help to have the model take into account some basic facts about each species and the local climate. In this case, the species is fully terrestrial, is usually found in humid temperate regions, and has very poor dispersal ability (except that humans have helped it spread from its native range in Europe). Perhaps more simplified information would be easier to implement, I don’t know.
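To make that suggestion a bit more concrete, here is a rough sketch of what a trait-based mask on top of the geomodel output might look like. Everything in it is my own assumption for illustration: the `Cell` structure, the `apply_trait_mask` function and the 250 mm precipitation cutoff are hypothetical, and this is not how the iNaturalist geomodel actually works.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One map cell with a geomodel score (hypothetical structure)."""
    is_land: bool
    annual_precip_mm: float
    score: float


def apply_trait_mask(cells, terrestrial=True, min_precip_mm=250.0):
    """Zero out scores in cells that contradict basic species traits.

    min_precip_mm is an arbitrary placeholder, not a measured tolerance.
    """
    masked = []
    for cell in cells:
        habitat_ok = cell.is_land or not terrestrial
        climate_ok = cell.annual_precip_mm >= min_precip_mm
        masked.append(cell.score if habitat_ok and climate_ok else 0.0)
    return masked


cells = [
    Cell(is_land=False, annual_precip_mm=600.0, score=0.6),  # open sea
    Cell(is_land=True, annual_precip_mm=50.0, score=0.5),    # desert
    Cell(is_land=True, annual_precip_mm=800.0, score=0.7),   # humid land
]
print(apply_trait_mask(cells))
```

In this toy version only the humid land cell keeps its score, while the sea and desert cells drop to zero.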

Your thoughts or accounts of similar experiences would be greatly appreciated.

IMHO, the geomodels are still terribly oversimplified and are not worth much intellectual energy, whether worrying about them or analyzing them in detail. From a biogeographic standpoint, they provide little of interest. Until/unless the geomodels become more sophisticated and incorporate more environmental variables, they can be viewed as just fanciful representations of conceivable ranges.

While I would agree, they have a great impact on what the computer vision suggests. So if they are very wrong and if you are trying to reduce mis-identifications, it is often something you must think about.
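As a rough illustration of why a wrong range matters so much, here is a minimal sketch that assumes the geomodel score simply multiplies the visual score for each taxon. The thread does not describe the actual combination step, so the `combine_scores` function, the taxon labels and all the numbers below are hypothetical.

```python
def combine_scores(vision_scores, geo_scores):
    """Weight each visual score by a geo score, then renormalise.

    Assumption: plain multiplication. Taxa missing from geo_scores
    are treated as having a geo score of 0.
    """
    combined = {
        taxon: score * geo_scores.get(taxon, 0.0)
        for taxon, score in vision_scores.items()
    }
    total = sum(combined.values())
    if total == 0:
        return combined
    return {taxon: score / total for taxon, score in combined.items()}


# Two taxa the vision model cannot tell apart (made-up numbers).
vision = {"widespread lookalike": 0.5, "local native": 0.5}
# An over-generous range for the lookalike tips the balance.
geo = {"widespread lookalike": 0.8, "local native": 0.2}

ranked = sorted(combine_scores(vision, geo).items(), key=lambda kv: -kv[1])
print(ranked)
```

With equal visual scores, the taxon with the inflated range ends up ranked first, which is the kind of effect being described here.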

Indeed, I would like to have the issue resolved because it has been causing many misidentifications recently, especially within the past couple of days.

You, @zoology123, and I are all in agreement about the headaches that the geomodels are causing in CV suggestions. And I know that staff is aware of the shortcomings of each of the recent and past models that they have used as a basis for CV suggested IDs. I wish I could be a “fly on the wall” in staff discussions about what the next steps are and where things are headed. I am left a bit unsatisfied when I read each month’s new CV training results posted on the iNat Blog. Getting trained on more species is great, but if the foundation for the data manipulation is flawed to a smaller or larger degree, the outcomes will not improve.

It would be great if they could include average temperature. Too often I’ve seen things listed as expected nearby in Antarctica. Things seem heavily skewed toward elevation.

One solution that those of us in North America are aware of would be to overlay the immensely useful “Ecoregion Maps” (at Level III or IV mapping), which were constructed from multiple environmental variables. For North America, these would work much better than the current geomodel. The problem, of course, is that equivalent datasets or maps do not exist for all other parts of the globe. Such data may exist on a country-by-country basis or regionally, but integrating these varying ecoregional datasets into the one CV model is probably a monumental task. I can’t speak for staff nor to the technical challenges of such an effort, but I sit here on the North American continent with that readily available data source and there seems to be no movement to access it.

I don’t know if these comments are all still relevant, but here’s what Scott said when the geomodel first came out:

a huge part of the strength of this approach as opposed to a single species niche model is that the species learns from all other 80k species being modeled (much like the Computer Vision Model) so the model gets a good sense for co-occurrence, biogeography and the kind of things species distributions tend to do without having to rely so much on environmental covariates alone as a crutch as traditional niche models do. This is why the predictions are pretty good using just elevation as a covariate and not including other typical covariates like precipitation etc. We tested adding those covariates and didn’t get significant improvement but made the model more complicated.

Personally, I’m not sure I believe that “crutch” is the best way to describe environmental covariates, but I’m not the one running the models :woman_shrugging:

Thanks for sharing this. For species such as this, which are common and well studied, do you think it would be possible to allow some users or curators to manually adjust the geomodel predictions to fit current understanding? I think that would be a simple alternative to adjusting the model itself. Also, is it possible to have family-level identifications included in the CV and geomodel system? There are many undescribed species in a certain family which I assume are not considered, because that family is rarely recommended despite the large number of observations at family level (it seems to be recommended only in some regions where there are species- or genus-level observations).

For example, take two woodlouse families, Armadillidae and Eubelidae. Armadillidae, with over 10,000 observations (3000 at family level), doesn’t have a geomodel: https://www.inaturalist.org/geo_model/85603/explain whereas Eubelidae, with under 400 observations (160 at family level), does: https://www.inaturalist.org/geo_model/475299/explain This is currently causing a good proportion of woodlouse observations in tropical regions to be initially labeled as Eubelidae, while Armadillidae are neglected and often get labeled as something completely unrelated.

Perhaps this happens because genera and species within the family have 100+ observations, which then overrides the family-level model and effectively deletes it? If these could be manually updated, then I believe this issue could be effectively resolved.
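To spell out that guess, here is a small sketch of the rule I am imagining: a taxon only gets its own model if it reaches the threshold and none of its child taxa do. This is only my hypothesis, not confirmed behaviour. The `Taxon` class, the `modelled_taxa` function and the genus-level counts are made up for illustration; only the family-level figures of 3000 and 160 come from the numbers above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxon:
    name: str
    observations: int  # observations identified at exactly this rank
    children: list[Taxon] = field(default_factory=list)


def total_observations(taxon):
    """Observations at this rank plus everything below it."""
    return taxon.observations + sum(total_observations(c) for c in taxon.children)


def modelled_taxa(taxon, threshold=100):
    """Names of taxa that would get a model under the guessed rule."""
    if total_observations(taxon) < threshold:
        return []
    below = [name for child in taxon.children
             for name in modelled_taxa(child, threshold)]
    # A qualifying descendant displaces the parent's own model.
    return below or [taxon.name]


armadillidae = Taxon("Armadillidae", 3000, [Taxon("hypothetical genus A", 7000)])
eubelidae = Taxon("Eubelidae", 160, [Taxon("hypothetical genus B", 90),
                                     Taxon("hypothetical genus C", 80)])

print(modelled_taxa(armadillidae))
print(modelled_taxa(eubelidae))
```

Under this guess, Armadillidae loses its family-level model to the genus below it, while Eubelidae keeps one because none of its genera reach the threshold.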

Allowing user/curator intervention to adjust the model predictions would create a different type of output. It would no longer offer an objective predictive basis that is replicable across many taxa. (It might be a better fit to real world data for those taxa addressed in such a manner, but that just becomes a circular self-fulfilling exercise.) Since the available user/observer knowledge base varies widely across the breadth of Life addressed on iNaturalist, the “predictive” basis for subsequent CV suggestions would likely become very uneven across taxa. I can’t speak for staff, but there’s a balance to be struck here between uniform, objective modeling across taxa and modeling individual ranges based on our human-mediated knowledge base. Does that make any sense?

Indeed, I understand your point. In my opinion, though, the CV suggestions are already very uneven and biased towards common and easy-to-identify species, while neglecting those which are either hard to ID effectively from pictures or lack expert identifiers here on the site. Manually editing some groups, or allowing higher-level taxon IDs to be suggested by the CV in some locations, may provide a chance to balance out the suggestions.

My issue is that in many areas, native/endemic species which are hard or impossible to ID to species, because the literature is lacking, get suggested by the CV as more common introduced/cosmopolitan species which look similar. The native species don’t exist in the CV, and their predicted range is reduced to wherever there are species-level observations in that family/genus, which neglects entire continents and thousands of family/genus-level observations. Meanwhile, more common species have their predicted range expanded far beyond where they are actually present, so they either replace the native species in the CV suggestions or no good suggestion can be given at all.

Thanks everyone for your input and ideas. I think the main issue is not the incorrect range for this species, but rather the lack of suggestions for the taxa that are actually relevant to the observations I was looking at. I’m going to make a feature request that should hopefully solve this without too much radical intervention.