Currently, if a genus or family (or maybe internodes as well, I’m not sure) has no species in the CV model but enough photos in the genus or family to be included in the CV model, then the model will include that genus or family. I think that’s what’s going on here. The genus Panthera has a bunch of child species in the model, so it won’t have a geomodel itself.
First thing to remember is that this is a model that shows where the taxon is expected to be, not strictly a map of where it has been observed. This model is based on iNat observation data for that taxon, as well as elevation data. It also analyzes which species co-occur and uses that as part of its prediction. There’s a more detailed explanation in this blog post and in its comments.
I’m not a coder but I think it’s more about keeping the model code pretty simple and accepting some mistakes like this. It’s also an intentional decision about which geoscore threshold to use when including a taxon in the model for that hex. If the threshold were really low, then more hexes would be included in the model. If the threshold were high, then fewer hexes would be included and iNat would be less likely to predict that an organism is expected to be in a place where it does exist. Nothing’s perfect, it’s basically tweaking a set of dials to try and make the most useful model. IMO, since the model is mostly used to help with computer vision predictions, I don’t think false positives in the ocean are particularly detrimental - the odds of someone making an observation of *Armadillidium vulgare* in a fully oceanic hexagon would be very low.
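To make the threshold trade-off a bit more concrete, here is a rough Python sketch. It is not how iNat’s geomodel is actually implemented; the hex ids, the scores, and the `expected_hexes` function are all made up for illustration. It only shows that a lower cutoff pulls in more hexes (including a stray ocean one) and a higher cutoff drops hexes where the taxon may really occur.

```python
# Hypothetical geoscores for one taxon: hex id -> score between 0 and 1.
# These numbers are invented for illustration, not real iNat output.
geoscores = {
    "hex_land_a": 0.92,
    "hex_land_b": 0.40,
    "hex_coast": 0.15,
    "hex_ocean": 0.06,
}


def expected_hexes(scores, threshold):
    """Return the set of hexes whose score meets or exceeds the threshold."""
    return {hex_id for hex_id, score in scores.items() if score >= threshold}


# A low threshold keeps every hex, including the oceanic false positive.
low = expected_hexes(geoscores, 0.05)

# A high threshold removes the ocean hex, but also drops hex_land_b,
# where the taxon may well be present.
high = expected_hexes(geoscores, 0.50)

print(sorted(low))
print(sorted(high))
```

Whatever threshold is picked, the high-threshold result is always a subset of the low-threshold one, so the choice is purely about where to sit between false positives and false negatives.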
However, now that the geomodel is exportable as the Open Range Map Dataset, I think it would be worth investigating how to make it cleaner for situations like this, since it’s not just being used to aid computer vision predictions. It can and will be improved.
I suspect this is due to co-occurring species. If the same non-Jaguar species in that Cuba hexagon co-occur with Jaguars on the mainland, that might cause the model to predict that jaguars are also in Cuba.
Okay, fair enough - I was thinking of more general (and older) datasets, not sure why. My question may not be valid in this discussion, but I think it’s still interesting to think about - is updating of range bidirectional, or only ever expanding?
You said the vast majority for Hemiptera. Are some marine? If it isn’t 100% of them, the solution would not be simple.
I also don’t think it is a huge problem: we get the idea of the range, and anyone who reads the species description would know the water is a mistake. Allowing certain curators to touch up the maps would be helpful, though.
I don’t see how it wouldn’t be simple. Set all Hemiptera to have oceanic cells removed automatically, then go back and change Halobates so that oceanic cells are not flagged for it.
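Something like the following Python sketch is what I have in mind. To be clear, this is only an assumption about how such a rule could look, not anything iNat actually does: the `ANCESTRY` table, the two rule sets, and the hex ids are all hypothetical. A taxon gets its oceanic cells removed if any ancestor is on the terrestrial list, unless an ancestor is also on the exception list.

```python
# Hypothetical ancestry lists, from broadest to narrowest rank.
ANCESTRY = {
    "Halobates micans": ["Hemiptera", "Gerridae", "Halobates"],
    "Cimex lectularius": ["Hemiptera", "Cimicidae", "Cimex"],
}

# Hypothetical rule sets: blanket rule first, then carve-outs.
TERRESTRIAL_ONLY = {"Hemiptera"}
MARINE_EXCEPTIONS = {"Halobates"}


def should_drop_ocean(taxon):
    """True if the blanket rule applies and no exception overrides it."""
    lineage = set(ANCESTRY[taxon])
    return bool(lineage & TERRESTRIAL_ONLY) and not (lineage & MARINE_EXCEPTIONS)


def clean_range(taxon, hexes, ocean_hexes):
    """Remove oceanic hexes from a range when the rule says to."""
    if should_drop_ocean(taxon):
        return hexes - ocean_hexes
    return set(hexes)


ocean = {"hex_ocean_1", "hex_ocean_2"}
bed_bug = clean_range("Cimex lectularius", {"hex_land_1", "hex_ocean_1"}, ocean)
sea_skater = clean_range("Halobates micans", {"hex_ocean_1", "hex_ocean_2"}, ocean)

print(bed_bug)
print(sea_skater)
```

The catch is that someone still has to maintain the exception list, which is where the real effort would go.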
I suppose it’s not necessary for people just using the range map casually, who have enough common sense to know the ocean is a mistake. But what if you want to use the range maps for some kind of ecological study?
I think that someone doing that would understand that no taxon occurs everywhere in its range map. For example, we might see a solid range map of a woodland species simply because we don’t have the fine-scale resolution to show it as a matrix of woodland and open habitats; a good researcher would understand the map to mean “found in suitable habitats within this area.”
Yes, but there’s an expectation that:
a) there is likely to be some suitable habitat in the area selected by the model and
b) the organism is likely to be somewhere in the area selected by the model.
This is not the case with a terrestrial organism being predicted to be in the ocean.
As someone who has worked with these types of data, I can affirm that range maps or models that have large spatial errors (both under- and overpredictions) can cause problems in downstream analyses. When working with large-scale analyses of many species, users can’t check each map “by hand” - it defeats the purpose of being able to do a large analysis.
In this case, a researcher could clip all terrestrial species to a known outline of the world’s land masses or similar, but this is an extra step. It would definitely create a more useful dataset if this issue were addressed at the source (i.e., in the modeling process itself). This might also lead to improving the model overall in other ways - if there’s an error in the model causing it to predict terrestrial species to occur in the ocean, the same issue might be causing incorrect predictions in other terrestrial areas.
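For what it’s worth, the clipping step itself can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a recipe for the actual Open Range Map Dataset: I’m assuming each range is a set of hex ids and that a set of land hex ids (`land_hexes`) is already available from some land-mass outline. The function name, hex ids, and example range are hypothetical. It should only be run on species already known to be terrestrial.

```python
def clip_to_land(ranges, land_hexes):
    """Clip each terrestrial range to land and count the cells removed.

    ranges: dict mapping taxon name -> set of hex ids in its modeled range.
    land_hexes: set of hex ids that fall on land (assumed precomputed).
    """
    clipped = {}
    removed = {}
    for taxon, hexes in ranges.items():
        clipped[taxon] = hexes & land_hexes       # keep only land cells
        removed[taxon] = len(hexes - land_hexes)  # how many cells were cut
    return clipped, removed


# Made-up example data for illustration only.
land_hexes = {"hex_land_1", "hex_land_2", "hex_land_3"}
ranges = {
    "Armadillidium vulgare": {"hex_land_1", "hex_land_2", "hex_ocean_1"},
}

clipped, removed = clip_to_land(ranges, land_hexes)
print(clipped)
print(removed)
```

The `removed` counts are a cheap way to spot which species had the most cells cut, without checking every map by hand, but clipping only hides the symptom and does nothing about wrong predictions on land.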
In case anyone is wondering, yes, the reverse phenomenon exists. Here is a marine sponge, Halichondria bowerbanki, having its range mapped onto vast swaths of land.