Hi @egordon88 - yes, I have seen it. Two of the three observations mentioned in the initial post were made with Seek. Seek uses a model trained a few years ago, and does not incorporate any "seen nearby" or geo-awareness into its suggestions.
So I’m not really sure if the criticism in that thread really applies to the previous or current models. As I mentioned in the blog post announcing the new model, we’re still working on compressing the new models to get them working on-device, like for Seek, but that isn’t ready yet.
I thought the previous model was actually rather good. It is this new one with which I am suddenly seeing a lot more "totally wrong" suggestions. Has anyone else experienced this? It feels like it is trying to be too specific, giving a wide range of low-confidence species with some noticeable outliers (e.g. I had a fish suggested for a butterfly!), rather than giving a single higher-confidence suggestion at, say, genus or tribe level. Or if you like, there is a distinction between an algorithm designed to supplant humans and one designed to support them; I would hope it is the latter that is being aimed for. Maybe the current approach is becoming over-constrained as more species are added. I wonder if an adjustment to the weightings could help smooth things out a little, e.g. adding more significant negative weightings to results according to how far they sit from the correct answer on the taxonomic tree, and how high up the suggestions list they appear.
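To make the weighting idea concrete, here is a minimal sketch of what a taxonomic-distance penalty could look like. Everything here (function names, the toy taxonomy, the rank weighting) is illustrative and is not based on iNaturalist's actual training objective:

```python
# Hypothetical sketch: penalise wrong suggestions more the further they sit
# from the true label on the taxonomic tree, and the higher they rank.

def taxon_distance(path_a, path_b):
    """Tree distance between two taxa given their root-to-leaf ancestor paths."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)

def ranking_penalty(suggestions, true_path, paths):
    """Sum of taxonomic distances, weighted so mistakes near the top cost more."""
    penalty = 0.0
    for rank, taxon in enumerate(suggestions, start=1):
        penalty += taxon_distance(paths[taxon], true_path) / rank
    return penalty

# Toy taxonomy: confusing a butterfly with a fish should cost far more
# than confusing it with a closely related butterfly.
paths = {
    "Aglais io":      ["Animalia", "Insecta", "Lepidoptera", "Nymphalidae", "Aglais", "Aglais io"],
    "Aglais urticae": ["Animalia", "Insecta", "Lepidoptera", "Nymphalidae", "Aglais", "Aglais urticae"],
    "Salmo trutta":   ["Animalia", "Actinopterygii", "Salmoniformes", "Salmonidae", "Salmo", "Salmo trutta"],
}
true_path = paths["Aglais io"]
print(ranking_penalty(["Aglais urticae", "Salmo trutta"], true_path, paths))  # 7.0
print(ranking_penalty(["Salmo trutta", "Aglais urticae"], true_path, paths))  # 11.0
```

The same two wrong suggestions score a higher penalty when the fish appears first, which captures the "how far up the list" part of the idea.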
It would be really great to have some specific examples. If you don’t want to share them publicly I’d be happy to take a look if you send them to firstname.lastname@example.org. But without specific examples it’s not possible to make an informed response or investigate further.
As Alex mentioned in his blog post, we are working on a better way to incorporate location into CV suggestions. Currently, here’s how “seen nearby” works. There’s no negative weighting involved.
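For anyone unfamiliar with what "boost-only, no negative weighting" means in practice, here is a generic illustration (the constant and scores are made up, and this is not iNaturalist's actual formula): nearby taxa get multiplied up, and nothing is ever pushed down.

```python
# Illustrative only: a boost-only re-ranking where taxa seen nearby get their
# score multiplied up and everything else is left untouched, so no suggestion
# is ever penalised for NOT being seen nearby.

NEARBY_BOOST = 2.0  # made-up constant

def rerank(scores, seen_nearby):
    boosted = {taxon: score * (NEARBY_BOOST if taxon in seen_nearby else 1.0)
               for taxon, score in scores.items()}
    return sorted(boosted.items(), key=lambda kv: kv[1], reverse=True)

scores = {"Aglais io": 0.40, "Aglais urticae": 0.35, "Salmo trutta": 0.25}
print(rerank(scores, seen_nearby={"Aglais urticae"}))
```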
The previous model got worse here, and the new one is, if anything, worse still. For most of my observations it either finds nothing locally or can't decide what to choose, with suggestions ranging from birds to plants. It's probably the number of species, or whatever changed in the previous update, but it's definitely not as good on common species as it was before.
Both are of the same butterfly in the same location. In the second screenshot that is the end of the list, and there is no option at the end of it to switch the nearby filtering on or off.
I will send you the original images by PM. The GPS location data is embedded into them.
And in case it wasn't clear, the point I made about negative weightings was about how the model is built and optimised in the first place (i.e. as part of the definition of what a "good" model is, if you like), rather than adding them as some kind of post-filtering on the results presented to users (which I don't think would be a good idea).
P.S. The above issue disappears when loading suggestions after the observation has been submitted: https://www.inaturalist.org/observations/132110740
I always assumed that the suggestions for an observation were simply based on the first image, but there appears to be a difference compared to what is given on the upload screen, as shown above. Does it take into account all of the images attached to an observation?
as a practical matter, i kind of don’t understand why you care about anything that shows up below the first 2 items on the list. is this not Aglais io? i would guess that you should interpret the situation here as the first species-level suggestion being such a good match that everything else is sort of irrelevant. if you have 100 points to divide up, and the best choice scores 99, then everything else splits up the remaining 1 point.
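the 100-points analogy can be shown with a couple of lines (the raw numbers below are invented for illustration, not real model outputs):

```python
# Toy illustration of the "100 points" analogy: if scores are normalised to
# sum to 100 and the top match takes 99, everything below it shares what's
# left, so the tail entries are almost meaningless individually.

raw = {"Aglais io": 990, "other A": 4, "other B": 3, "other C": 3}
total = sum(raw.values())
normalised = {taxon: 100 * score / total for taxon, score in raw.items()}
print(normalised["Aglais io"])  # 99.0
print(sum(v for t, v in normalised.items() if t != "Aglais io"))  # 1.0
```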
to me, some sort of visualization of scores, as noted in the feature request thread you’ve been commenting on, would help folks to see these situations more plainly (as the browser extension does), but i suspect there’s a reason why the staff have chosen not to implement a visualization, since it hasn’t already been done.
I agree that this particular example is of little practical importance in isolation, but it demonstrates that there is something weird going on in the system which would be good to understand. And as other examples show, there are some strange results also coming up for top suggestions.
Is there any possibility that the model can ever output negative CV scores, in the same way they might crop up any time you try to fit low-value data with a continuous function? I was wondering whether negative scores being interpreted as absolute values in the sorting process (resulting in them being ranked higher than other low positive values) could potentially explain the behaviour seen.
How do you compare between models?
Is it possible to show the differences and improvements between models in a graph? I thought there was an article about it in the past.
For example, a graph for plants with models 1.1, 1.2 and 1.3.
It seems that common species are better recognised than rare species.
But I just don't understand the x-axis. The x-axis above is logarithmic and shows the number of species above this recognition level. As you can see, the 80/20 rule applies: it is easy to recognise 80% of the occurrences, as those observations only involve 913 species. The remaining 20% of the occurrences involve about 2500 species. So the model already works well with a small set of species, as these are the abundant ones.
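The 80/20 cut described above can be computed directly from per-species observation counts. A minimal sketch, with made-up numbers (not the real iNaturalist counts):

```python
# Given per-species observation counts, how many species (taken from most to
# least observed) are needed to cover a given fraction of all observations?

def species_covering(counts, fraction):
    ordered = sorted(counts, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        if running >= target:
            return i
    return len(ordered)

# Heavily skewed toy distribution: a few species dominate the observations,
# so a small number of species already covers 80% of occurrences.
counts = [1000, 500, 250, 125, 60, 30, 15, 10, 5, 5]
print(species_covering(counts, 0.80))  # 3
```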