Sorry, I don’t recall any discussions about the previous model being inaccurate or that it was infamous. Can you provide a little more context?
Alex, while I’m not sure exactly what Silas is referring to, have you seen this thread? https://forum.inaturalist.org/t/possible-increase-in-cv-errors-around-organism-range-location/34411/
Hi @egordon88 - yes I have seen it. Two of the three observations mentioned in the initial post were made with Seek. Seek uses a model trained a few years ago, and does not incorporate any seen nearby or geo awareness in its suggestions.
So I’m not sure the criticism in that thread really applies to either the previous or the current model. As I mentioned in the blog post announcing the new model, we’re still working on compressing the new models to get them working on-device, like for Seek, but that isn’t ready yet.
I thought the previous model was rather good, actually. It is this new one that I am suddenly seeing a lot more “totally wrong” suggestions with. Has anyone else experienced this? It feels like it is trying to be too specific, giving a wide range of low-confidence species with some noticeable outliers (e.g. I had a fish suggested for a butterfly!), rather than giving a single higher-confidence suggestion at, say, genus or tribe level. Or if you like, there is a distinction between an algorithm designed to supplant humans and one designed to support them; I would hope it is more the latter that is being aimed for. Maybe the current approach is becoming over-constrained as more species are added. I wonder if there is an adjustment to the weightings that could smooth things out a little, e.g. adding more significant negative weightings to results based on how far they are from the correct answer on the taxonomic tree, and how high they appear in the suggestions list.
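To make the weighting idea concrete, here is a rough sketch (in Python, purely illustrative; nothing like this is confirmed to exist in iNaturalist's training code, and both function names are invented) of a penalty that grows with taxonomic distance from the true label and with how high a wrong answer sits in the ranked list:

```python
# Illustrative sketch only -- NOT iNaturalist's actual training objective.
# Idea: penalise a wrong suggestion more when it sits far from the true
# taxon on the tree, and more when it appears high in the ranked list.

def taxonomic_distance(path_a, path_b):
    """Distance = steps from each taxon up to their lowest shared ancestor.
    Paths are root-to-leaf tuples, e.g. ('Animalia', 'Insecta', 'Aglais')."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)

def ranked_penalty(suggestions, true_path):
    """suggestions: list of (taxon_path, score) sorted best-first.
    Dividing by rank weights higher-placed wrong answers more heavily."""
    total = 0.0
    for rank, (path, score) in enumerate(suggestions, start=1):
        dist = taxonomic_distance(path, true_path)
        total += score * dist / rank   # hypothetical weighting choice
    return total
```

Under a scheme like this, a fish appearing high in the list for a butterfly would be penalised far more than a wrong butterfly in the same genus.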
It would be really great to have some specific examples. If you don’t want to share them publicly I’d be happy to take a look if you send them to email@example.com. But without specific examples it’s not possible to make an informed response or investigate further.
As Alex mentioned in his blog post, we are working on a better way to incorporate location into CV suggestions. Currently, here’s how “seen nearby” works. There’s no negative weighting involved.
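For readers following along, a rough sketch of the *shape* of a “seen nearby” style mechanism (my guess, not the real implementation; the details are in the linked explanation, and all names here are invented). The key point from the post above is that there is no negative weighting: taxa are either flagged as nearby or not, and the scores themselves are untouched:

```python
# Hypothetical illustration of a "seen nearby" style flag -- names are
# invented, not from the real codebase. nearby_taxa would come from
# verified observations near the observation's location and date.

def mark_seen_nearby(suggestions, nearby_taxa):
    """suggestions: list of (taxon, score). Returns suggestions with a
    nearby flag attached; scores are unchanged (no negative weighting)."""
    return [(taxon, score, taxon in nearby_taxa) for taxon, score in suggestions]

def filter_to_nearby(flagged):
    """Optionally drop suggestions not seen nearby (the on/off toggle)."""
    return [(taxon, score) for taxon, score, nearby in flagged if nearby]
```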
The previous one became worse here, and the new one is somewhat worse still. For most of my observations it either doesn’t find anything locally or doesn’t know what to choose, with suggestions ranging from birds to plants. It’s probably the number of species, or the same thing that changed the previous time, but it’s definitely not as good at common species as it was before.
Here’s the butterfly example I mentioned. Compare suggestions that come up for the first image (good), vs. those “more interesting ones” that come up for the second image :)
Both are of the same butterfly in the same location. In the second screenshot that is the end of the list, and there is no option at the end of it to switch the nearby filtering on or off.
I will send you the original images by PM. The GPS location data is embedded into them.
And in case it wasn’t clear, the point I mentioned about negative weightings was intended to be regarding how the model is built and optimised in the first place (i.e. as part of the definition of what a “good” model is if you like), rather than trying to add them as some kind of post-filtering to the results presented to users (which I don’t think would be a good idea).
Thanks for all your support!
P.S. The above issue disappears when loading suggestions after the observation has been submitted:
I always assumed that the suggestions for an observation were simply based on the first image, but there appears to be a difference compared to what is given on the upload screen, as shown above. Does it take into account all of the images attached to an observation?
Here is another example I came across recently where the top suggestion is not at all close, and appears to have led the submitter astray:
Here’s a very strange one I just encountered:
And when I turn off seen nearby, a few moths do appear, but also some other very strange options (bird and possum):
Even more bizarre:
For that one, turning off seen nearby gives 9 brown moths and the orchid.
The suggestions change depending on existing IDs. So if you test the CV options on an observation with no ID vs. one that has an ID, the options will change.
Repeating the above experiment with the extension to show a colour scale for the CV score gives no colour at all for the perch, which makes it look all the more like a bug to me:
And the same happens for the other odd suggestions that appear if I remove the nearby filter:
as a practical matter, i kind of don’t understand why you care about anything that shows up below the first 2 items on the list. is this not Aglais io? i would guess that you should interpret the situation here as the first species-level suggestion being such a good match that everything else is sort of irrelevant. if you have 100 points to divide up, and the best choice scores 99, then everything else splits up the remaining 1 point.
to me, some sort of visualization of scores, as noted in the feature request thread you’ve been commenting on, would help folks to see these situations more plainly (as the browser extension does), but i suspect there’s a reason why the staff have chosen not to implement a visualization, since it hasn’t already been done.
I agree that this particular example is of little practical importance in isolation, but it demonstrates that there is something weird going on in the system which would be good to understand. And as other examples show, there are some strange results also coming up for top suggestions.
Is there any possibility that the model can ever output negative CV scores, in the same way that negative values can crop up any time you fit low-value data with a continuous function? I was wondering whether negative scores being interpreted as absolute values in the sorting process (resulting in them being ranked higher than other low positive values) could potentially explain the behaviour seen.
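To illustrate the failure mode being asked about (purely a what-if; nothing here confirms the model can actually emit negative scores): if the ranking were ever keyed on the absolute value of the score, a strongly “negative” match would jump above weak positive ones.

```python
# What-if illustration of the hypothesised bug: sorting by absolute
# value would promote a negative score above small positive scores.
# The taxa and numbers are made up for the example.
scores = {"moth A": 0.04, "moth B": 0.02, "perch": -0.30}

buggy = sorted(scores, key=lambda t: abs(scores[t]), reverse=True)
correct = sorted(scores, key=lambda t: scores[t], reverse=True)

# the buggy ranking puts the perch first; the correct one puts it last
```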
The CV is matching the stripes on the perch to the striped patches on the butterfly wings?
I would like to be able to downvote the perch (yes it looks similar, but it is not perch)
my understanding is that any individual score will fall between 0.0 and 1.0, and the aggregate of the species-level scores will be 1.0.
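Those properties match a softmax-style output layer, which is the standard way classifiers produce scores that are each in (0, 1) and sum to 1 (an assumption about this particular model, but a conventional one). It also explains the “100 points to divide up” behaviour mentioned above:

```python
import math

def softmax(logits):
    """Standard softmax: every output lies in (0, 1) and they sum to 1,
    so one very confident class starves all the others of mass."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One dominant class leaves only crumbs for the rest to split up
scores = softmax([8.0, 2.0, 1.0, 0.5])
```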
Mostly not here, maybe? At least on Discord there have been. Also, on BugGuide. Mostly just stuff about how the previous model was IDing flies as click beetles…
Thanks for the update though… it’s much better.
@alex This is the kind of thing I’m talking about, what with the previous one (and I guess this is the new one?) being inaccurate…
Probably a bit late as an answer, but I presume the model works like this one
How do you compare between models?
Is it possible to show the differences and achievements of the different models in a graph? I thought there was an article about this in the past.
For example, a graph for plants with models 1.1, 1.2 and 1.3
It seems that common species are better recognised than rare species.
But I just do not understand the x-axis. The x-axis above is logarithmic and shows the number of species above this recognition rate. As you can see, the 80/20 rule applies: it is easy to recognise 80% of the occurrences, as those observations contain only 913 species. The remaining 20% of the occurrences contain about 2,500 species. So the model already works well with only a small number of species, as those are the abundant ones.
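The 80/20 reading can be made concrete with a small calculation (toy numbers below, not the real iNaturalist data): sort species by observation count and find how many of the most-observed species are needed to cover 80% of all occurrences.

```python
# Toy illustration of the 80/20 point: with a long-tailed distribution
# of observations per species, a small fraction of species covers most
# occurrences. The counts are invented, not real iNaturalist data.

def species_for_coverage(counts, fraction):
    """counts: observations per species. Returns how many of the most
    observed species are needed to cover `fraction` of all observations."""
    total = sum(counts)
    covered = 0
    for i, c in enumerate(sorted(counts, reverse=True), start=1):
        covered += c
        if covered >= fraction * total:
            return i
    return len(counts)

# A crude long tail: species k has roughly 1000/k observations
counts = [1000 // k for k in range(1, 501)]
```

With a tail like this, far fewer than half of the 500 species already cover 80% of the observations, which is the same shape as the 913-vs-2,500 split described above.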
Interested to see what you add after #GSB22
And hope to see a visible bump in obs numbers too.