Alright. I suppose we’ll have to see how it holds up in daily use. The graphs in the post are nice (though that accumulation curve is itching for log-log axes), but wouldn’t it be more interesting for the users to show some kind of reliability metrics, and how this changed over time from one iteration of the model to the next? Is this hard to do on your end?
As for model accuracy, we did a comparison of the model released this month to the previous model released in June based on 50k photos taken since October, randomly distributed in place and time. In that comparison this new model was about 3.5% more accurate than the old model, predicting the correct taxon first 75.3% of the time, predicting the correct taxon in the top-5 81.1% of the time, and in the top-10 83.9% of the time.
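For anyone unsure what "first", "top-5" and "top-10" mean here, this is a minimal sketch of how such numbers are usually computed. The function name and the toy labels in the usage note are made up for illustration; this is not iNat's actual evaluation code.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of photos whose true taxon appears in the first k suggestions.

    ranked_predictions: one list of taxa per photo, best guess first.
    true_labels: the correct taxon for each photo, in the same order.
    """
    hits = sum(
        1
        for suggestions, truth in zip(ranked_predictions, true_labels)
        if truth in suggestions[:k]
    )
    return hits / len(true_labels)
```

With k=1 this gives the "correct taxon first" figure; k=5 and k=10 give the other two. The value can only stay the same or go up as k grows, which matches the 75.3% / 81.1% / 83.9% pattern above.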
A test with a global dataset like this doesn’t shed any light on where that extra accuracy comes from, but it tells us this new model performs better on average, thus we should be using it over the old one. We dug a little more into taxon and region comparisons when we released the last model.
Awesome! Great job! Appreciate all of the work you all do. :-) You and the rest of iNat’s help should take a break and relax. ;-)
I am genuinely impressed with the recognition. It must be a LOT of fun to work on! Keep up the good work.
Honestly we never give credit to the people who work in the backend. Really appreciate the hard work ladies and gentlemen.
Also, could anybody elaborate on what “taxa by number of photographers” means? I’m having trouble relating that phrase to the brown bars in the first graphic.
For those three models, the ones represented by brown bars, a species would be included in the training set if it had RG observations by at least 20 different photographers. The theory here was that if all the photos of a taxon were taken by one individual, that could influence the model to optimize for that photographer’s style as opposed to the organism’s visual characteristics. In practice we were probably overcomplicating things.
For the later models, we moved to simpler criteria: more than 100 photos. We are still concerned about getting a broad variety of photos, but we’re planning to address it in a different way.
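To make the difference between the two criteria concrete, here is a rough sketch. It assumes the photos are available as (taxon, photographer) pairs, which is an invented format for illustration; the function names are hypothetical too.

```python
from collections import Counter, defaultdict


def taxa_by_photographers(photos, min_photographers=20):
    """Older criterion: keep taxa photographed by at least N different people."""
    photographers = defaultdict(set)
    for taxon, photographer in photos:
        photographers[taxon].add(photographer)
    return {t for t, users in photographers.items() if len(users) >= min_photographers}


def taxa_by_photo_count(photos, min_photos=100):
    """Later criterion: keep taxa with more than N photos, whoever took them."""
    counts = Counter(taxon for taxon, _ in photos)
    return {t for t, n in counts.items() if n > min_photos}
```

A taxon with 150 photos from a single prolific observer passes the second test but not the first, and a taxon with 30 photos from 30 people does the opposite.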
Here’s another way to look at changes in accuracy. We know that the accuracy of our model on a particular taxon increases as the number of photos of that taxon grows.
Below 50 training images, the accuracy falls off drastically.
So as we get more taxa to roughly 1000 images, we’re going to increase the quality of the user experience. However, we’re also adding new taxa all the time. So we’re not simply increasing our accuracy against a fixed, known competition dataset. It can be hard to understand and quantify the improvement.
With this new model, a user in California might see small improvements, while users in some parts of Asia might see local taxa in the suggestions for the first time, with varying degrees of confidence.
We still have a lot to learn about how to best train and evaluate these models. Suggestions welcome!
Figure out a way to train with the location and date. Even something incredibly inelegant and inefficient like adding rows and columns of black pixels to the edges of every image with white pixels to mark the latitude, longitude, and day-of-year would likely make a big difference. Ideally, the location and date would just be direct inputs to the model alongside each image.
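As a sketch of what "direct inputs" could look like, location and date can be turned into a handful of numbers that wrap around correctly. This is just one common encoding and my own assumption, not anything the iNat team has described; the function name is made up.

```python
import math


def encode_location_date(lat_deg, lon_deg, day_of_year):
    """Turn latitude, longitude and day-of-year into five values in [-1, 1]."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Point on the unit sphere, so longitudes 179 and -179 end up close together.
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    # Day of year as an angle, so late December sits next to early January.
    angle = 2.0 * math.pi * (day_of_year - 1) / 365.25
    return [x, y, z, math.sin(angle), math.cos(angle)]
```

A vector like this could be concatenated with the image features before the final classification layer. Whether that actually helps would have to be tested.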
On a completely different topic, have you read this? https://distill.pub/2020/circuits/zoom-in/
Would it be possible to generate and share the images which maximally activate the classifier for a few different species? Or even better, dig into the details of how the classifier decides it’s looking at one species or another?
It would be nice if the models could “learn” from disagreements, by putting special weight on images where an observation was identified with the aid of the AI, but then subsequently identified as something else. In other words, I’d suggest placing some form of special emphasis on cases where the model suggested taxon A, and users later disagreed and identified the image as taxon B. For taxa where this happens a lot, the suggestions could be made more conservative (e.g. family rather than species level).
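Roughly what I have in mind, as a sketch only. The record format (AI suggestion, final community ID), the 0.3 threshold, and the function names are all invented for illustration.

```python
from collections import Counter


def disagreement_rates(records):
    """For each AI-suggested taxon, the share of cases the community overturned.

    records: iterable of (ai_suggested_taxon, community_taxon) pairs.
    """
    suggested = Counter()
    overturned = Counter()
    for ai_taxon, community_taxon in records:
        suggested[ai_taxon] += 1
        if community_taxon != ai_taxon:
            overturned[ai_taxon] += 1
    return {taxon: overturned[taxon] / n for taxon, n in suggested.items()}


def conservative_suggestion(taxon, family_of, rates, max_rate=0.3):
    """Fall back to the family when a species suggestion is often overturned."""
    if rates.get(taxon, 0.0) > max_rate:
        return family_of[taxon]
    return taxon
```

So a species that gets corrected three times out of four would be suggested at family level, while one corrected two times out of ten would still be suggested as a species.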
I agree with Jeremy that some way of better incorporating location information (and date to a much lesser extent) into the process would be an enormous step forward.
Finally, it would be cool if you could use standard annotations (e.g. life stages) to inform the process. The larva of many insects looks very different from the adult, for example, but both can be distinctive. Not sure if this already happens to any extent with the current models.
Thanks to all involved with this! Glad to hear of the improvements, and nice to see a nod to the difficulties/impossibility of accurately identifying millipedes from photos, re: Tylobolus (side note: the hot new trend in AI over-suggestion in millipedes seems to be calling near everything in Europe “Parajulidae,” a family only known to occur in North America and far east Asia). It would be nice for future models to incorporate more external geographic info to weight suggestions, i.e. rather than simply using nearest iNat observations, use a taxon’s known country, regional, or continental occurrence from authoritative sources, so that visually similar taxa from improbable continents or hemispheres are suggested with less frequency.
I would imagine that location can be factored in “post-training”, as it would just be a sort order on the suggestion list.
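Something like this, as a guess at the general idea rather than how iNat does it. The penalty value, the set of nearby taxa, and the function name are hypothetical.

```python
def rerank_by_location(vision_scores, seen_nearby, penalty=0.1):
    """Reorder suggestions after the vision model has run.

    vision_scores: dict of taxon -> score from the image-only model.
    seen_nearby: set of taxa with records near the observation.
    Taxa not seen nearby keep only a fraction (penalty) of their score.
    """
    adjusted = {
        taxon: score * (1.0 if taxon in seen_nearby else penalty)
        for taxon, score in vision_scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

The image model stays untouched. A visually strong match from the wrong continent just drops down the list instead of disappearing.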
One thought about a possible improvement of the AI suggestion process: I believe currently only the first image of an observation is being analyzed. Current results are already extremely good for the bulk of the cases. But for the more difficult ones the use of an existing second image would often be helpful since it would usually have a different perspective. The criterion for analyzing a second image could be the frequency of disagreement between AI and community ID for the taxon. Of course, there would be the question of which of the two disagreeing results should be reported.
Cassi corrected me on that… it is not just the first image that it is trained on. I think I mentioned it a few times in a variety of places but was confused by the suggestions only being based on the first image.
Thank you! I was wondering about that.
Not training!! I meant the suggestions, the analysis of an observation after training. I thought it currently uses the first photo of an observation for the suggestions and skips all other photos.
I could not find it, but it seems I missed several posts on CV and AI training.
Are there more tips on the way CV/AI works? (Cropping can definitely improve results.)
FWIW, there’s also discussion and some additional charts at
about a rare species, but the system might still recommend one based on nearby observation
“nearby” means near in space and time
The model became more efficient in sedges and grasses
the vision model does not itself incorporate non-image data other than taxon IDs
https://www.inaturalist.org/blog/25510-vision-model-updates (“taxon and region comparisons” 20190614)
https://distill.pub/2020/circuits/zoom-in/ (“connections between neurons”)
Sorry, my previous suggestion probably went to the wrong address (it’s on improving the AI suggestions).
One idea that might be more appropriate here is to allow a smaller number of photos for selected taxa in the AI training. The reasoning is that some taxa are much more distinctive than others and this might allow some of the rarer taxa to get in. Just an idea - I don’t know how much trouble that would be, or if that can even work in your process.
there is a thread on a related topic, using the CV to populate annotations
Hi! I had a couple questions related to the new computer vision model and thought I’d float them here (I’m happy to relocate if there’s a better place).
First, is there any particular reasoning behind only running cv (edit: cv prediction) on the first image in a set? I’ve started polling the cv on each photo in the uploader prior to merger, but it’d be nice to be able to access this info after the fact to guide identification of my own and others’ photos.
Second, I’m revisiting my old non-research-grade observations in light of the new cv, and ran into a quandary. If a species I couldn’t previously identify now shows up with a reasonably strong cv suggestion, I’m tempted to add it. If a user previously suggested the species, I think this would promote the observation to research grade. I think this matches intent: the two identifications should be independent, because sub-RG observations don’t feed into the model, except when they are sub-RG only because they are cultivated. Still, it feels a bit weird.
On the first question, CV is trained on all observation photos, but when you are getting a suggestion on an observation it only looks at the first photo. In the uploader, before you merge a group of photos into one card, you can use the CV suggestions for each card independently, so it’s a way to see if any of the photos might offer something different.