Psst - New Vision Model Released!

tiwane · March 18, 2020, 6:40pm

iNat released a new vision model for our website and mobile apps (it’s not in Seek yet), and we just put up a lengthy blog post about it with some sweet charts and graphs. Take a look!

schoenitz · March 18, 2020, 7:21pm

Typo?

tiwane · March 18, 2020, 7:25pm

Thanks! It’s been fixed.

aisti · March 18, 2020, 7:28pm

Are metrics published anywhere?

gmontgomery · March 18, 2020, 7:29pm

Awesome, congrats iNat team!

schoenitz · March 18, 2020, 7:29pm

Alright. I suppose we’ll have to see how it holds up in daily use. The graphs in the post are nice (though that accumulation curve is itching for log-log axes), but wouldn’t it be more interesting for the users to show some kind of reliability metrics, and how this changed over time from one iteration of the model to the next? Is this hard to do on your end?

alex · March 18, 2020, 7:39pm

Here’s the accumulation plot with log-log axes.

pleary · March 18, 2020, 8:13pm

As for model accuracy, we did a comparison of the model released this month to the previous model released in June based on 50k photos taken since October, randomly distributed in place and time. In that comparison this new model was about 3.5% more accurate than the old model, predicting the correct taxon first 75.3% of the time, predicting the correct taxon in the top-5 81.1% of the time, and in the top-10 83.9% of the time.

A test with a global dataset like this doesn’t shed any light on where that extra accuracy comes from, but it tells us this new model performs better on average, thus we should be using it over the old one. We dug a little more into taxon and region comparisons when we released the last model.
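For anyone who wants to compute the same kind of top-1 / top-5 / top-10 numbers on their own data, here is a minimal sketch. It is not the evaluation code used for the comparison above; the function name `top_k_accuracy` and the toy suggestion lists are made up for illustration. It assumes each photo comes with a ranked list of suggested taxa and one known correct taxon.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of photos whose correct taxon is among the first k suggestions.

    ranked_predictions: list of lists, best suggestion first (hypothetical format).
    true_labels: list of the correct taxon for each photo.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need exactly one true label per prediction list")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy data: four photos that are all really taxon "A".
ranked = [
    ["A", "B", "C"],  # correct at rank 1
    ["B", "A", "C"],  # correct at rank 2
    ["C", "D", "A"],  # correct at rank 3
    ["D", "C", "B"],  # never suggested
]
truth = ["A", "A", "A", "A"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(ranked, truth, k):.2f}")
```

Top-k accuracy can only stay the same or go up as k grows, which is why the top-5 and top-10 figures are higher than the top-1 figure.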

bug_girl · March 18, 2020, 9:21pm

Awesome! Great job! Appreciate all of the work you all do. :-) You and the rest of iNat’s help should take a break and relax. ;-)

xpda · March 18, 2020, 10:47pm

I am genuinely impressed with the recognition. It must be a LOT of fun to work on! Keep up the good work.

robotpie · March 19, 2020, 1:00am

Honestly we never give credit to the people who work in the backend. Really appreciate the hard work ladies and gentlemen.

Also, could anybody elaborate on what “taxa by number of photographers” means? I’m having trouble relating that phrase to the brown bars in the first graphic.

alex · March 19, 2020, 1:49am

For those three models, the ones represented by brown bars, a species would be included in the training set if it had RG observations by at least 20 different photographers. The theory here was that if all the photos of a taxon were taken by one individual, that could influence the model to optimize for that photographer’s style as opposed to the organism’s visual characteristics. In practice we were probably overcomplicating things.

For the later models, we moved to simpler criteria: more than 100 photos. We are still concerned about getting a broad variety of photos, but we’re planning to address it in a different way.
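To make the difference between the two inclusion rules concrete, here is a rough sketch. It is only an illustration, not the actual export pipeline: the function names and the `(taxon, photographer)` record format are assumptions, and each record is taken to be one photo from a Research Grade observation.

```python
from collections import Counter, defaultdict


def taxa_by_photographers(photos, min_photographers=20):
    """Older rule: keep taxa photographed by at least `min_photographers` distinct people."""
    photographers = defaultdict(set)
    for taxon, photographer in photos:
        photographers[taxon].add(photographer)
    return {t for t, users in photographers.items() if len(users) >= min_photographers}


def taxa_by_photo_count(photos, min_photos=100):
    """Later rule: keep taxa with more than `min_photos` photos, whoever took them."""
    counts = Counter(taxon for taxon, _ in photos)
    return {t for t, n in counts.items() if n > min_photos}


# Toy data: taxon "A" has 150 photos from a single person,
# taxon "B" has 25 photos from 25 different people.
photos = [("A", "solo_user")] * 150 + [("B", f"user{i}") for i in range(25)]

print(taxa_by_photographers(photos))  # only "B" passes the older rule
print(taxa_by_photo_count(photos))    # only "A" passes the later rule
```

The toy data shows the trade-off described above: the photo-count rule admits a taxon documented by a single prolific photographer, which the photographer rule was designed to exclude.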

alex · March 19, 2020, 2:14am

Here’s another way to look at changes in accuracy. We know that the accuracy of our model on a particular taxon increases as the number of photos of that taxon grows.

Below 50 training images, the accuracy falls off drastically.

So as we get more taxa to roughly 1000 images, we’re going to increase the quality of the user experience. However, we’re also adding new taxa all the time. So we’re not simply increasing our accuracy against a fixed, known competition dataset. It can be hard to understand and quantify the improvement.

With this new model, a user in California might see small improvements, while users in some parts of Asia might see local taxa in the suggestions for the first time, with varying degrees of confidence.

We still have a lot to learn about how to best train and evaluate these models. Suggestions welcome!

JeremyHussell · March 19, 2020, 3:31pm

Figure out a way to train with the location and date. Even something incredibly inelegant and inefficient like adding rows and columns of black pixels to the edges of every image with white pixels to mark the latitude, longitude, and day-of-year would likely make a big difference. Ideally, the location and date would just be direct inputs to the model alongside each image.
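As a rough illustration of what “direct inputs” could look like, here is a small sketch that turns latitude, longitude, and day-of-year into a short numeric feature vector. This is a common sine/cosine (cyclic) encoding, not something iNat is known to use; the function name `encode_location_date` and the choice of five features are assumptions made for the example.

```python
import math


def encode_location_date(lat_deg, lon_deg, day_of_year, days_in_year=365.25):
    """Return [x, y, z, sin(season), cos(season)] for one observation.

    Location becomes a point on the unit sphere, so longitudes 180 and -180
    map to the same values. The date becomes a point on a circle, so late
    December and early January end up close together.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    season = 2.0 * math.pi * day_of_year / days_in_year
    return [
        math.cos(lat) * math.cos(lon),  # x
        math.cos(lat) * math.sin(lon),  # y
        math.sin(lat),                  # z
        math.sin(season),
        math.cos(season),
    ]


# Two observations on either side of the date line, same day of the year.
east = encode_location_date(10.0, 180.0, 100)
west = encode_location_date(10.0, -180.0, 100)
print(max(abs(a - b) for a, b in zip(east, west)))  # effectively zero
```

A vector like this could be concatenated with the image features before the final classification layer, which avoids having to paint the metadata into the pixels.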

On a completely different topic, have you read this? https://distill.pub/2020/circuits/zoom-in/

Would it be possible to generate and share the images which maximally activate the classifier for a few different species? Or even better, dig into the details of how the classifier decides it’s looking at one species or another?

deboas · March 19, 2020, 5:08pm

It would be nice if the models could “learn” from disagreements, by putting special weight on images where an observation was identified with the aid of the AI, but then subsequently identified as something else. In other words, I’d suggest placing some form of special emphasis on cases where the model suggested taxon A, and users later disagreed and identified the image as taxon B. For taxa where this happens a lot, the suggestions could be made more conservative (e.g. family rather than species level).
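A minimal sketch of that idea, assuming there is a log of (taxon the AI suggested, taxon the community settled on) pairs. The function names, the 30% threshold, and the placeholder taxon names are all made up for illustration; nothing here reflects how iNat actually handles suggestions.

```python
from collections import Counter


def disagreement_rates(records):
    """Share of AI-aided IDs per suggested taxon that the community later changed.

    records: iterable of (suggested_taxon, final_taxon) pairs (hypothetical format).
    """
    suggested = Counter()
    overturned = Counter()
    for suggestion, final in records:
        suggested[suggestion] += 1
        if suggestion != final:
            overturned[suggestion] += 1
    return {taxon: overturned[taxon] / n for taxon, n in suggested.items()}


def conservative_suggestion(species, family_of, rates, max_rate=0.3):
    """Fall back to the family when a species suggestion is overturned too often."""
    if rates.get(species, 0.0) > max_rate:
        return family_of[species]
    return species


# Toy log: "Species A" is overturned 1 time in 10, "Species C" 3 times in 4.
records = (
    [("Species A", "Species A")] * 9
    + [("Species A", "Species B")]
    + [("Species C", "Species D")] * 3
    + [("Species C", "Species C")]
)
family_of = {"Species A": "Family X", "Species C": "Family Y"}

rates = disagreement_rates(records)
print(conservative_suggestion("Species A", family_of, rates))  # stays at species
print(conservative_suggestion("Species C", family_of, rates))  # backs off to family
```

The same per-taxon rates could also serve as training weights instead of a hard cut-off, so that frequently overturned images count for more during training.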

I agree with Jeremy that some way of better incorporating location information (and date to a much lesser extent) into the process would be an enormous step forward.

Finally, it would be cool if you could use standard annotations (e.g. life stages) to inform the process. The larva of many insects looks very different from the adult, for example, but both can be distinctive. Not sure if this already happens to any extent with the current models.

biosam · March 19, 2020, 5:52pm

Thanks to all involved with this! Glad to hear of the improvements, and nice to see a nod to the difficulties/impossibility of accurately identifying millipedes from photos, re: Tylobolus (side note: the hot new trend in AI over-suggestion in millipedes seems to be calling near everything in Europe “Parajulidae,” a family only known to occur in North America and far east Asia). It would be nice for future models to incorporate more external geographic info to weight suggestions, i.e. rather than simply using nearest iNat observations, use a taxon’s known country, regional, or continental occurrence from authoritative sources, so that visually similar taxa from improbable continents or hemispheres are suggested with less frequency.

kiwifergus · March 19, 2020, 8:07pm

I would imagine that location can be factored in “post-training”, as it would just be a sort order on the suggestion list.
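Something like the following sketch is what I have in mind, with the caveat that it is only a guess at how a post-training step could work. The function name `rerank_by_location`, the `regional_prior` table, and all the numbers are invented for the example; the prior could come from nearby observations or from published range data.

```python
def rerank_by_location(vision_scores, regional_prior, default_prior=0.01):
    """Re-sort vision suggestions by score times a location-based prior.

    vision_scores: {taxon: score from the image model} (hypothetical values).
    regional_prior: {taxon: how plausible the taxon is at this location, 0..1}.
    Taxa missing from the prior get `default_prior` instead of being dropped.
    """
    combined = {
        taxon: score * regional_prior.get(taxon, default_prior)
        for taxon, score in vision_scores.items()
    }
    total = sum(combined.values())
    if total == 0:
        return []
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    # Renormalise so the re-ranked scores sum to 1 again.
    return [(taxon, value / total) for taxon, value in ranked]


# Made-up scores for a millipede photo taken in Europe.
vision_scores = {"Parajulidae": 0.6, "Julidae": 0.4}
europe_prior = {"Parajulidae": 0.05, "Julidae": 0.9}

for taxon, score in rerank_by_location(vision_scores, europe_prior):
    print(f"{taxon}: {score:.2f}")
```

Because the image model itself is untouched, this kind of step can be changed or switched off without retraining, but it cannot help the model tell apart taxa that look alike and also occur in the same place.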

theorickert · March 19, 2020, 11:14pm

One thought about a possible improvement of the AI suggestion process: I believe currently only the first image of an observation is being analyzed. Current results are already extremely good for the bulk of the cases. But for the more difficult ones, using a second image where one exists would often be helpful, since it would usually have a different perspective. The criterion for analyzing a second image could be the frequency of disagreement between AI and community ID for the taxon. Of course, there would be the question of which of the two disagreeing results should be reported.

kiwifergus · March 20, 2020, 12:02am

Cassi corrected me on that… it is not just the first image that it is trained on. I think I mentioned it a few times in a variety of places but was confused by the suggestions only being based on the first image.

bug_girl · March 20, 2020, 1:45am

Thank you! I was wondering about that.
