Testing the computer vision capabilities: What are valid practices?

I want to test the iNaturalist computer vision model.
In another question, I was told that there is currently no public API for the CV model; that is, to get species suggestions from an automated script, I need to create observations.
This raises the question of whether and how creating those observations can affect the results.

E.g., when I survey multiple individuals of the same species, under what circumstances will the close-by observations affect the species suggestions? I guess if somebody suggests a species on one of the observations, probability distributions will be adjusted for the CV suggestions on the others?
Is there a way to prevent species suggestions for a certain time but without affecting the predictions? (I guess marking the observations as casual attracts fewer IDs, especially in combination with hiding the location; however, this should decrease the CV accuracy since it would consider a wider set of species, i.e., those that are usually cultivated instead of those naturalised, right?)

Will adding/removing images to/from observations always lead to a fresh, unbiased run of the CV model?

I know there are several studies out there that tested the capabilities of the iNaturalist CV model. Nevertheless, I think it is better to ask technical questions here, since some of the studies were already criticised for disregarding technicalities related to how iNat works, and just because somebody has done something in a particular way before doesn't mean that it is the right way.

So I hope somebody with deeper insights on the inner workings of the iNaturalist CV model and interface can point out some pitfalls when trying to evaluate the CV model performance.

In particular, I want to

  1. get suggestions on various combinations of parts of similar plants (leaf, inflorescence, entire plant, etc.); those might be of the same individuals, and thus I thought about uploading observations and editing them subsequently by adding/removing images (all through the API)
  2. make sure that the IDs are independent
  3. make sure to avoid other pitfalls, like the one addressed in this comment.

Only the first photo is checked.
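Since the plan above involves creating observations and then swapping photos through the API, here is a minimal sketch of what that loop could look like. The endpoint paths (`POST /v1/observations`, `POST /v1/observation_photos`, `DELETE /v1/observation_photos/{id}`) and field names are assumptions based on my reading of the public iNaturalist API documentation; verify them against the live docs before relying on this.

```python
# Hedged sketch: helpers that build the requests for creating an observation
# and later attaching/detaching photos. Endpoint paths and field names are
# assumptions based on the public iNaturalist API docs; verify before use.

BASE = "https://api.inaturalist.org/v1"

def create_observation_request(species_guess, observed_on, lat, lng):
    """Return (url, json_body) for creating a bare observation."""
    body = {
        "observation": {
            "species_guess": species_guess,   # free text, no taxon forced
            "observed_on_string": observed_on,
            "latitude": lat,
            "longitude": lng,
        }
    }
    return f"{BASE}/observations", body

def add_photo_request(observation_id):
    """Return (url, form_fields) for attaching a photo; the image itself
    would go in the multipart 'file' part of the POST."""
    return (f"{BASE}/observation_photos",
            {"observation_photo[observation_id]": observation_id})

def remove_photo_request(observation_photo_id):
    """Return the DELETE url for detaching a photo from an observation."""
    return f"{BASE}/observation_photos/{observation_photo_id}"
```

Each request would be sent (e.g. with the `requests` library) carrying an OAuth/JWT `Authorization` header. As noted below, only the first photo feeds the CV, so the order in which photos are attached matters.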


This is something that I have been working on as a side project. I wanted to test how the CV algorithms compare between iNat, Pl@ntNet, Google Lens, and a couple of annoying ones with ads (I’ll make a forum topic with my results eventually).

But to summarize my methods: all I did was upload the same photos to the different apps to see how each did when given the same file (remember not to name the file anything informative, as the species name can be extracted from the file name), and record what ID the CV gave. I could then calculate a percentage correct.
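One way to guard against apps reading the species name out of the file name is to rename every test photo to an opaque ID up front and keep the true labels in a separate mapping file. A small sketch (the directory layout and CSV format are my own invention, not part of any app's requirements):

```python
import csv
import uuid
from pathlib import Path

def anonymize_photos(photo_dir, mapping_csv):
    """Rename every .jpg in photo_dir to an opaque UUID-based name and
    record original->anonymized names in a CSV so the true labels stay
    recoverable for scoring later."""
    rows = []
    for photo in sorted(Path(photo_dir).glob("*.jpg")):
        new_name = f"{uuid.uuid4().hex}.jpg"
        photo.rename(photo.with_name(new_name))
        rows.append({"original": photo.name, "anonymized": new_name})
    with open(mapping_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["original", "anonymized"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Also worth checking: EXIF metadata can carry location and keywords, which some apps may use as well.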

When I say upload, I mean I used the upload page on the desktop site and clicked on the species name entry box, taking the first suggested name as the AI suggestion. Make sure you record the ID in your data file before you actually upload.

As was already mentioned, only the first photo is used by the CV to make an ID, so don’t add multiple photos per observation.

I would also not re-upload any observations you have already posted in order to get a new ID: if they are more than 2 months old, it is possible that the CV has already trained on them, making the result not independent.
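That two-month caution can be enforced mechanically when selecting test material: drop any previously posted observation old enough that the current model may have trained on it. A sketch with the lag as a free parameter, since the actual training cutoff per model release isn't something I can state with certainty:

```python
from datetime import date, timedelta

def safe_to_reuse(uploaded_on, today, training_lag_days=60):
    """True if an observation was uploaded recently enough that the
    current CV model (assumed to train on data at least ~2 months old,
    per the advice above) is unlikely to have seen it."""
    return (today - uploaded_on) < timedelta(days=training_lag_days)
```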

Hope this helps


Both on the computer and on the phone, you should be able to run the algorithm without even saving the observation. So you could add the photo, run the model, record the results, and then, if you don’t want to keep the observation, delete it before posting.


I would caution against using just the first suggestion from iNat based on how the CV suggestions work. That was one of the issues with a recent paper that did these kinds of comparisons:


I would suggest searching for and reading through the existing documentation of the iNat CV - there are a fair number of journal and explanatory posts on iNat itself that will answer some or most of these questions. There are also a lot of existing posts on this forum that can answer some questions as well.

I think the most relevant things are:

As @fffffffff mentioned, the CV works on single photos, not collections of them.

CV models come out periodically (often every month or so). During that time, the model should produce identical results for a given photo - it won’t be biased by anything you upload.

If you don’t actually upload the observation but just check the CV output, then there isn’t an issue.

It’s unclear what you mean by “independent”.


I have read the criticism of this paper, which is part of why I thought I’d rather ask than copy some method others used. I guess I can tell from the JSON response to an API request whether the CV model suggested a genus rather than a species, or whether it just added the genus as additional “I am sure it is this genus” information…
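If the suggestion JSON follows the usual iNat taxon schema, with each result carrying a `taxon` object that has a `rank` field (an assumption worth verifying against a real response), then separating species-level suggestions from coarser ones is straightforward:

```python
def split_by_rank(results):
    """Partition CV suggestion results into species-level and coarser
    (genus or higher) suggestions, based on the taxon 'rank' field.
    Assumes each result looks like {"taxon": {"name": ..., "rank": ...}}."""
    species, coarser = [], []
    for r in results:
        target = species if r["taxon"].get("rank") == "species" else coarser
        target.append(r)
    return species, coarser
```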

Two comments:

  • running a script on the CV may be against the rules on machine-generated content; tread carefully
  • as said before, it is possible to have the CV evaluate photos without creating observations, either on the upload page or on the CV demo page

The seen nearby data are updated more frequently than the CV model (or at least that was true before the model started being updated every couple of months), but my understanding is that there isn’t a schedule. I believe only the devs could tell you when that gets changed. It’s also worth noting that the nearby time frame is plus or minus 45 days, so if you were to do this over a series of days, the CV model is unlikely to change (and a change would be announced), but the rolling window may alter which suggestions count as nearby.
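That ±45-day rolling window can be checked directly: a candidate "nearby" record only counts if its observation date falls within the window around your test photo's date. A sketch (the 45-day figure is taken from this thread, not from official documentation):

```python
from datetime import date

def in_seen_nearby_window(candidate_date, photo_date, window_days=45):
    """True if candidate_date falls within +/- window_days of photo_date,
    the 'seen nearby' time frame reported in this thread."""
    return abs((candidate_date - photo_date).days) <= window_days
```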


From all I’ve read, I cannot use the API to get “just an ID”; requesting suggestions will create an observation.
I guess this could already bias the result through some post-CV modification of the probability distribution that takes species observed nearby into account. This is what I mean by “independent”: if I sample multiple individuals of the same species at some site, I would obviously not want them to affect each other’s species suggestions. I guess it would be OK as long as I upload them within a short time frame and read the species suggestions quickly, since the species is not counted as “observed nearby” unless some user IDs it…

On second thought, since it uses the date of the photo and not the date of upload, as long as your photos have the same date, the nearby window shouldn’t affect your results.

Why would you want to do this?


I also recorded whether the correct name was in the top 5 suggestions for my study but didn’t mention that here.
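For the tallying step, a small sketch of how top-1 and top-5 percentages could be computed once true names and ranked suggestion lists have been recorded (the record format here is my own, not anything the apps produce):

```python
def top_k_accuracy(records, k):
    """records: list of (true_name, [suggested names in rank order]).
    Returns the fraction of records where the true name appears
    among the top k suggestions."""
    if not records:
        return 0.0
    hits = sum(1 for true, suggestions in records if true in suggestions[:k])
    return hits / len(records)
```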


To make a sound comparison of the performance of state-of-the-art computer vision models for the flora of a specific region, using an entirely new, independent, representative data set of real-world observations that were identified with high certainty (through expert knowledge and specific literature, and without the help of CV, obviously) and photographed using a standardised procedure. The result would be a summary of the current state of the art, indicating strengths and weaknesses of various platforms as well as general weak points in current computer vision models, and perhaps also recommendable ways of taking pictures and what to focus on.
Then I would sum it all up in a MS, in whatever format suits the significance of the results: a personal note, an iNaturalist journal post, a letter in a local newspaper, a technical report, or a Nature article (since we’ve seen the one published in PLOS One is rubbish, at least according to some people). I don’t know yet; it obviously depends on how well it all goes.
Why would I not want to do this?

Please add a link here when you publish your results. Most of the articles only use a few photos, which makes the results not so interesting. On the other hand, a new computer vision model a month later already makes the report outdated…


This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.