Question about characterizing Computer Vision

I have a question about CV’s performance in relation to user-supplied IDs. This is related to the research I am doing on a pair of near-cryptic moth species, discussed at length here:
https://forum.inaturalist.org/t/comparing-cv-outcomes-in-a-pair-of-near-cryptic-moth-species/56793

Is the following a true statement?

“…the performance of CV, both in training sessions and in suggested identifications at Upload, is not affected by identifications offered by the original observer, by subsequent user-contributed identifications, or by the Community ID.”

Or is there a different or better or more nuanced way of stating this?

I don’t think this is a correct statement, but I don’t have insider knowledge; I just want to follow any replies.

performance?
CV will give you a different answer, depending on whether it is …
Unknown somewhere
Or a broad ID to start with
Or a location specified
Or broad ID plus location

But that is not what you mean by CV performance?
I suppose the basic underlying CV suggestion is ‘not affected’ but it surely looks as if it is. The (monthly) updates are a real change.

Not a programmer, but my understanding is that the training is separate from what happens during the process of IDing observations – i.e., the CV can be accessed to provide a list of suggestions for an observation, but the suggested IDs are not permanently stored in connection with the observation. Likewise, there is no direct feedback in the other direction, meaning that the IDs that people actually select for an observation are not communicated to the CV, nor is it informed about corrections of IDs that it suggested.

Indirectly of course, corrections of wrong IDs help improve the model because during the next training the new community ID for those observations will be used. I’m not sure exactly what happens when photos that it was trained on are later ID’d as something else – presumably there is some mechanism for removing these photos from the set of reference images for that particular taxon.
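
To make the indirect route concrete, here is a minimal sketch of the idea, not iNaturalist's actual pipeline. The `Observation` class, the `build_training_labels` function and the placeholder taxon names are all hypothetical. The sketch assumes a training run only reads the Community ID as it stands when the training data is gathered, so a correction shows up in the next run and nowhere else.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # All fields are hypothetical, for illustration only.
    photo_id: str
    community_taxon: str   # Community ID at the moment the snapshot is taken
    id_history: list[str]  # every ID ever added; never read by the export below


def build_training_labels(observations: list[Observation]) -> dict[str, str]:
    """Map each photo to a label using only the current Community ID."""
    return {obs.photo_id: obs.community_taxon for obs in observations}


# Snapshot for one training run: the photo carries its first (mistaken) ID.
obs = Observation("photo-1", "species A", ["species A"])
labels_run_1 = build_training_labels([obs])

# Identifiers later correct the observation. Nothing is pushed to the
# already-released model; only the observation record changes.
obs.id_history.append("species B")
obs.community_taxon = "species B"

# The next training run is the first place the correction has any effect.
labels_run_2 = build_training_labels([obs])

print(labels_run_1)  # {'photo-1': 'species A'}
print(labels_run_2)  # {'photo-1': 'species B'}
```

In this toy version the ID history is carried along but never read, which matches the guess that the training only sees the resulting label and not how the identifiers got there.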

Thanks for helping me focus my question, Diana. So presuming for argument’s sake that an observation lacks a location, will the initial ID by the OP (whether Unknown, Life, or anything down to species) influence what CV will suggest? The suggestions will, of course, differ with or without location data, but that is based on CV’s learning, not on any ID offered with the single observation. Subsequently, will any added IDs influence what CV might suggest? I believe the answer to both of these examples is “No”–or at least those subsequent IDs shouldn’t influence the next CV suggestion for that observation. Whether or not an observation has geolocation, CV should be ignoring user or community suggested IDs in the Upload and suggested ID phases. For observations which are geolocated, CV just has more information to work with (from its training sessions). That’s not the distinction I’m asking about. When an ID is solicited from CV, CV should be ignoring any IDs of that observation. Correct?

To the question about CV training, those sessions are explicitly based on RG observations chosen at random. So my question is more complex: Is that CV training influenced at all by the complicated calculations which lead to Community IDs to achieve RG? For instance, an easy-to-identify bird might get two quick correct IDs, reach RG and subsequently be chosen for a training set. But what if an observation has a long and circuitous–even argumentative–pathway but eventually gets to the necessary calculation for RG: Is such an observation, with all its ID warts, treated exactly the same as a two-and-done RG observation? See what I’m getting at here?

I don’t believe this is entirely accurate. My understanding was that IDs on an observation do help filter at least to kingdom, though perhaps not lower. For example, if you put a photo up with a flower ID, the CV gives mostly flower recommendations, but if you put the same photo up with a leafminer ID, the CV gives mostly insect recommendations.

This thread had more details about this behavior: https://forum.inaturalist.org/t/leafminers-have-broken-the-cv/54378/2, including a note from Tony that apparently this behavior doesn’t happen on all types of iNat.

Correct. As I understand it, when browsing on the observation’s page, the CV is restricted only to suggestions with the same iconic taxon as the community ID. For a concrete example:

1.) I upload a picture of a squirrel in a tree with no ID (‘Unknown’). The CV could suggest either squirrels, trees, or both in a mix.
2.a) I add an ID to ‘Mammalia’. Now the CV should be restricted to only suggest IDs related to mammals (presumably the squirrel, if it is performing well). Note you may have to refresh the page for the behavior to change.
2.b) I remove the ‘Mammalia’ ID and replace it with a ‘Plantae’ ID. Now the CV should only suggest IDs related to plants, presumably the tree.

When browsing through the CV options in the ‘Suggestions’ tab of the identify modal sorted by ‘Source: Visually Similar’, the behavior is a little different, and manually controllable. The default behavior in the identify modal is:

1.) If the current display taxon is species-level, then by default it will only suggest IDs with the same direct parent taxon; for example, for a Petrophila bifascialis observation, by default it can only suggest other IDs within genus Petrophila.
2.) If the current display taxon is higher than species (e.g. just genus Petrophila), then it will only suggest IDs for taxa that are within that same taxon (e.g. genus Petrophila).
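
The two default rules above can be written down as a small sketch. This is only my reading of the observed behavior, not code from iNaturalist; the `Taxon` class and the `default_restriction` function are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Taxon:
    # Hypothetical minimal taxon record, for illustration only.
    name: str
    rank: str                          # e.g. "species", "genus", "family"
    parent: Optional["Taxon"] = None


def default_restriction(display_taxon: Taxon) -> Taxon:
    """Return the taxon that suggestions are limited to by default."""
    if display_taxon.rank == "species" and display_taxon.parent is not None:
        # Rule 1: a species-level display taxon restricts to its direct parent.
        return display_taxon.parent
    # Rule 2: anything above species restricts to that same taxon.
    return display_taxon


petrophila = Taxon("Petrophila", "genus")
bifascialis = Taxon("Petrophila bifascialis", "species", parent=petrophila)

print(default_restriction(bifascialis).name)  # Petrophila
print(default_restriction(petrophila).name)   # Petrophila
```

Both cases land on genus Petrophila, which is why a species-level and a genus-level observation end up with the same default pool of suggestions.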

However, in the identify modal, you can manually edit the taxon to restrict CV suggestions to anything, even if it is not a parent of the current display taxon. For example, say someone posts an observation that is ID’d as an allium, and I do not know what it is except that I am sure it is a dicot. I can then click the ⌄ next to ‘Taxon: Genus Allium’ and manually enter that I want the CV to only give me suggestions for visually similar dicots.

Note that none of this alters the internal behavior of the CV; it only changes how the results are displayed, by applying a filter that removes any suggestions that don’t fit the taxon restriction criteria. So the relative ranking of the remaining suggestions should be identical with and without the filter.
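
A toy sketch of that filter-after-ranking idea, using the squirrel-in-a-tree example. The species names, scores and the `ANCESTORS` lookup are invented for illustration, and `rank_suggestions` and `restrict` are hypothetical helpers, not anything from the iNaturalist codebase. The point is only that the scoring runs once and the taxon restriction merely drops rows afterwards.

```python
# Hypothetical toy data: ANCESTORS maps a taxon to the higher taxa containing it.
ANCESTORS = {
    "Eastern Gray Squirrel": {"Mammalia", "Animalia"},
    "Fox Squirrel": {"Mammalia", "Animalia"},
    "White Oak": {"Plantae"},
    "Red Maple": {"Plantae"},
}


def rank_suggestions(scores):
    """Unfiltered ranking: highest score first."""
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def restrict(ranked, taxon):
    """Drop suggestions outside `taxon`; survivors keep their original order."""
    return [(name, score) for name, score in ranked if taxon in ANCESTORS[name]]


# Made-up scores standing in for whatever the model returns for one photo.
scores = [
    ("White Oak", 0.30),
    ("Eastern Gray Squirrel", 0.40),
    ("Red Maple", 0.10),
    ("Fox Squirrel", 0.20),
]

ranked = rank_suggestions(scores)        # no ID on the observation
mammals = restrict(ranked, "Mammalia")   # after a 'Mammalia' ID
plants = restrict(ranked, "Plantae")     # after a 'Plantae' ID

print([name for name, _ in ranked])
print([name for name, _ in mammals])
print([name for name, _ in plants])
```

Within each filtered list the order and the scores are the same as in the unfiltered list, so the ID changes which suggestions you see without changing how the model scored the photo.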

I am not sure what you mean by this. If you mean that the current ID does not affect the way that the CV model is executed, either at runtime or at any given training iteration, I think that is true. Of course the current ID matters between training iterations, because the CV model is being trained to replicate the IDs. Another way to put this is that version 2.16 of the CV should always give you the same ranking results for the same picture when run, but will not necessarily give the same ranking as version 2.15.

If that is what you are trying to capture, I might suggest a phrasing like:

“The numbered release versions of the CV are static after release, and do not incorporate any new data added to the site, either from new observations or new IDs added to existing observations. When given a specific photograph to rank, the CV model’s internal ranking of taxa is not directly influenced by either current or past IDs of any observations associated with that photograph.”

I believe that is an accurate statement, and it may more precisely reflect what you are trying to convey.

For example a beetle on a flower.
CV will probably ‘see’ the beetle
If I ID as Flower, CV will see flower. CV does not ignore my flower ID.
And if ‘John says that little beetle is a thrips’ then CV will flip to thrips.
If you want the CV to be neutral - no ID, no location = not very useful.

I am so used to pushing the CV to engage where I need it to, your question is a different way of evaluating my usage.
I actively push the CV to include ‘my’ new species
https://www.inaturalist.org/observations/217272808 (with IDs from 3 botanists in support)

Your published research will have shifting goalposts for CV for future readers, as each subsequent model is updated. It will be important to state up front - using CV model number ## with data drawn from such and such a date.

I believe RG is given priority, but observations do not have to be RG in order to be used for the CV training.

I assume it is only provided the community ID without the history of the ID process, so it doesn’t know what human IDers have difficulty with or whether there were disagreements.

@wildskyflower describes the three different ways to engage the CV.

The “at upload” part of the question is unclear. Or rather, that’s not how it works when I do it. When I add a new observation (at upload), I have two choices: I can click in the ‘species name’ box and CV offers a suggestion, or I type in a name. It doesn’t offer suggestions after I type in a name. So “at upload” the CV cannot be affected by any identifications because no identifications can be provided at that stage.

It’s only AFTER upload, when I go to the uploaded observation, click the ‘compare’ button and select ‘visually similar’, that the CV starts suggesting things within the taxon that I select. If I provide a genus level ID, the CV will only suggest species within that genus.

Perhaps you’re only talking about the third way of engaging the CV: clicking the ‘Species name’ box under the Species Identification tab of an observation that has already been uploaded?

Using that third approach, with this particular observation, I can’t imagine that it’s not influenced by the identification (the one made at upload or the community ID–I don’t know). Somewhere in this image there’s a bee–it’s difficult to find as this pic is mostly plant and very little bee. But the CV only suggests several insects. Not just sweat bees, or bees, or hymenoptera, but insects (no plants are suggested). Another example here: the CV suggestions are clearly being restricted to the insect taxon despite the fact that this photo is clearly more plant than insect. And the insect suggestions are “wild guesses” on the part of the CV.

My conclusion is that the CV is being restricted by one of the IDs to a taxon as low as Class but probably not lower.

These examples and the other information offered here help me understand CV behavior. And, yes, Russell, I was primarily thinking of subsequent CV suggestions after an observation has one or more OP- or community-supplied IDs. I now understand that functionality. I’m actually doing some research on the CV suggestions for a couple of moths (my beloved Petrophila’s) which will hopefully compare and contrast CV suggestions in various scenarios.
