North American Sinea ID and the Sorcerer's Apprentice problem

It should not be taken literally. The only way to get a 100% accurate list of the requirements is to look at the code itself on GitHub.

This categorically does not happen.

The best way I could explain it is that it needs 60 or more observations with the community taxon at that level. An observation with only the observer's ID lacks a community taxon.

This is just another example of not so much a lack of transparency on iNaturalist's part as a lack of detailed, specific official information. You basically have to find it yourself.

https://github.com/inaturalist/iNaturalistAPI/blob/main/lib/vision_data_exporter.js

A chunk copied from there:

```js
// Some rules/assumptions:
// Never use taxa/clades which are known to be globally extinct
// Never use taxa below species
// Never use taxa/clades with rank hybrid or genushybrid
// Never use inactive taxa
// Only consider observations whose observation_photos_count is greater than 0
// Never use leaf taxa whose observations_count is less than 50
// Never use observations that have unresolved flags
// Never use observations that fail quality metrics other than wild
// Never use photos that have unresolved flags
// Populating files:
// Test and Val must have observations whose community_taxon_id matches the clade
// Train can also use obs where just taxon_id matches the clade, as lower priority
// One photo per obs in test and val, train can use 5 per obs
// In train, start adding one photo per observation, and fill with additional 4 if there's room
// If obs photos are used in any set, the obs' other photos cannot appear in other sets
// Ideally if obs in train, not represented in other sets. Not too bad if obs in val and test
```
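To make the eligibility rules above concrete, here is a minimal sketch of them as filter functions. All field names (`extinct`, `rank`, `isActive`, `observationsCount`, etc.) are invented for illustration; the real exporter uses its own schema and queries, so treat this as a reading aid, not the actual implementation:

```javascript
// Sketch of the commented rules as filter functions. Field names are
// assumptions, not the exporter's real schema.
const RANKS_BELOW_SPECIES = ["subspecies", "variety", "form"];
const HYBRID_RANKS = ["hybrid", "genushybrid"];

function taxonIsEligible(taxon) {
  if (taxon.extinct) return false;                            // no globally extinct clades
  if (RANKS_BELOW_SPECIES.includes(taxon.rank)) return false; // never below species
  if (HYBRID_RANKS.includes(taxon.rank)) return false;        // no hybrid ranks
  if (!taxon.isActive) return false;                          // no inactive taxa
  if (taxon.isLeaf && taxon.observationsCount < 50) return false; // leaf minimum
  return true;
}

function observationIsEligible(obs) {
  if (obs.observationPhotosCount <= 0) return false; // must have at least one photo
  if (obs.unresolvedFlags > 0) return false;         // no unresolved flags
  if (obs.failsNonWildQualityMetric) return false;   // must pass quality metrics other than wild
  return true;
}
```

Note that the taxon-level and observation-level rules are independent: a taxon can be eligible while many of its observations are not, which is presumably why the photo/observation counts matter separately.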

3 Likes

I linked some code from GitHub, but this only covers what observations and taxa are eligible for the CV. Unless you directly ask staff and they respond with an answer, you/we may have to dig into the CV's code to find that answer.

Aha, the plot thickens! There is indeed a key to nymphs for the three species found in Illinois. My interpretation of the key is that it would require careful examination of specimens with a dissecting microscope, or perhaps very good macro photos of a specimen (alive or dead) in a dish where one can get the proper angles.

Identification of Nymphs of Midwestern Species and Instars of Sinea (Hemiptera: Heteroptera: Reduviidae: Harpactorinae)
J. E. McPherson, Rachel A. Shurtz, Shannon C. Voss
Annals of the Entomological Society of America, Volume 99, Issue 5, 1 September 2006, Pages 755–767.

https://doi.org/10.1603/0013-8746(2003)096[0776:LHALRO]2.0.CO;2

I don’t think this sort of detail could ever be expected of the cv system.

2 Likes

I’m going to modify that one statement a bit, having studied some very good images of both spinipes and diadema. Adult Sinea diadema has spines on the frontal lobe of the pronotum, spinipes just has tubercles there. So adult diadema are a bit spinier than adult spinipes. Good macro images would show this, but I’m not sure how well it would be detected by the cv. I have images of both species, and I think I can tell spines from tubercles reliably.

Sinea incognita, also widespread in eastern US (esp. Southeast?), is said to have spines on both lobes of the pronotum. Most adults have short wings. It is recently described (2014), so perhaps has not always been on everyone’s radar. A good image showing the spines is at:

https://bugguide.net/node/view/1229782

1 Like

Looking at the section of the blog where they talk about accuracy, they only use the CV’s performance on RG observations for their calculation of “accuracy.” That is an incredibly flawed metric.

That is one of the classic rookie mistakes for assessing applicability of algorithms to the real world. They’re saying “LOOK HOW GOOD THIS ALGORITHM WORKS ON THIS EASY MODE EXAMPLE DATA SET” (because RG observations tend to be ones that were easier to ID) while ignoring all of the observations that currently aren’t at RG due to issues like the innate identifiability of the observation, not enough IDers to override an incorrect CV ID to get something to the correct species, what have you. That metric also isn’t going to clearly show the issue at the core of OP’s complaint, which is overspecific IDs that are at least correct at a higher level taxon.

Given that not all photos of Sinea are going to be (1) high enough quality to see the tubercles vs spines and (2) from the same camera angles, and given that the CV model isn’t being told to look for specific things, but rather compare/contrast a bunch of photos from each taxon and draw its own numerical conclusions about similarity, no, I do not expect the CV to pick up on that (now or ever, unless we reach the point in the far future where it’s an actual artificial intelligence in the true sense of the term and not just a glorified reverse image search).

1 Like

I spent some time looking through this today and didn’t get much clarity, but I have barely any programming experience. Some things that are confusing me:

  • That comment on line 29 indicates that leaves with fewer than 50 observations should not* be used, but then I can’t find observations_count associated with the number 50 anywhere in the actual code. On line 346 it just checks that there’s more than 0 observations.

  • There is TRAIN_MIN on line 43 set at 50, which based on its usage in line 301 seems to be related to minimum number of photos, rather than observations. 100 is used as a maximum value in two of the constants at the top there, but never as a minimum value anywhere on the page.

  • Most of this page was created in November 2021, with some updates (not affecting the above details) in March 2023. This would suggest the threshold hasn’t changed since 2021 despite the help page wording changing in 2024.

That GitHub link was provided by the staff in a comment here, so I assume it’s accurate? Given my ignorance of programming, my assumption would be that I’m missing some stuff here. On the whole I feel like looking at the code without having anyone here involved in writing it just gives us more confusion than help.

*edited

2 Likes

I just checked up on my personal CV problem species (Narceus americanus) and noticed that it is still getting CV-suggested IDs despite it having been removed from the model 5 months ago (and might now be above the threshold to be included again as a result). Those IDs are coming from observations uploaded via Seek and iNat Next. It looks like this issue was already raised a year ago here but another aspect I wasn’t aware of until just now.

2 Likes

I’m not a coder either, but it seems there are actually three separate things in play: train, test, and val data sets, each with different functions and variables.

Line 29 says “Never use leaf taxa whose observations_count is less than 50” – the opposite of what you said, so probably a mix-up.

It is quite technical and I don’t blame you. I don’t fully understand everything either, though I find the comments within the code helpful.

Trying to train it on Chironomids helped some of my understanding of how the CV works.

My best understanding is that it needs a minimum of 100 total photos, using no more than 5 photos per observation, across a minimum of 60 observations whose community taxon equals the current taxon ID. There are other specifics, pretty much all of which are covered by the code chunk I posted – for example, flagged images can’t be used.
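If that reading is right, the check is simple arithmetic. A minimal sketch, assuming the 60-observation / 100-photo / 5-per-observation numbers from this thread (they are one poster's interpretation, not confirmed official values):

```javascript
// Sketch of the threshold check as described in this thread. The
// default numbers (60 obs, 100 photos, 5 photos/obs) are assumptions
// taken from the discussion, not verified against iNaturalist's code.
function meetsTrainingThreshold(photoCountsPerObs, minObs = 60, minPhotos = 100, maxPerObs = 5) {
  // Cap each observation's contribution at maxPerObs photos.
  const usable = photoCountsPerObs.map(n => Math.min(n, maxPerObs));
  const totalPhotos = usable.reduce((sum, n) => sum + n, 0);
  return photoCountsPerObs.length >= minObs && totalPhotos >= minPhotos;
}
```

Under those numbers, 60 observations with 2 photos each (120 usable photos) would qualify, while 60 single-photo observations (only 60 photos) would not – the two minimums bind independently.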

1 Like

Good catch, yeah I missed a word when typing there; that’s not the part I was confused about.

1 Like

The comments correspond to how the staff explained CV training and testing works.
There is one more rule that did not make the list: Once a clade is included, the parent clade is excluded.
This means that taxa and genera that are not included are completely ignored and won’t be presented as a suggestion or an option during compare.

There are options to improve the model.
The obvious one is validating against sister clades. It is not clear what could be done if that validation fails.
The other option is costly but straightforward: For every clade included, include the parent clade. For this to work, child clades need to be included.
I give an example:
A taxon is selected to be included. Sample taxon and ssp level observations for training and testing.
Add the genus. Sample all taxa from the genus including ssp.
I am sure this will reduce the problem, no idea by how much.
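As a toy sketch of that proposal (purely illustrative – not how iNaturalist actually builds its label set): for every selected species, also emit a genus-level label that pools all of the genus's child species, including ones too rarely observed to earn their own leaf label. The binomial-name convention and the genus-to-children map are assumptions for the example:

```javascript
// Toy sketch of the "include the parent clade" proposal above.
// Assumes binomial names ("Genus species") and a caller-supplied
// genus -> child-species map; none of this reflects real iNat code.
function expandWithParents(selectedSpecies, genusChildren) {
  const labels = { species: [...selectedSpecies], genera: {} };
  for (const sp of selectedSpecies) {
    const genus = sp.split(" ")[0];
    // The genus label samples from ALL child species, not just the
    // ones common enough to be selected as leaf labels themselves.
    labels.genera[genus] = genusChildren[genus] || [sp];
  }
  return labels;
}

// Example: only Sinea diadema is selected as a leaf label, but the
// genus label still pools all three species.
const labels = expandWithParents(
  ["Sinea diadema"],
  { Sinea: ["Sinea diadema", "Sinea spinipes", "Sinea incognita"] }
);
```

The cost concern is visible here: every genus label multiplies the amount of training material to sample, which is presumably why this option is described as costly.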

I don’t know anything about insects but I see this same problem with mosses.

3 Likes

Having to constantly kick Agelenopsis potteri CV IDs to genus is driving me insane. Please do something about this, staff.

5 Likes

Has a formal feature request been made to exclude certain taxa from the CV when they become a significant issue? Why can we not flag problem taxa in the CV?

I don’t understand why, if the CV is getting a taxon wrong a significant amount of the time – like 80% or more – we can’t do anything about it directly. Instead we are expected to fix the observations it’s trained on. That could be thousands of observations, and often it is actually a feedback loop, which is hard to deal with.

If another identifier got things wrong 80% or more of the time, they would likely be suspended, especially the more identifications they have.

If a common name causes significant amounts of misidentifications, it can get removed.

If a taxon image causes many misidentifications, it is commonly changed.

4 Likes

I think this is too simplistic. Another identifier might get certain taxa wrong 80% or more of the time, but if this is offset by other taxa that they get right most of the time, their overall failure rate would be low enough to stay in good standing. Besides, suspension is mostly for deliberately giving inaccurate IDs, which the CV never does – even when it’s wrong, it’s acting in good faith.

1 Like

It is a generalization because going into details is not very relevant to this thread. I am speaking from my experience with curation flags over the past year. I’m also not specifying certain taxa. If an account is substantially identifying incorrectly, an ID of theirs can be flagged and they can be suspended depending on the circumstances – reasons like blindly using the CV to identify others’ observations, constantly identifying without being able to explain why, etc. It depends on the situation.

Intent also doesn’t always matter: even identifying in good faith, if you are still 80% wrong you cause observations to get pushed up, and depending on your responses you could still be suspended if you continue with no change.

Also, the community guidelines state: “Add accurate content and take community feedback into account. Any account that adds content we believe decreases the accuracy of iNaturalist data may be suspended, particularly if that account behaves like a machine, e.g. adds a lot of content very quickly and does not respond to comments and messages.”

I can’t link flags here; it’s also good to note that this isn’t a super common thing people are getting suspended for every day. But my point stands that the CV does not have accountability and we aren’t able to flag it for any issue. The CV seems to get a pass even when it may suggest to people a species that is almost always incorrect. No matter how extreme the case (even 99% incorrect), there is nothing to do but manually try to fix the input data it’s trained on, or just deal with it. Hypothetically, if every taxon it learned were a separate human account adding IDs of just that taxon, some absolutely would get suspended over how often they are incorrect.

3 Likes

I feel like the oft-quoted 1970s IBM guideline “A computer can never be held accountable, therefore a computer must never make a management decision” is relevant in this discussion. As @zoology123 is pointing out, there’s no way to hold the CV accountable (or pause it) even when it sometimes gets rolling in a bad feedback loop. We’re told to correct the humans who are trusting the CV en masse, which is a noble idea, but have no way to turn off the misinformation leak at its source.

Though I agree that I haven’t really seen iNat suspend users for making inaccurate IDs / trusting the CV too much. When it’s crossed into trolling behavior, sure, but not someone just using “trust the CV” as their baseline principle of identifications. The adding-inaccurate-content clause mostly just seems to be used as a justification for turning off bot submitter accounts.

4 Likes

Without getting too much into detail: as recently as 8 days ago, a user was suspended for this.

I would say on average it happens every other week, but the frequency varies.

It is bothersome that a user can be suspended, while (practically) nothing can be done to stop the CV when it would really help. There are some taxa with thousands of misidentifications that are still piling up ever larger.

I find this discussion interesting! Lots of talk about algorithms and adjustments, but I think a simple answer could be that there are just taxa that must be opted out of finer-scale CV resolutions on a geographical basis.

Just like taxon swaps and such are requested, what if a (geographical) taxon CV restriction could be requested? To me, this would seem easier to implement than an ‘ideally trained’ CV outcome.

That is, I just don’t envision anyone generating a robust CV for all of our earthly taxa… it would seem easier to have a one-size-fits-all model as it generally exists now, and for us to Feature Request an exception rule to IDs? I play with arthropods that often need microscopy (or change body forms significantly during life cycles!) yet share a platform with folks who can identify their birds, frogs, and mammals without having even seen them.

Just throwing what I see as our most feasible community request out there. I’d vote for something of the like, though I’m not informed on the ins and outs of CV development and implementation.

3 Likes

How would one distinguish between CV error and human error?

I am not belittling the problem – I agree that for some taxa it makes an unacceptable amount of wrong suggestions, and I, too, am extremely frustrated by its inability to recognize certain genera and its persistent misidentification of some species.

However, it also has to be admitted that a lot of the misidentified observations that I see are not solely the fault of the CV. In a fair percentage of the cases where a wrong suggestion has been chosen, it is not because all of the suggested IDs were wrong – often the more conservative top-level suggestion or some other suggestion would have been correct.

Here’s an example: Colletes hederae is a large, yellow-striped ivy-specialist bee that the CV has a tendency to majorly oversuggest. It does so in two situations. First, for other Colletes species that the CV does not know because the summer Asteraceae specialists are difficult to distinguish. Second, for other ivy visitors with a broadly similar size and color scheme (Apis mellifera, Eristalis tenax, Vespula germanica, etc.). The thing is – normally its suggestions should also include a genus-level Colletes, or perhaps a general “bees” in the first case (conservative top-level suggestion) and the other species (all of which are in the CV) in the second case. In other words, users are not selecting the wrong suggestion because the CV presents them with no other option, but because they are actively choosing that option. Because they think it looks right, or because the idea that the insect they see visiting ivy must be an “ivy bee” is too irresistible, or because they have no idea and are therefore randomly picking one of the suggestions, or some other reason that is not clear to me.

Another example: There are three black carpenter bees (Xylocopa) in Europe which look very similar. They can be reliably distinguished from photos in a certain percentage of cases, maybe around half the time. (Males of the most common species have distinctive orange bands on their antennae; males of the other two species can often be recognized based on more subtle traits; females are difficult.) All three species are in the CV. Not infrequently, the first species suggestion is even the correct one (it has at least a 1 in 3 chance of being right, after all). And yet users often select the wrong suggestion – even in cases when a tiny bit of knowledge or research would suggest that it is not plausible (i.e., they choose a species not expected for the region, or don’t recognize the significance of the antennae). Sometimes these implausibly wrong IDs get confirmed by other users. These are big, conspicuous bees and fairly popular, meaning that they are not obscure organisms where it would be extremely difficult to find even basic information because all the material is hidden in some journal somewhere. And yet wrong IDs persist.

On occasions when I have asked users about their IDs or they have volunteered comments, they typically mention perceived traits that are not diagnostic (the amount of bluish sheen to the wings, which is a product of the lighting; the amount of fuzziness, which is a product of how fresh or worn the individual is; etc.).

So clearly something else is going on, for which the CV cannot be held entirely responsible.

I do not have a sense at this point whether the changes made to the CV implementation in the new app are exacerbating some of these problems (i.e., skipping the conservative top suggestion, the difficulty of figuring out how to enter an ID other than the CV suggestion). I imagine it is not helping, but the problems I am seeing predate it.

5 Likes

I think one approach to this issue would be to remove individual Sinea species from the computer vision (cv) model. That way the system will not suggest those species. I am unclear on both the mechanics of how this might work and how one could request that change to the model be made. I see the problem noted somewhere above where once a daughter taxon (say species-level) is included, the parent (say genus) is no longer offered as a top suggestion. Perhaps I am misunderstanding that.

I know there is resistance to this sort of idea, but the Sorcerer’s Apprentice problem keeps coming up. As soon as people stamp out the unjustified IDs, one poor ID will start the model making unjustified suggestions again, and there is a rapid feedback loop – the taxon overflows with unjustified observations. The interaction of the users and the model needs stronger guardrails.

I think in the case of North American Sinea, it would be best to remove all species from the cv model – make suggestions only to genus level. If it has to be done for all Sinea species, even if some can be recognized by the cv, it seems the upsides outweigh the downsides. So would this be a curation request?

Underlying this whole issue is what I like to think of as the “habitus fallacy”:

  • Biological speciation does not necessarily lead to a difference in visual appearance (habit for plants, called habitus in zoology).

I think many amateur naturalists, myself included, started with birdwatching. Birds are visually-oriented, so most reproductive isolation, and speciation, is associated with a difference in overall appearance (habitus). This is also true for many diurnal species of insects, such as butterflies. It is less true for the largely nocturnal Lepidoptera, “moths”.

Again, thinking of invertebrates, in many groups, species differences are infrequently associated with differences in habitus. The “June beetles” (Phyllophaga) are a good example. There may be a few species with distinctive habitus, but the majority can be identified to species only by dissection of a male and examination of the aedeagus and associated structures. This is often the basis of the species descriptions as well.

Other complications for ID by habitus are the presence of poorly-documented polymorphisms within a species and mimicry complexes that involve many species, or even multiple families. For many invertebrates, there are taxonomic issues, with the original species descriptions being inadequate, type specimens lost, and experts in the field no longer active. In many of these cases, application of a cv model to habitus images will never be reliable unless the underlying taxonomic issues are understood.

That said, I want to emphasize that I am super-enthusiastic about the whole Citizen Science model of iNaturalist. The cv system is a very useful tool, but it needs better guardrails in many cases.

5 Likes

Not quite. There is a difference between what taxa are used for training the CV and what taxa it is able to suggest.

The model is only trained on leaf taxa – that is, once the CV has been trained on any child taxon, image material for the parent of that taxon will no longer be used in training. This causes problems in cases where (for example) only one species is in the CV and this species is not a typical member of the genus (say, because the other species are very difficult to ID from photos). What this means is that the CV may then no longer recognize the typical members of the genus.

However, when making suggestions, the CV can always suggest broader taxa for the top suggestion, even if it has not been trained on them. I am not clear exactly how it determines this or which taxonomic ranks are eligible for being suggested, but it appears to assess the best leaf taxa matches and then use information about taxonomic relationships to determine a common ancestor.
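One plausible way such a rollup could work (a guess, not the actual iNaturalist implementation): pool each leaf taxon's score into all of its ancestors, then suggest the most specific taxon whose pooled score clears a confidence threshold. The taxonomy inputs (`ancestorsOf`, `depth`) and the 0.8 threshold are invented for the sketch:

```javascript
// Guess at a common-ancestor rollup (not real iNaturalist logic):
// sum each leaf's score into every ancestor, then pick the deepest
// taxon whose pooled score clears the threshold.
function commonAncestorSuggestion(leafScores, ancestorsOf, depth, threshold = 0.8) {
  const pooled = new Map();
  for (const [leaf, score] of Object.entries(leafScores)) {
    for (const taxon of [leaf, ...ancestorsOf[leaf]]) {
      pooled.set(taxon, (pooled.get(taxon) || 0) + score);
    }
  }
  let best = null;
  for (const [taxon, score] of pooled) {
    if (score >= threshold && (best === null || depth[taxon] > depth[best])) {
      best = taxon;
    }
  }
  return best; // most specific taxon the pooled scores support
}
```

Under this scheme, if no single Sinea species scores confidently but two of them together do, the genus can still come out as the top suggestion – which would explain how a taxon the model was never trained on can be suggested.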

2 Likes