I want to be able to examine the training sets for individual species in more detail. Which observations were used? How many? How many RG? What was their geographic distribution? Is any of this possible?
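In the meantime, for anyone who wants to approximate those answers: as far as I know the actual CV training sets aren't published anywhere, but the public iNat API can at least report observation counts, RG counts, and photo counts for a taxon, which is a rough proxy. A minimal sketch (the exact criteria iNat uses to assemble training sets are not something I know, so treat these numbers as upper bounds at best):

```python
# Sketch: approximate a species' "training pool" via the public iNat API.
# Caveat: the real CV training-set criteria aren't public; this only counts
# observations, which is a rough proxy at best.
import requests

API = "https://api.inaturalist.org/v1"

def taxon_id(name):
    """Resolve a scientific name to an iNat taxon id (takes the top match)."""
    r = requests.get(f"{API}/taxa", params={"q": name, "per_page": 1})
    r.raise_for_status()
    return r.json()["results"][0]["id"]

def obs_count(taxon, **filters):
    """Total observations matching the filters (ask for 1 result, read total_results)."""
    r = requests.get(f"{API}/observations",
                     params={"taxon_id": taxon, "per_page": 1, **filters})
    r.raise_for_status()
    return r.json()["total_results"]

tid = taxon_id("Petrophila bifascialis")
print("all observations:", obs_count(tid))
print("research grade:  ", obs_count(tid, quality_grade="research"))
print("with photos:     ", obs_count(tid, photos="true"))
```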
Context: I am frustrated by CV-suggested IDs when the model has been trained on only one common species out of a set of similar species, or when the “Expected Nearby” functionality fails; either way, the result is spurious suggested IDs.
Case in point: two nearly identical aquatic crambid moths: Two-banded Petrophila (Petrophila bifascialis), widespread across the eastern U.S. including Texas, and Capps’ Petrophila (Petrophila cappsi), a regional endemic in Texas and Oklahoma. iNat’s CV claims to “know” each of these species, but it commonly prioritizes or suggests Two-banded for observations in Texas and Oklahoma. It is “pretty sure” of the genus, but Two-banded shows up disproportionately as the first suggested species.
Reality: At present the two species can be separated only with a good view of the hindwings, which, in the typical pose, 90% of images don’t show. But ALL observations outside of TX-OK are readily identifiable, with or without a view of the HW, since Capps’ doesn’t occur there. And unfortunately there are 6X as many Two-banded observations across its range as there are of Capps’. The result for CV training is that Two-banded Petrophila has routinely become the default ID suggestion for Texas observations; the problem of a cryptic species in Texas is simply bypassed. Observers (e.g., in Texas) accept “Two-banded” as an ID, reinforcing the error and compounding an already under-appreciated ID challenge. I follow up with a genus-level ID and a polite note on such observations, but my efforts cannot hold back the tide of CV suggestions.
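Anyone can check the imbalance themselves with the helpers from the snippet above. This sketch splits each species’ counts into inside vs. outside the TX/OK zone of sympatry; the place ids are resolved at runtime rather than hardcoded since I haven’t verified them, and I believe `place_id` accepts a comma-separated list (if not, query each state separately):

```python
# Sketch: quantify the imbalance feeding the CV -- observations of each
# species inside vs. outside TX/OK. Reuses API, taxon_id, and obs_count
# from the previous snippet.
def place_id(name):
    """Resolve a place name to an iNat place id (top autocomplete match --
    verify it's the place you expect)."""
    r = requests.get(f"{API}/places/autocomplete",
                     params={"q": name, "per_page": 1})
    r.raise_for_status()
    return r.json()["results"][0]["id"]

tx, ok = place_id("Texas"), place_id("Oklahoma")
for name in ("Petrophila bifascialis", "Petrophila cappsi"):
    tid = taxon_id(name)
    in_zone = obs_count(tid, place_id=f"{tx},{ok}")
    total = obs_count(tid)
    print(f"{name}: {total} total, {in_zone} in TX/OK, {total - in_zone} outside")
```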
Based only on my human-oriented ID skills, I can virtually guarantee that, in the absence of a view of the hindwings, CV cannot separate these two species in their overlapping ranges in TX and OK. I’d be thrilled to be proven wrong, but for the time being CV isn’t recognizing its failure. What can be done about such regional endemic-cryptic confusions?

(A) Impractical: Run separate CV training sets for areas of cryptic species’ sympatry and for areas where only one or the other occurs.
(B) Impractical: Add a disclaimer or caveat when making an ID suggestion in a zone of overlap of cryptic species.
(C) Impractical: Do a sort (by CV?) to separate photos that do and do not show a sufficient view of the HW, then run separate training sessions on these two sets, excluding forewing-only photos from the zone of sympatry (roughly sketched after this list).
(D) Undesirable: Exclude pairs of cryptic species from CV training altogether.
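Just to make (C) concrete, here is the shape of what I’m imagining. Everything here is hypothetical: `shows_hindwing` would be some pose/feature classifier, and nothing like it exists in iNat’s pipeline as far as I know.

```python
# Hypothetical sketch of option (C): partition training photos by whether the
# diagnostic hindwing view is visible. shows_hindwing and in_sympatry_zone
# are stand-ins for classifiers/range checks that don't actually exist.
def partition_training_photos(photos, shows_hindwing, in_sympatry_zone):
    usable, forewing_only = [], []
    for photo in photos:
        if shows_hindwing(photo):
            usable.append(photo)         # diagnostic view: safe anywhere
        elif not in_sympatry_zone(photo):
            usable.append(photo)         # allopatric: ID is safe without HW
        else:
            forewing_only.append(photo)  # sympatric + no HW: exclude or flag
    return usable, forewing_only
```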
I really don’t know what the answer to this dilemma is, and that’s not the point of this post. But first things first: I’d like to be able to “look under the hood” at specific training sets to see where things may go awry.