Comparing CV Outcomes in a Pair of Near-Cryptic Moth Species

the scores come from the computer vision API. for any given observation, the scores of all the suggestions should add up to 100. so the closer any single score is to 100, the stronger that score is.

the easiest way to see that information is to:

  1. open up your browser’s developer tools
  2. go to the network monitor
  3. open up your observation
  4. trigger the computer vision suggestion on the observation
  5. look in the network monitor for the request that was issued to the computer vision. for a request triggered from the observation page, the request should end with the same number as the observation number (as shown in the image below). for a request triggered from the web upload screen, the request will show as score_image (as shown in the screenshot in my first post on this thread). when you look at the response for that request, you’ll see the data that the API returns, including scores. (i’ve been focusing on the combined scores, which are the raw visual match scores adjusted by the geomodel scores.) the species suggestions are ordered in the same order as you see them displayed in the list of suggestions on the observation page or upload page.
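if you save that response out of the network monitor to a file, a few lines of script can summarize it instead of scrolling the raw JSON. this is just a rough sketch – the field names (results, combined_score, vision_score, taxon.name) are assumptions about what the response typically contains, so check them against the response you actually see:

```python
# rough sketch: summarize a computer vision response saved from the
# browser's network monitor ("Copy Response" -> save as cv_response.json).
# field names ("results", "combined_score", "vision_score", "taxon"/"name")
# are assumptions -- adjust to match the actual response.
import json

with open("cv_response.json") as f:
    data = json.load(f)

suggestions = data.get("results", [])
total = sum(s.get("combined_score", 0) for s in suggestions)

print(f"combined scores sum to ~{total:.1f} (should be close to 100)")
for s in suggestions:
    taxon = s.get("taxon", {})
    print(
        f'{taxon.get("name", "?"):30s}'
        f'  combined: {s.get("combined_score", 0):6.2f}'
        f'  vision: {s.get("vision_score", 0):6.2f}'
    )
```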

here’s a screenshot with some markings that might help visualize that process above:

i think surfacing potential distinguishing features might be one of the most useful applications of computer vision / AI. the best chess masters in the world these days are learning new lines of play from AI. and AI is helping doctors to screen for disease. so why can’t we learn from the computer vision for identifying taxa, too (assuming we can figure out what it’s looking at and verify it’s reliable)?

oh… and this isn’t anything anyone asked for, but i updated my iNat mapping page so that you can use it to see particular observations overlaid over the Expected Nearby geomodel area for a particular taxon. this allows you to quickly see if there are any observations that fall outside the geomodel area. for example, here are the 20 subject observations overlaid over the Expected Nearby area for P. cappsi: https://jumear.github.io/stirfry/iNat_map?showexpectednearby=true&taxon_id=930443,178410&id=244843712,244817491,243511245,243463506,243252933,240474025,235274021,233397803,231662243,232072036,231442128,230956673,230926818,230600952,220830053,214202191,141121717,184955043,56191946,49689527
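if you want to build that kind of link programmatically for a different set of observations, here’s a rough sketch using the same URL parameters (showexpectednearby, taxon_id, id) as the link above:

```python
# rough sketch: build a link to the iNat_map page that overlays specific
# observations on the Expected Nearby area for one or more taxa, using the
# same URL parameters as the example link above.
BASE = "https://jumear.github.io/stirfry/iNat_map"

def map_link(taxon_ids, observation_ids, show_expected_nearby=True):
    params = []
    if show_expected_nearby:
        params.append("showexpectednearby=true")
    params.append("taxon_id=" + ",".join(str(t) for t in taxon_ids))
    params.append("id=" + ",".join(str(o) for o in observation_ids))
    return BASE + "?" + "&".join(params)

print(map_link([930443, 178410], [244843712, 244817491, 243511245]))
```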

3 Likes

This is also what interests me the most about the long-term prospects for the iNat CV/future models trained on the same data. It is why I think it is so important that the specific images used to train the CV are random/blind and include ‘bad’ photos that show no known key features. Harder is better for the training set if we want it to learn new features. I think the CV also gets better the more options it is forced to choose from, because that forces it to learn harder and more diverse features.

2 Likes

Because that goes against dogma?

As silly as it might seem on the surface to “do a deep dive” into two obscure moths, pisum has brought up why it may not be so silly after all: the broader applicability of this question. The reality is that large numbers of iNat users are going to use the CV whether the “experts” like it or not. If we can understand its workings well enough to address its shortcomings, that can only be a good thing.

We complain about the backlog of Unknowns, then we complain about CV-suggested IDs. Given how the platform works, and who its user base is, these two things are always going to be a trade-off; that is, a decrease in one will be associated with an increase in the other.

2 Likes

fortune teller pisum?

https://www.inaturalist.org/blog/99727-using-the-geomodel-to-highlight-unusual-observations

3 Likes

interesting. i wish they had offered a way to overlay particular observations over the geomodel areas to visualize things better, but i guess that’s where third-party stuff like this can fill that need.

maybe i’m not looking hard enough, but although they allow you to filter, it doesn’t look like they reveal the geomodel score of a particular observation anywhere (not even in the API response). i wonder why not?

1 Like

My wife and I just had a nice video chat with our daughter (who works in the tech industry) regarding data sets and training of AI/CV. I think she’s reading a book about “Deep Learning” (way over my head), but this gets back to my original question about being able to at least peek under the hood of the CV to understand more about its sample selection, sample sizes, etc., for a given taxon.

1 Like

When has anyone suggested that we should not do this?

People who complain about the uncritical use of the CV are people who ID taxa where the CV performs abysmally. We do not have a problem with the CV in principle; we have a problem with the fact that at present it results in many, many egregiously wrong IDs for these taxa and creates a lot of extra work for human IDers. There have been numerous discussions that include efforts to figure out why it gets these taxa so wrong and how the training could be changed to reduce these problems.

1 Like

Here’s a teaser from the data crunching I’m doing: I’m examining IDs and CV outcomes for the two moth species in 32 counties where their ranges overlap in Texas (more details on the research methodology to be published later). In those counties, there are close to 1,000 RG observations of the two species, typically images with a sufficient view of the HW to see diagnostic features. The proportions are Two-banded (54%) and Capps’ (46%), and that varies somewhat geographically within the overlapping ranges. However, from 3 independent tests of the aforementioned 20 “hard” cases (without a view of the HW), CV is suggesting Two-banded 80% of the time and Capps’ only 20%. So as I mentioned above, either CV is over-confident, poorly trained, or it knows something I don’t…and all three of these may be true!
I’m now examining a larger sample of the hard cases from within the 32-county region.
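For anyone who wants to gauge how surprising that gap is, here’s a rough sketch of a one-sided binomial check. The 16-of-20 count is just my reading of the ~80% figure above, and the check assumes the hard cases behave like an independent random draw from the same pool, which they may not:

```python
# rough sketch (not from the analysis above): if the CV's first suggestion
# simply tracked the ~54% Two-banded share of RG observations in the region,
# how surprising would ~16 of 20 Two-banded suggestions be?
from math import comb

n, k, p = 20, 16, 0.54  # assumed counts, per the rough figures above

# one-sided binomial tail probability: P(X >= k) with X ~ Binomial(n, p)
p_tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

print(f"observed: {k}/{n} = {k/n:.0%} Two-banded, baseline: {p:.0%}")
print(f"P(at least {k} of {n} by chance) = {p_tail:.3f}")
```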

3 Likes

for what it’s worth… with plants, the CV is often able to correctly identify vegetative specimens with no reproductive material in taxa where that isn’t possible via any of the keys. Sometimes it’s possible via gestalt, but other times i can’t tell how, yet the CV is correct. It’s far from perfect, and people do use it when they shouldn’t, but sometimes it also gets it right when i wouldn’t expect it could. And it’s getting better all the time. of course it needs correctly identified RG observations to build from, or it won’t continue to improve.

3 Likes

instead of looking at counties where ranges overlap, i just looked at counties that have both species recorded in the system.

in the United States, i see 18 counties where there are both P. cappsi and P. bifascialis observations. among counties that have both species, i get 370 observations of P. cappsi and 341 observations of P. bifascialis.

when you break this down by county, you see that many counties skew towards one species or the other. so if you weight the genus-level Petrophila observations in these counties by the species-level counts in each county (i.e., assuming the CV’s first species suggestion simply follows each county’s species mix), then across these counties, i would expect the CV to suggest roughly 42% P. cappsi and 58% P. bifascialis as the first species suggestion in these cases.

| State | County | Place ID | A = P. cappsi | B = % A of E | C = P. bifascialis | D = % C of E | E = A+C | F = at genus | G = % of total F | H = F weighted by B | I = F weighted by D | J = A+C+F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Oklahoma | Johnston | 3058 | 1 | 50.00 | 1 | 50.00 | 2 | 0 | 0.00 | 0.00 | 0.00 | 2 |
| Oklahoma | Oklahoma | 355 | 1 | 50.00 | 1 | 50.00 | 2 | 0 | 0.00 | 0.00 | 0.00 | 2 |
| Texas | Bastrop | 441 | 1 | 20.00 | 4 | 80.00 | 5 | 7 | 3.40 | 0.68 | 2.72 | 12 |
| Texas | Bell | 1878 | 4 | 36.36 | 7 | 63.64 | 11 | 2 | 0.97 | 0.35 | 0.62 | 13 |
| Texas | Blanco | 1767 | 6 | 66.67 | 3 | 33.33 | 9 | 2 | 0.97 | 0.65 | 0.32 | 11 |
| Texas | Bosque | 1707 | 2 | 11.11 | 16 | 88.89 | 18 | 12 | 5.83 | 0.65 | 5.18 | 30 |
| Texas | Dallas | 1281 | 2 | 3.08 | 63 | 96.92 | 65 | 17 | 8.25 | 0.25 | 8.00 | 82 |
| Texas | Ellis | 1290 | 3 | 10.71 | 25 | 89.29 | 28 | 3 | 1.46 | 0.16 | 1.30 | 31 |
| Texas | Hamilton | 272 | 3 | 50.00 | 3 | 50.00 | 6 | 2 | 0.97 | 0.49 | 0.49 | 8 |
| Texas | Hays | 326 | 29 | 69.05 | 13 | 30.95 | 42 | 7 | 3.40 | 2.35 | 1.05 | 49 |
| Texas | Kerr | 1710 | 4 | 80.00 | 1 | 20.00 | 5 | 5 | 2.43 | 1.94 | 0.49 | 10 |
| Texas | San Saba | 383 | 1 | 25.00 | 3 | 75.00 | 4 | 1 | 0.49 | 0.12 | 0.36 | 5 |
| Texas | Somervell | 1215 | 1 | 10.00 | 9 | 90.00 | 10 | 8 | 3.88 | 0.39 | 3.50 | 18 |
| Texas | Terrell | 895 | 6 | 75.00 | 2 | 25.00 | 8 | 1 | 0.49 | 0.36 | 0.12 | 9 |
| Texas | Travis | 431 | 23 | 20.54 | 89 | 79.46 | 112 | 62 | 30.10 | 6.18 | 23.92 | 174 |
| Texas | Uvalde | 3029 | 1 | 33.33 | 2 | 66.67 | 3 | 2 | 0.97 | 0.32 | 0.65 | 5 |
| Texas | Val Verde | 405 | 87 | 98.86 | 1 | 1.14 | 88 | 20 | 9.71 | 9.60 | 0.11 | 108 |
| Texas | Williamson | 2442 | 195 | 66.55 | 98 | 33.45 | 293 | 55 | 26.70 | 17.77 | 8.93 | 348 |
| Total | | | 370 | | 341 | | 711 | 206 | 100.00 | 42.26 | 57.74 | 917 |
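if anyone wants to reproduce that weighting, here’s a rough sketch of the calculation using the A, C, and F counts from the table above (assuming, as noted, that the CV’s first species suggestion simply follows each county’s species mix):

```python
# rough sketch of the weighting above: for each county, split its genus-level
# Petrophila observations (F) according to that county's species-level mix
# (A vs C), then total across counties. counts are columns A, C, and F from
# the table above.
counties = [
    # (county,        A = cappsi, C = bifascialis, F = at genus)
    ("Johnston OK",     1,   1,  0),
    ("Oklahoma OK",     1,   1,  0),
    ("Bastrop TX",      1,   4,  7),
    ("Bell TX",         4,   7,  2),
    ("Blanco TX",       6,   3,  2),
    ("Bosque TX",       2,  16, 12),
    ("Dallas TX",       2,  63, 17),
    ("Ellis TX",        3,  25,  3),
    ("Hamilton TX",     3,   3,  2),
    ("Hays TX",        29,  13,  7),
    ("Kerr TX",         4,   1,  5),
    ("San Saba TX",     1,   3,  1),
    ("Somervell TX",    1,   9,  8),
    ("Terrell TX",      6,   2,  1),
    ("Travis TX",      23,  89, 62),
    ("Uvalde TX",       1,   2,  2),
    ("Val Verde TX",   87,   1, 20),
    ("Williamson TX", 195,  98, 55),
]

total_genus = sum(f for _, _, _, f in counties)
expected_cappsi = sum(f * a / (a + c) for _, a, c, f in counties) / total_genus
print(f"expected first suggestion: {expected_cappsi:.1%} P. cappsi, "
      f"{1 - expected_cappsi:.1%} P. bifascialis")
# -> roughly 42% P. cappsi, 58% P. bifascialis, matching the H and I totals
```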

however…

most counties have more P. bifascialis observations. so i would expect that if you looked at CV suggestions for genus-level observations in P. bifascialis-heavy counties, you would get more CV suggestions for P. bifascialis, and vice versa.

this seems to be generally true, except notably in Williamson County, which also has the largest number of Petrophila observations. it looks like there’s one prolific Petrophila observer there whose observations skew towards P. cappsi.

so if i exclude that person’s Williamson County Petrophila observations, then the analysis changes a bit. now in these counties, i have 189 observations of P. cappsi vs 285 of P. bifascialis, and for the genus-level observations, i would expect the CV to suggest roughly 33% P. cappsi and 67% P. bifascialis as the first species suggestion.

| State | County | Place ID | A = P. cappsi | B = % A of E | C = P. bifascialis | D = % C of E | E = A+C | F = at genus | G = % of total F | H = F weighted by B | I = F weighted by D | J = A+C+F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Oklahoma | Johnston | 3058 | 1 | 50.00 | 1 | 50.00 | 2 | 0 | 0.00 | 0.00 | 0.00 | 2 |
| Oklahoma | Oklahoma | 355 | 1 | 50.00 | 1 | 50.00 | 2 | 0 | 0.00 | 0.00 | 0.00 | 2 |
| Texas | Bastrop | 441 | 1 | 20.00 | 4 | 80.00 | 5 | 7 | 4.24 | 0.85 | 3.39 | 12 |
| Texas | Bell | 1878 | 4 | 36.36 | 7 | 63.64 | 11 | 2 | 1.21 | 0.44 | 0.77 | 13 |
| Texas | Blanco | 1767 | 6 | 66.67 | 3 | 33.33 | 9 | 2 | 1.21 | 0.81 | 0.40 | 11 |
| Texas | Bosque | 1707 | 2 | 11.11 | 16 | 88.89 | 18 | 12 | 7.27 | 0.81 | 6.46 | 30 |
| Texas | Dallas | 1281 | 2 | 3.08 | 63 | 96.92 | 65 | 17 | 10.30 | 0.32 | 9.99 | 82 |
| Texas | Ellis | 1290 | 3 | 10.71 | 25 | 89.29 | 28 | 3 | 1.82 | 0.19 | 1.62 | 31 |
| Texas | Hamilton | 272 | 3 | 50.00 | 3 | 50.00 | 6 | 2 | 1.21 | 0.61 | 0.61 | 8 |
| Texas | Hays | 326 | 29 | 69.05 | 13 | 30.95 | 42 | 7 | 4.24 | 2.93 | 1.31 | 49 |
| Texas | Kerr | 1710 | 4 | 80.00 | 1 | 20.00 | 5 | 5 | 3.03 | 2.42 | 0.61 | 10 |
| Texas | San Saba | 383 | 1 | 25.00 | 3 | 75.00 | 4 | 1 | 0.61 | 0.15 | 0.45 | 5 |
| Texas | Somervell | 1215 | 1 | 10.00 | 9 | 90.00 | 10 | 8 | 4.85 | 0.48 | 4.36 | 18 |
| Texas | Terrell | 895 | 6 | 75.00 | 2 | 25.00 | 8 | 1 | 0.61 | 0.45 | 0.15 | 9 |
| Texas | Travis | 431 | 23 | 20.54 | 89 | 79.46 | 112 | 62 | 37.58 | 7.72 | 29.86 | 174 |
| Texas | Uvalde | 3029 | 1 | 33.33 | 2 | 66.67 | 3 | 2 | 1.21 | 0.40 | 0.81 | 5 |
| Texas | Val Verde | 405 | 87 | 98.86 | 1 | 1.14 | 88 | 20 | 12.12 | 11.98 | 0.14 | 108 |
| Texas | Williamson | 2442 | 14 | 25.00 | 42 | 75.00 | 56 | 14 | 8.48 | 2.12 | 6.36 | 70 |
| Total | | | 189 | | 285 | | 474 | 165 | 100.00 | 32.69 | 67.31 | 639 |