the scores come from the computer vision API. for any given observation, the scores of all the suggestions should add up to 100. so the closer any single score is to 100, the stronger that score is.
the easiest way to see that information is to:

1. open up your browser’s developer tools
2. go to the network monitor
3. open up your observation
4. trigger the computer vision suggestion on the observation
5. look in the network monitor for the request that was issued to the computer vision. for a request triggered from the observation page, the request should end with the same number as the observation number (as shown in the image below). for a request triggered from the web upload screen, the request will show as score_image (as shown in the screenshot in my first post on this thread). when you look at the response for that request, you’ll see the data that the API returns, including scores. (i’ve been focusing on the combined scores, which are the raw visual match scores adjusted by the geomodel scores.) the species suggestions are ordered in the same order as you see them displayed in the list of suggestions on the observation page or upload page.
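if you’d rather not click through the developer tools every time, here’s a rough python sketch of issuing the same request directly. to be clear, this is just what the browser appears to do, not a documented API: the endpoint path, the JWT in the Authorization header, and the score field names are assumptions, so adjust them to match what you actually see in the network monitor.

```python
import requests

# a sketch only: the endpoint path, the Authorization header, and the score
# field names below are assumptions taken from watching the network monitor,
# not a documented API -- adjust them to match the response you actually see.
OBS_ID = 123456789             # hypothetical observation id
API_TOKEN = "<your JWT here>"  # copy the Authorization value the browser sends

url = f"https://api.inaturalist.org/v1/computervision/score_observation/{OBS_ID}"
resp = requests.get(url, headers={"Authorization": API_TOKEN})
resp.raise_for_status()

results = resp.json().get("results", [])
for r in results:
    taxon = r.get("taxon", {})
    print(taxon.get("name"),
          "combined:", r.get("combined_score"),
          "vision:", r.get("vision_score"))

# per the note above, the combined scores of all suggestions should sum to ~100
print("sum of combined scores:", sum(r.get("combined_score") or 0 for r in results))
```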
here’s a screenshot with some markings that might help visualize that process above:
i think surfacing potential distinguishing features might be one of the most useful applications of computer vision / AI. the best chess masters in the world these days are learning new lines of play from AI. and AI is helping doctors to screen for disease. so why can’t we learn from the computer vision for identifying taxa, too (assuming we can figure out what it’s looking at and verify it’s reliable)?
This is also what interests me the most about the long-term prospects for the iNat CV and future models trained on the same data. It is why I think it is so important that the specific images used to train the CV are chosen randomly/blindly and include ‘bad’ photos that show no known key features. Harder is better for the training set if we want it to learn new features. I think the CV also gets better the more options it is forced to choose from, because that forces it to learn harder and more diverse features.
As silly as it might seem on the surface to “do a deep dive” into two obscure moths, pisum has brought up why it may not be so silly after all: the broader applicability of this question. The reality is that large numbers of iNat users are going to use the CV whether the “experts” like it or not. If we can understand its workings well enough to address its shortcomings, that can only be a good thing.
We complain about the backlog of Unknowns, then we complain about CV-suggested IDs. Given how the platform works, and who its user base is, these two things are always going to be a trade-off; that is, a decrease in one will be associated with an increase in the other.
interesting. i wish they had offered a way to overlay particular observations on the geomodel areas to visualize things better, but i guess that’s where third-party tools like this can fill that need.
maybe i’m not looking hard enough, but although they allow you to filter, it doesn’t look like they reveal the geomodel score of a particular observation anywhere (not even in the API response). i wonder why not?
My wife and I just had a nice video chat with our daughter (who works in the tech industry) regarding data sets and the training of AI/CV. I think she’s reading a book about “Deep Learning” (way over my head), but this gets back to my original question about being able to at least peek under the hood of the CV to understand more about its sample selection, sample sizes, etc., for a given taxon.
When has anyone suggested that we should not do this?
People who complain about the uncritical use of the CV are people who ID taxa where the CV performs abysmally. We do not have a problem with the CV in principle; we have a problem with the fact that, at present, it results in many egregiously wrong IDs for these taxa and creates a lot of extra work for human IDers. There have been numerous discussions that include efforts to figure out why it is getting it so wrong and how the training could be changed to reduce these problems.
Here’s a teaser from the data crunching I’m doing: I’m examining IDs and CV outcomes for the two moth species in 32 counties where their ranges overlap in Texas (more details on the research methodology to be published later). In those counties, there are close to 1,000 RG observations of the two species, typically images with a sufficient view of the HW (hindwing) to see diagnostic features. The proportions are Two-banded (54%) and Capps’ (46%), and that varies somewhat geographically within the overlapping ranges. However, from 3 independent tests of the aforementioned 20 “hard” cases (without a view of the HW), the CV is suggesting Two-banded 80% of the time and Capps’ only 20%. So as I mentioned above, either the CV is over-confident, poorly trained, or it knows something I don’t…and all three of these may be true!

I’m now examining a larger sample of the hard cases from within the 32-county region.
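As a side note (not part of the original analysis), here is one quick way to gauge whether that 80/20 split could plausibly arise from a CV that was simply reflecting the 54/46 background rate. It is only a sketch: it treats a single pass of the 20 hard cases (16 Two-banded suggestions) as independent trials and ignores the 3 repeated tests.

```python
from scipy.stats import binomtest

# observed: the CV suggested Two-banded in ~80% of the 20 hard cases (16 of 20,
# treating one pass as 20 independent trials -- a simplifying assumption)
# null hypothesis: the CV picks Two-banded at the 54% background rate
result = binomtest(k=16, n=20, p=0.54, alternative="greater")
print(f"one-sided p-value: {result.pvalue:.3f}")
```

A small p-value would suggest the skew toward Two-banded is unlikely to be just the background rate showing through; a larger one would mean 20 cases are simply too few to tell.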
for what it’s worth… with plants, the CV is often able to correctly identify vegetative specimens with no reproductive material in taxa where that isn’t possible via any of the keys. Sometimes it’s possible via gestalt, but other times i can’t tell how, yet the CV is correct. It’s far from perfect, and people do use it when they shouldn’t, but sometimes it also gets it right when i wouldn’t expect it could. And it’s getting better all the time. of course it needs correctly identified RG observations to build from or it won’t continue to improve.
instead of looking at counties where ranges overlap, i just looked at counties that have both species recorded in the system.
in the United States, i see 18 counties where there are both P. cappsi and P. bifascialis observations. among counties that have both species, i get 370 observations of P. cappsi and 341 observations of P. bifascialis.
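for anyone who wants to reproduce those counts, here’s roughly how you could pull them from the iNaturalist API. the place IDs are the ones listed in the table below, but the two taxon IDs are placeholders (i haven’t included the real ones here), and you may need to add quality/verifiability filters to match exactly what i counted.

```python
import requests

API = "https://api.inaturalist.org/v1/observations"

# placeholder taxon IDs -- substitute the real iNat taxon IDs for each species
TAXA = {"P. cappsi": 111111, "P. bifascialis": 222222}

# a few county place IDs, taken from the table below
PLACES = {"Travis, TX": 431, "Williamson, TX": 2442, "Val Verde, TX": 405}

for county, place_id in PLACES.items():
    counts = {}
    for name, taxon_id in TAXA.items():
        # per_page=0 asks for just the total_results count, no records
        r = requests.get(API, params={"taxon_id": taxon_id,
                                      "place_id": place_id,
                                      "per_page": 0})
        r.raise_for_status()
        counts[name] = r.json()["total_results"]
    print(county, counts)
```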
when you break this down by county (see the table below), you see that many counties skew towards one species or the other. so if you weight the genus-level Petrophila observations in these counties by the species-level counts in each county, then across these counties, i would expect the CV to suggest roughly 42% P. cappsi and 58% P. bifascialis as the first species suggestion. (the sketch after the table makes that weighting arithmetic explicit.)
| State | County | Place ID | A = P. cappsi | B = % A of E | C = P. bifascialis | D = % C of E | E = A+C | F = at genus | G = % of total F | H = F weighted by B | I = F weighted by D | J = A+C+F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Oklahoma | Johnston | 3058 | 1 | 50.00 | 1 | 50.00 | 2 | 0 | 0.00 | 0.00 | 0.00 | 2 |
| Oklahoma | Oklahoma | 355 | 1 | 50.00 | 1 | 50.00 | 2 | 0 | 0.00 | 0.00 | 0.00 | 2 |
| Texas | Bastrop | 441 | 1 | 20.00 | 4 | 80.00 | 5 | 7 | 3.40 | 0.68 | 2.72 | 12 |
| Texas | Bell | 1878 | 4 | 36.36 | 7 | 63.64 | 11 | 2 | 0.97 | 0.35 | 0.62 | 13 |
| Texas | Blanco | 1767 | 6 | 66.67 | 3 | 33.33 | 9 | 2 | 0.97 | 0.65 | 0.32 | 11 |
| Texas | Bosque | 1707 | 2 | 11.11 | 16 | 88.89 | 18 | 12 | 5.83 | 0.65 | 5.18 | 30 |
| Texas | Dallas | 1281 | 2 | 3.08 | 63 | 96.92 | 65 | 17 | 8.25 | 0.25 | 8.00 | 82 |
| Texas | Ellis | 1290 | 3 | 10.71 | 25 | 89.29 | 28 | 3 | 1.46 | 0.16 | 1.30 | 31 |
| Texas | Hamilton | 272 | 3 | 50.00 | 3 | 50.00 | 6 | 2 | 0.97 | 0.49 | 0.49 | 8 |
| Texas | Hays | 326 | 29 | 69.05 | 13 | 30.95 | 42 | 7 | 3.40 | 2.35 | 1.05 | 49 |
| Texas | Kerr | 1710 | 4 | 80.00 | 1 | 20.00 | 5 | 5 | 2.43 | 1.94 | 0.49 | 10 |
| Texas | San Saba | 383 | 1 | 25.00 | 3 | 75.00 | 4 | 1 | 0.49 | 0.12 | 0.36 | 5 |
| Texas | Somervell | 1215 | 1 | 10.00 | 9 | 90.00 | 10 | 8 | 3.88 | 0.39 | 3.50 | 18 |
| Texas | Terrell | 895 | 6 | 75.00 | 2 | 25.00 | 8 | 1 | 0.49 | 0.36 | 0.12 | 9 |
| Texas | Travis | 431 | 23 | 20.54 | 89 | 79.46 | 112 | 62 | 30.10 | 6.18 | 23.92 | 174 |
| Texas | Uvalde | 3029 | 1 | 33.33 | 2 | 66.67 | 3 | 2 | 0.97 | 0.32 | 0.65 | 5 |
| Texas | Val Verde | 405 | 87 | 98.86 | 1 | 1.14 | 88 | 20 | 9.71 | 9.60 | 0.11 | 108 |
| Texas | Williamson | 2442 | 195 | 66.55 | 98 | 33.45 | 293 | 55 | 26.70 | 17.77 | 8.93 | 348 |
| Total | | | 370 | | 341 | | 711 | 206 | 100.00 | 42.26 | 57.74 | 917 |
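to make the weighting explicit, here’s a small sketch of the arithmetic using a few rows from the table (the full 18 rows work the same way): each county’s genus-level count (column F) is split between the two species in proportion to that county’s own species mix, and the splits are summed across counties.

```python
# per-county counts copied from a few rows of the table above:
# (P. cappsi, P. bifascialis, genus-level Petrophila)
counties = {
    "Travis":     (23, 89, 62),
    "Williamson": (195, 98, 55),
    "Val Verde":  (87, 1, 20),
    # ... plus the remaining counties from the table
}

weighted_cappsi = weighted_bifascialis = total_genus = 0.0
for cappsi, bifascialis, genus in counties.values():
    both = cappsi + bifascialis
    # split this county's genus-level observations by its own species mix
    weighted_cappsi += genus * cappsi / both
    weighted_bifascialis += genus * bifascialis / both
    total_genus += genus

print(f"expected P. cappsi share:      {100 * weighted_cappsi / total_genus:.2f}%")
print(f"expected P. bifascialis share: {100 * weighted_bifascialis / total_genus:.2f}%")
```

run over all 18 rows, this reproduces the 42.26% / 57.74% split shown in the Total row (columns H and I).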
however…
most counties have more P. bifascialis observations. so i would expect that if you looked at CV suggestions for genus-level observations in P. bifascialis-heavy counties, you would get more CV suggestions for P. bifascialis, and vice versa.
this seems to be generally true, except notably in Williamson County, which also has the largest number of Petrophila observations. it looks like there’s one prolific Petrophila observer there whose observations skew towards P. cappsi.
so if i exclude that person’s Williamson County Petrophila observations, then the analysis changes a bit. now in these counties, i have 189 observations of P. cappsi vs 285 of P. bifascialis, and for the genus-level observations, i would expect the CV to suggest roughly 33% P. cappsi and 67% P. bifascialis as the first species suggestion.