In response to the ongoing discussion about the accuracy of the CV, I decided to pull together a truly randomized, representative set of observations and review how the computer vision performs and is used.
Last Updated - 2021-08-29 (n=150 records)
Notes on methodology
- records are randomly selected: I asked Google Sheets to generate a random number, then looked up the observation with that ID (a programmatic sketch of the same process follows this list)
- review is restricted to research grade records. I understand that performance is likely lower on needs ID records, but I'm not an expert on every taxon in the world, and I can't evaluate whether the user IDs and/or the CV result for those records are correct
- CV suggestion is being run on the website version of the platform with the "default to locally observed taxa" setting enabled. I do not have access to an iOS device to see how the results might vary where that default is not available
- all CV results are what the current training model generates; I have no way of knowing what the model would have presented at the time the observation was created
- records which have a community ID at the subspecies (or equivalent) level are considered correctly matched if the CV suggests the parent species, since subspecies are not in the training model
- I'm also going to try to figure out whether I can add HTML tables to the text and convert the views below to tables for readability
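For anyone who wants to reproduce the selection outside of Google Sheets, here is a minimal Python sketch of the same process against the public iNaturalist API. The maximum ID is a placeholder, the helper names are my own, and the match rule just mirrors the subspecies note above; treat it as an illustration, not the exact code behind this review.

```python
import random
import requests

API = "https://api.inaturalist.org/v1/observations"
MAX_OBS_ID = 90_000_000  # placeholder: rough upper bound on observation IDs at sample time

def sample_research_grade(n=150, seed=42):
    """Draw random observation IDs and keep only research grade records."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        obs_id = rng.randint(1, MAX_OBS_ID)
        resp = requests.get(f"{API}/{obs_id}")
        if resp.status_code != 200:
            continue  # the ID may never have been assigned, or was deleted
        results = resp.json().get("results", [])
        if results and results[0].get("quality_grade") == "research":
            kept.append(results[0])
    return kept

def cv_matches(community_taxon, suggested_taxon):
    """Scoring rule from the notes above: a subspecies-level community ID
    counts as matched when the CV suggests the parent species, since
    subspecies are not in the training model."""
    if suggested_taxon["id"] == community_taxon["id"]:
        return True
    return (
        community_taxon.get("rank") in ("subspecies", "variety", "form")
        and suggested_taxon.get("rank") == "species"
        and suggested_taxon["id"] in community_taxon.get("ancestor_ids", [])
    )
```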
Summary of data
Geographic distribution (note: where multiple nations share a line, the percentage applies to each nation listed, not to their sum)
- 47% US
- 10% CA
- 7% RU
- 6% MX
- 3% ES, IT
- 2% ZA, NZ, FR
- 1% TW, SG, PT, GB, DE, CZ, BR, AU, AT, ZM, TH, SV, PRIVATE, LX, JP, CR, CO, AR
Distribution across iconic taxa
Iconic taxa | Percent of records |
---|---|
Arachnids | 3 |
Birds | 26 |
Crustaceans | 1 |
Fish | 3 |
Fungi | 6 |
Herps | 9 |
Insects | 17 |
Mammals | 0 |
Molluscs | 1 |
Plants | 34 |
Distribution of entry source and original ID source
Entry source | Percent of all records | Percent of initial IDs via CV (within source) | Percent of initial IDs by human observer (within source) |
---|---|---|---|
Android | 21 | 38 | 63 |
iOS | 30 | 64 | 36 |
Seek | 1 | 100 | 0 |
Website | 48 | 58 | 42 |
Percentage of records whose taxon is not in the CV training model - 4%
Percentage of records where the community ID taxon is in the training model but is not suggested at all by the CV - 2%
Percentage of records where the CV 1st suggestion matches the community ID
Iconic taxa | Percent of records |
---|---|
Arachnids | 100 |
Birds | 85 |
Fish | 39 |
Fungi | 79 |
Herps | 89 |
Insects | 96 |
Mammals | 0 |
Molluscs | 100 |
Plants | 82 |
The primary conclusion here is that when the taxon is in the training set, the CV generally does a good job not only of recognizing the taxon but also of making it the first suggestion
Percentage of time that, when a human does the initial ID, the computer vision agrees with their ID as its first suggestion
Iconic taxa | 1st CV suggestion matches Community ID | 1st CV suggestion does not match Community ID |
---|---|---|---|
Arachnids | 100 | 0 | |
Birds | 89 | 11 | |
Crustaceans | 100 | 0 | |
Fish | 78 | 22 | |
Fungi | 84 | 16 | |
Herps | 79 | 21 | |
Insects | 89 | 11 | |
Mammals | | |
Molluscs | 100 | 0 | |
Plants | 84 | 16 | |
When the CV's 1st suggestion does not match the community ID, the community taxon is included further down the suggestion list 63% of the time, and is not included at all as an option 37% of the time (the latter includes records where the taxon is not in the training model).
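For reference, a tally along these lines over the raw data produces that 63/37 split; the file and column names below are placeholders, not the actual headers in the sheet.

```python
import pandas as pd

# Placeholder file/column names; the real sheet is linked below.
df = pd.read_csv("cv_review.csv")

# Restrict to records where the CV's 1st suggestion missed the community ID.
missed_first = df[~df["cv_first_matches"]]

# Share of those where the community taxon still appears lower in the list.
further_down = missed_first["cv_list_contains_taxon"].mean() * 100
print(f"further down the list: {further_down:.0f}%, not listed at all: {100 - further_down:.0f}%")
```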
The raw data can be found here for those interested: CV Summary review data