Computer vision performance summary data

Well, that would depend on the broader methodology.

But I'm confused.
Here you say “also”… as if to say you agree your stats are not representative…

Then you seem to agree here too that your dataset is not actually representative.

But this seems to muddy the waters. How can you continue to claim your dataset is “representative” if it's only representative of a non-representative portion of the data?
The dataset you use is random and representative within the context of RG records alone, sure.
But it is non-representative in the context you actually frame it in. Titling the post broadly as “Computer vision performance summary data” and opening with the statement that you are using “a truly randomized/representative set of observations” to talk “about the relative accuracy of the CV” both lead the reader to believe your stats are representative of the average iNaturalist observation. Following that with a less explicit caveat later on, one which actually nullifies the opening framing, seems pretty misleading to me.

Across a representative section of research grade records, it might do a pretty admirable job. But RG records alone are clearly not representative of a cross-section of the records being submitted on iNaturalist. Use of the term “representative section” is, again, misleading here, I think.


Broadly speaking, I don't disagree that the CV does an admirable job. I certainly don't believe the CV “sucks” on European Diptera. It's a powerful tool for any user new to a taxon. Crucially, I think the CV isn't even the issue here - it's the UI that needs addressing, as others here are stating (and as I have said on other threads). I also don't believe you are incorrect per se in assuming the majority of observations which use CV are correct. But I don't think it matters much either way, for the reasons @matthew_connors gave.

I do think rigour around stats is really important in the context of the CV, though, given the way previous stats have been taken out of context. These sorts of stats and their connected statements are too often regurgitated in other threads, masquerading as evidenced rebuttals of valid concerns about the existing system despite lacking rigour/applicability in reality.

Then, you need to include Needs ID obs in your stats for them to be meaningful.
I don't necessarily have a solution for how to do this. I'm just saying that, at present, you've actively selected the data where the humans agree with the CV and ignored the cases where they don't. That is the definition of cherry-picking, if this is the point you are trying to make. The stats/conclusions you continue to infer from it are misleading without further development.
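To make the selection-bias point concrete, here is a toy simulation. The rates in it are entirely made up for illustration (an assumed 70% true CV accuracy, and assumed promotion-to-RG rates that favour observations where humans agree with the CV); it is not real iNaturalist data. It just shows arithmetically why measuring accuracy only on RG records inflates the estimate:

```python
import random

random.seed(0)

# Hypothetical scenario: 10,000 observations, CV correct 70% of the time
# overall (assumed rate, purely illustrative).
N = 10_000
cv_correct = [random.random() < 0.70 for _ in range(N)]

# Assume correct suggestions reach Research Grade far more often than
# incorrect ones, because identifiers agree with the CV there
# (90% vs 20% promotion rates - both made-up numbers).
reaches_rg = [random.random() < (0.90 if c else 0.20) for c in cv_correct]

overall_accuracy = sum(cv_correct) / N
rg_only = [c for c, rg in zip(cv_correct, reaches_rg) if rg]
rg_accuracy = sum(rg_only) / len(rg_only)

print(f"true accuracy across all observations: {overall_accuracy:.2f}")
print(f"accuracy measured on RG observations only: {rg_accuracy:.2f}")
```

Under these assumptions the RG-only figure comes out around 0.91 against a true rate of about 0.70, i.e. conditioning on agreement bakes the conclusion into the sample.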

Though if we are to continue discussing, I think this should probably take place on the post itself.
