That’s because it may be using arbitrary elements of the background rather than features of the organism itself. If, say, all the photos were of labelled museum specimens, the CV could literally make use of coffee stains on the labels if that gives the statistically best results. Its “vision” really is that blind. It has no more comprehension of the subject of a photo than an OCR program has of the plot of a novel it’s scanning. The credit for the IDs is entirely due to the human identifiers who provided the dataset.
I disagree to an extent.
First: It certainly can use background features, but background features can be perfectly legitimate ID features. For example, it can learn that plant species X is only found in sand, species Y is only found in sparsely vegetated soil, and species Z is found in loose gravel, and that would be a perfectly legitimate thing for a human IDer to use as well. Just two days ago I found a species where a field guide said that the most reliable way to ID to species was to ID the plant to genus, ID nearby associated species, and see which list of known associated species was a better match.
Second: The CV can learn things about the habit, flower shapes, leaf orientations, etc. that are difficult to describe precisely in words with available vocabulary, or do not survive pressing in museum specimens, and consequently do not get described well in keys. Expert IDers often learn these kinds of features through experience and actually use them all the time. I have absolutely found pairs of taxa where the best ID feature to distinguish them is not in the key I learned the taxa from. I am also certainly not using ‘key’ features when I ID species flying by at 55 mph through the passenger window of a car. Because the CV isn’t learning from a key, it can learn the features that real experts actually use, not just the features that are easy to describe in words.
Third: In some cases it can learn real, statistically accurate heuristics that would be very tedious to compute by hand. Hypothetically, it could learn that fish species X has on average 300 ± 50 scales, while fish species Y has on average 500 ± 50. Human IDers see patterns like this too, but might just describe it as ‘species X is usually not that big’ or something. Because the actual pattern is quantitative and not qualitative, this is the kind of feature you could reasonably expect a computer to be better at learning than a human.
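To make that concrete, here is a minimal sketch (in Python) of the kind of quantitative rule I mean. The species labels, means, and standard deviations are just the hypothetical numbers from above, and the equal priors are my own simplifying assumption:

```python
# A toy Bayesian discriminator for the hypothetical scale-count example above.
# The species labels and the 300±50 / 500±50 distributions are illustrative,
# not real data; equal priors are assumed for simplicity.
import math

def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Probability density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def scale_count_posterior(count: float) -> dict:
    """Posterior probability of each species given an observed scale count."""
    likelihoods = {
        "species X": gaussian_pdf(count, mean=300, sd=50),
        "species Y": gaussian_pdf(count, mean=500, sd=50),
    }
    total = sum(likelihoods.values())
    return {species: lk / total for species, lk in likelihoods.items()}

print(scale_count_posterior(350))
# -> roughly {'species X': 0.98, 'species Y': 0.02}
```

Note that the output is a weight of evidence, not a verdict; in practice a signal like this would be combined with many other features, which is where the next point comes in.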
Sure, features like minute statistical differences aren’t good enough for high confidence on their own, but most of the time in hard taxa no single feature will be good enough for a high-confidence ID on its own. This is where diversity in the training set becomes key. A more diverse training set forces the CV to start learning the difficult features, which is what you want. 10 pictures each in 2 different taxa will never be enough to force the CV to learn difficult rules. With 1000 pictures each, from 10,000,000 different observers, in 100,000 taxa, the CV is for sure going to have to learn some difficult rules to get to 80-90% accuracy, not just dumb simple rules.
Of course there is no dispute that the credit goes to the human IDers who provided the dataset; the CV is just codifying, and perhaps in some cases expanding on, their knowledge.
Please do not forget to credit the observers who provide their photographs (without which there would be nothing to identify) and the curators who manage the taxonomy (without which most identifications would be to tags like “bug” or “bush” or “danger noodle” and thus impossible to export to and merge with GBIF and other archives). Our vision dataset is the product of our whole community working together.
Oh yes 1000%! Can’t have good IDs without good pictures! Can’t get those super narrow range endemics in the dataset unless someone goes there! And, part of the value of the dataset is its sheer size. Out of curiosity, have you ever done any estimate of how many taxa there are for which iNat has more observations than all herbaria/museum specimens combined?
Claims like this may reveal far more about human psychology than the real capabilities of systems like the CV. Humans have a hair-trigger when it comes to ascribing agency to inanimate objects. The most cursory survey of religious rituals and folklore traditions makes this abundantly clear. A modern manifestation of this is what might be called Cute Robot Mythology™ - to which the CV is clearly not immune.
A simple analogy can illustrate this. Imagine the familiar case of someone losing an earring, and then days later hearing a characteristic rattle inside the hose of their vacuum cleaner whilst they tidy their bedroom. Almost instantly, a human will make all the right inferences, and draw the most likely conclusions about what just happened. Now, would it be correct to claim that the vacuum cleaner (VC) has similar capabilities and somehow “knows” that it’s just found the earring? Does it perhaps possess some inscrutable robotic intuitions about earrings that are unknowable to humans? The VC seems very good at finding certain items that the dumb humans keep losing, so surely there must be something in it? At the very least, it seems very natural to thank the vacuum cleaner in some way: perhaps giving it a pat like a faithful old retriever.
The CV is playing exactly the same role here as the VC. A vacuum cleaner is a crude winnowing device. Humans have deliberately designed it to suck up a limited subset of things that are typically found in a specific range of environments. It has no feature detection capabilities whatsoever; nor does it have any capacity to develop such capabilities - and it doesn’t need them, because that isn’t what it’s designed to do; they simply have no relevance to its human-assigned role. All we want it to do is reduce a large space of possibilities to a much more manageable one that humans can deal with. Thus, once the earring is in the bag, it becomes much easier for us to find.
In another post, it was suggested we should broaden the net when giving credit for CV identifications. But they only really scratched the surface. The evolution of eyes took hundreds of millions of years; the inference engines in our brains took several million years; and the multidimensional storehouse of human culture took tens of thousands of years. All of that biological and cultural inheritance is brought to bear whenever humans contribute to an identification. The notion that the capabilities of computer programs are in any way comparable is pure mythology (and/or marketing hype). If programs like the CV (or ChatGPT et al) occasionally appear to offer convincing simulations, that’s only because they operate within the very limited domains that are allotted to them by humans. Beyond that, it’s all just wish-fulfillment fantasy.
For almost two decades people used to say that chess engines were amazing at beating humans purely through stalwart defense, but lacked human intuition on offense. Some people still say it, but it hasn’t been true since 2017. Now Stockfish is a terrifying chess god that massively outperforms the best grandmasters in all but a tiny and ever-shrinking set of highly artificial positions deliberately concocted to confuse it.
In the same way, you can of course still identify ways in which systems like CV and ChatGPT underperform. And they aren’t unified into a full general intelligence. But to say that the capabilities aren’t comparable in any way and never will be is not a realistic assessment of the present situation as projected into the near future. In a way, it is underestimating the humans working on improving these systems.
Here are links to guides on photographing fungi for ID:
https://fundis.org/get-started/photograph
https://www.inaturalist.org/posts/3531-documenting-mushrooms
https://plantpath.ifas.ufl.edu/misc/media/fungi-submission.pdf
Enjoy!