I am wondering if computer vision (CV) can be used for research purposes.
I am (very naively) thinking of two practical uses, but I’m sure there would be plenty of others:

* using CV to make phylogenetic trees: could we use CV to compare taxa and use those data for phylogenetic analysis?
* using CV to discover morphological characteristics among cryptic species: I’m thinking of species whose identification relies on, e.g., genital examination. I guess that means: can we find out which features the algorithm uses to identify species?
I know CV is far from perfect, but I’m sure it will keep on improving.
So what do you guys think? Is it even doable at all?
" * using CV to make phylogenetic trees, could we use the CV to compare taxa and use this data for phylogenetic analysis?"
No, because the CV is not working with phylogenetic data. Any correspondence with phylogenetic relationships holds only until they are manifested in phenotypic traits. Convergent evolution easily tricks the CV, which is easy to see if you start ID-ing plants with reduced organs.
“* using CV to discover morphological characteristics among cryptic species: I’m thinking of species whose identification relies on, e.g., genital examination. I guess that means: can we find out which features the algorithm uses to identify species?”
In my opinion, in some cases, it would be able to add significant information; in other cases, not. But I like the idea.
However, what would be excellent is to use it for easy measurement of functional traits. For example, plant traits describing leaf shape, and perhaps size, flower structure and colour, growth form, or branching could be estimated with an AI algorithm. That would be a great advance, because traits are very widely used in ecology nowadays and the availability of measurements is a bottleneck for research.
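To make the trait-measurement idea concrete, here is a minimal Python/NumPy sketch of pulling simple shape traits from a leaf image. It assumes the leaf has already been segmented into a binary mask (a hard problem in itself); the function name and the synthetic square "leaf" are my own invention, purely for illustration.

```python
import numpy as np

def leaf_shape_traits(mask: np.ndarray) -> dict:
    """Estimate simple shape traits from a binary leaf mask (1 = leaf pixel).

    Assumes segmentation is already done; a real pipeline would also need
    scale calibration (pixels -> mm) from a ruler or herbarium-sheet barcode.
    """
    area = int(mask.sum())                 # leaf area in pixels
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    height = int(rows.sum())               # occupied rows (bounding-box height)
    width = int(cols.sum())                # occupied cols (bounding-box width)
    # Boundary pixels: leaf pixels with at least one background 4-neighbour
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:]) & mask
    perimeter = int(area - interior.sum())
    # Compactness: high for round leaves, lower for elongated/lobed ones
    # (pixel-count perimeter is only a rough estimate of the true perimeter)
    circularity = 4 * np.pi * area / perimeter**2 if perimeter else 0.0
    return {"area": area, "aspect_ratio": height / width,
            "perimeter": perimeter, "circularity": circularity}

# A tiny synthetic "leaf": a filled 6 x 4 rectangle
mask = np.zeros((10, 10), dtype=int)
mask[2:8, 3:7] = 1
traits = leaf_shape_traits(mask)
print(traits)
```

In practice you would use a library such as scikit-image (`measure.regionprops`) rather than hand-rolling this, but the point stands: once a CV model can segment the organ, the trait measurements themselves are cheap.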
I don’t see much application for CV in making phylogenies. However, in combination with a known phylogeny, there is certainly potential to use computer vision and AI to help generate informative characters for distinguishing challenging taxa.
iNat’s CV and opportunistically collected photo dataset might not be suited for this, but folks are definitely trying to put this approach into action. I don’t remember who gave the talk but I recently saw a presentation at a conference about using computer vision to extract leaf morphometric (shape and size) data from digitized and imaged herbarium specimens.
As for phylogeny reconstruction, it seems to me that neither machine learning nor any other computational method can solve the main issue involved. The point is that we can only observe the existing set of features of taxa; reconstructing the order in which they emerged can only be done speculatively. We cannot check this order experimentally (by observing evolution). The applicability of any statistical method (e.g., Bayesian inference) to the construction of phylogenetic trees likewise remains a hypothesis. Therefore, in my opinion, at the current level of understanding of evolution, any reconstruction of phylogeny is only one of several possible assumptions, and there is (at least for now) no way to verify which of them is true.
Theoretically, computer vision can (see below) increase the number of features available to us, but this does not bring us any closer to solving the problem described above.
But as for the use of computer vision in taxonomy, it is certainly an important and promising trend. Even now, such methods can be used for at least two purposes:
Yes, exactly: for recognition of closely related species and for extraction of the features (parts of images) that can be used to distinguish them, at least those carrying the highest “weight” for the trained network. Of course, the results of such an algorithm require additional analysis. But the results are impressive: the accuracy of machine identification from appearance alone, without features requiring dissection, can be higher than that of a human expert. See, for example, https://github.com/AlexKnyshov/TML
There are references to the articles in the repository descriptions.
In my opinion, these approaches may soon significantly change the methods of taxonomy for insects (and not only insects). That said, I am not an expert in computer vision or machine learning, and I have no hands-on experience with these tools yet (although I should obviously pay more attention to them). Perhaps more experienced colleagues can correct or add something.
I don’t think there is an answer to your second point yet, but my take is that it’s very unlikely that decision-making logic can be pulled out of the computer vision process. The thing to get your head around is that there is no “understanding” when CV looks at an image, the way there is when we look at something; it only sees a grid of coloured pixels. The decision-making logic will be fuzzy and, as such, difficult to explain. It may not even be just the subject of the photo that drives the decision.
Imagine you have 100 images of a bird, all taken on grass, and 100 images of a similar (but easily distinguishable) bird, all taken in trees. You feed in a new image of the first bird, but it happens to be in a tree. What will likely happen is that it gets misidentified as the second bird, because the background (grass vs. tree) is the most significant difference between the two sets of photos, not the features of the birds themselves. This shows the CV algorithm isn’t seeing the bird as a subject, but just as part of the image, and not an especially significant part at that.
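The grass-vs-tree shortcut is easy to reproduce with a toy model. In this Python sketch (the nearest-centroid "classifier" and all the numbers are mine, purely for illustration), each photo is reduced to two numbers, a weak bird cue and a strong background cue, and the background wins:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_photos(n, bird_cue, background_cue):
    """Toy 'photos': 2-D vectors [bird cue, background cue] plus small noise."""
    base = np.array([bird_cue, background_cue], dtype=float)
    return base + rng.normal(scale=0.1, size=(n, 2))

# Training data: species A always on grass (bg cue = 0),
# species B always in trees (bg cue = 10). The bird cues barely differ.
species_a = make_photos(100, bird_cue=0.4, background_cue=0.0)
species_b = make_photos(100, bird_cue=0.6, background_cue=10.0)

centroid_a = species_a.mean(axis=0)
centroid_b = species_b.mean(axis=0)

def classify(photo):
    """Stand-in for a CV model: nearest centroid in feature space."""
    dist_a = np.linalg.norm(photo - centroid_a)
    dist_b = np.linalg.norm(photo - centroid_b)
    return "A" if dist_a < dist_b else "B"

# A species-A bird photographed in a tree: bird cue says A, background says B.
odd_photo = np.array([0.4, 10.0])
print(classify(odd_photo))  # the background cue dominates the distance -> "B"
```

Real convolutional networks are far more complex, but the failure mode is the same: whatever separates the training sets most strongly gets used, whether or not it is biologically meaningful.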
Of course, the other thing this example shows is that it is sometimes possible to guess at what decision-making criteria are being used (in this particular example, I would surmise that the logic is primarily based on the grass vs. tree background), but this is by no means possible in every situation. For instance, how do I know whether two particular fish species are being distinguished by colour/pattern, by body shape, or by something difficult even for humans to assess, like scale count (or, probably, a blend of all of these)? The best you could probably do is return a heat map identifying the areas of the photo that carry the most weight in the decision.
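One standard way to build such a heat map is occlusion sensitivity: grey out one patch of the image at a time, re-run the model, and record how much its score drops. Here is a minimal Python sketch; the toy scorer stands in for a real trained model and is my own invention.

```python
import numpy as np

def occlusion_heatmap(image, score_fn, patch=4):
    """Occlusion sensitivity: grey out each patch and record the score drop.
    Large drops mark regions the model relies on for its decision."""
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i+patch, j:j+patch] = image.mean()  # "grey" patch
            heat[i // patch, j // patch] = base - score_fn(occluded)
    return heat

# Stand-in for a trained model: it only "looks at" the top-left corner.
def toy_score(img):
    return img[:4, :4].mean()

img = np.zeros((8, 8))
img[:4, :4] = 1.0  # bright region where the "feature" lives
heat = occlusion_heatmap(img, toy_score)
print(heat)  # only the top-left patch produces a score drop
```

Gradient-based variants (e.g., Grad-CAM) do something similar more efficiently, but none of them tell you *why* a region matters, only *that* it matters, which is exactly the transparency limit described above.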
This is why “algorithm transparency” is going to be such a huge problem for society going forward (e.g., with things like YouTube recommendations). These systems are designed to see things in ways that we don’t, and we haven’t yet invented a good way for a system to describe what it’s doing. The underlying problem is that for the system to tell us what it’s doing, we would need to take away a portion of its “independence” and dumb it down (kind of like the old analogy, or paradox, or myth, that the human brain can never be complex enough to understand itself).
*Edit - wow, I just saw how long that reply was… Sorry. :)