AI impact on: Taxonomy?

Even though I feel vastly underqualified to bring this up here, I have recently begun to think a little wider and wonder how AI power will affect the world of taxonomy.

At the base of taxonomy is observational data. That includes genomic analysis, sure, but the ‘dream’ of a complete genome of the whole Tree seems too remote to contemplate, especially given all the disputes and twists I have read about in the way evolution ‘mostly works’ and, often, ‘does something different’ in ways that truly challenge the taxonomy field.

I suppose, like anyone, I would presume that beyond visual recognition, there will be other patterns that are just too tediously impractical or impossibly expensive to find through conventional study. Will the brute comparative and pattern recognition power of AI analysis solve some long-lasting puzzles there? Are there any low-hanging-fruit problems that might be truly solved with this tech?

And I’m sure that AI will provide many, many other promises – and threats – to the future of this field. Some we may have no way of predicting yet!

Have you thought about this or could you share some reading or such that might help make us a little more aware of the issues?

Something more Homer-Simpsonish in the level of science would be truly appreciated, of course.


I could see AI being used to look at available genomics data across the whole tree of life and making decisions about where we draw the lines between taxonomic categories to somehow “standardize” Linnaean categories such as families, genera, species, etc. Not that I’m advocating for that, but it’s the kind of comprehensive data crunching that likely will be doable.


If that happened, it would result in some taxa being lumped and others being split. Hopefully, it could achieve some balance between the opposing camps. It has been said that “there is no agreed-upon number of base pairs to define a species-level difference” – well, what if there was?

Some species are quite ecologically and morphologically distinct despite being very genetically similar (Blue-winged and Golden-winged Warblers are one example I can think of), and other species are quite genetically distant but seem identical in every way to us. I don’t think separating species by raw genetic difference is something that any field biologist would find appealing haha.


Wouldn’t base pairs differ markedly in the degree to which they determine phenotypes, and in the manner in which they interact with each other in doing so? Accordingly, a set of more complex formulae or algorithms than that simple number of base pairs would seem appropriate, which AI perhaps could help to devise.

I don’t really know if that would result in a usable taxonomy. The differences are too large between, for instance, bacteria and any “higher” eukaryotic life form.

I recall a professor of mine saying that there can be a larger genetic difference between some bacteria of the same species than between humans and bananas. (I don’t know how accurate this is, but I can imagine that there is some truth to that even if it is exaggerated)


So, even if the bacteria, banana, and human story involves some exaggeration, it might contain some degree of truth, as noted. Given a large amount of taxonomic and genomic data across a wide spectrum of forms of life, a model that assigns different weights to counts of base pair differences among different broad taxa could be devised and evaluated. For example, assign a considerably smaller weight to base pair difference counts among bacteria than among Mammalia and among Musaceae. AI might be able to help with this task, which would entail using huge amounts of data. It might not work out, but it would be an interesting experiment.
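To make the weighting idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the group weights, the 0.05 threshold, the function names, and the toy sequences are invented, and none of it reflects an agreed taxonomic standard. It also assumes the sequences are already aligned to the same length, which is a large simplification of real genomic comparison.

```python
# Hypothetical per-group weights: a smaller weight means each raw
# base-pair difference counts for less. Values are invented.
GROUP_WEIGHTS = {
    "Bacteria": 0.1,
    "Mammalia": 1.0,
    "Musaceae": 1.0,
}


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def weighted_distance(seq_a: str, seq_b: str, group: str) -> float:
    """Fraction of differing positions, scaled by the group's weight."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    raw_fraction = count_differences(seq_a, seq_b) / len(seq_a)
    return GROUP_WEIGHTS[group] * raw_fraction


def same_species(seq_a: str, seq_b: str, group: str,
                 threshold: float = 0.05) -> bool:
    """Lump two samples together if their weighted distance is below
    the (made-up) threshold; otherwise split them."""
    return weighted_distance(seq_a, seq_b, group) < threshold


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTTCGTAA"  # differs from `a` at two of ten positions
    for group in ("Bacteria", "Mammalia"):
        print(group, weighted_distance(a, b, group), same_species(a, b, group))
```

With these invented numbers, the same two differences would lump the pair under the bacterial weight and split it under the mammalian weight. That is the intended effect of the weighting, and it also shows how much the outcome would depend on who picks the weights.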

At its utmost extreme, it just becomes impossible to identify the species without genetics (even if things are not heavily split), and we just end up having a parallel taxonomy for use by anyone who doesn’t have constant access to a genetic sequencer. Cryptic species exist that are genetically distinct, but in terms of ecosystem management and nature study by anyone other than geneticists, it’s just not even accessible to us unless we get to the point of Star Trek style ‘trichorders’ that scan genes of any organism we get near.


It would be an interesting experiment, definitely, and it would probably yield some useful insights. Though I don’t think it would solve the lumpers vs. splitters debate.

Either we assign different weights to different phyla, as you suggested, which would probably lead to the same effect we see now (we can of course keep the weight consistent within each kingdom or something, but we could in theory just do that now without AI as well), or we don’t, which would probably lead to nonsensical taxonomy in some branches.

I agree with this, but I also just want to say that I’d fully support the development and widespread use of Star Trek style trichorders. :D


If trichorders become widely available amidst a post-scarcity Federation style economy, I may be willing to accept more taxonomic splitting. But then there’s the matter of how life appears abundant in that universe, and how do you even approach distinct genesis or broad panspermia in the Linnaean system? What about some organism on the Vulcan planet that looks exactly like a cactus but evolved from a separate genesis? Though I think alien-led panspermia is part of Star Trek canon. I guess I’m getting way off topic in nerd land here. I do have my own ideas about how evolutionary convergence is probably a stronger force than people realize, and if we ever find life on another Earth-like planet it will be remarkably similar to that on Earth. I guess the real question is whether or not there is an equivalent of Taraxacum microspecies. Good thing we’ll have trichorders by then.


Trichologists may eventually have trichorders to diagnose your scalp problems. I’ll wait for the tricorders to be invented to do everything else.

Sorry, I was really nerding out there. ;-)


Thanks for noting that as well! I hadn’t quite pulled the trigger, but I’ll bite on panspermia, which is (sort of) canon. The ST:TNG episode “The Chase” addresses why there are so many humanoid species that are similar in the galaxy (besides the fact that it is much, much easier for a makeup department to deal with…)

More broadly though, there probably is a place for AI when it comes to taxonomy. AI’s strengths include pattern recognition and crunching vast amounts of data so quickly that it would be impossible for humans. We’re going to need this, given the exponential increase in the amount of genetic data out there (even without tricorders). I suppose the promise of AI is that it can create consistent ways to classify/identify units given large quantities of data (if the training data/process isn’t biased). But even with an objectively “good” (useful) AI output, as with any finding/conclusion, I expect that there will be a lot of people that object or don’t want to use it.


Problem is, what happens when there is a lumper AI and a splitter AI, analyzing the same set of data but with different philosophies about classification? Why should humans have all the fun disagreeing?


That has been said because quantitative differences in shared base pairs have very little correlation with whether the organisms in question belong to reproductively independent lineages, which is the biological reality that most systematists and taxonomists are hypothesizing when they name things at species rank.


@jnstuart Oops. Haha
Don’t you want your scalp examined Star Trek style? :O

@charlie I think if there has been panspermia, it’s quite easy (logistically) to adapt into current taxonomy as we’d basically just have to add some amount of things below LUCA, I think. Then we’d have essentially separate trees of life for each planet, all connected at the base. (Though there would have to be extensive research done on the relatedness between life of different planets, of course).

If biogenesis happened separately (which I think is more likely), we’d probably just have separate trees with a different LUCA for each. We’d have a forest of life and not a tree of life. :D
Unless taxonomists decide to go really crazy and add non-living chemical processes to the base of the trees somehow, which I doubt would ever happen (as it would probably not make a lot of sense, if it even is possible at all (plus, it could become a lot more difficult if we ever find non-carbon-based life)).


I read this ‘layman’s’ description of the taxonomy conundrum today, and it might interest some others if they want a short (admittedly ‘popular’) review:


I’m a librarian. I remember card catalogues.

But library catalogues have been virtual / digital … for decades! So much more efficient than the card is lost / misfiled so effectively lost / tiny typo so not where you expect it … and the taxonomy issue - hasn’t been catalogued (described) yet.

What taxonomy does is we try to build the card catalog of that library

And our toktokkie - Mariazofia named for his two daughters. While giving the old name a new meaning.

Global North are talking about biodiversity in the Global South

Digital may be more efficient now, but I remember my frustration when the transition first happened – back when processors were so slow that I could have flipped through dozens of cards by hand in the time it took for a page of search returns to come up.


I worked in Zurich when they combined two huge local universities in a digital catalogue. Someone in PR, obvs not the IT department, decided to go live the day the students flooded back for a new year.
so slow
my colleague read the newspaper while we waited. Gave up in despair. And that is when tiny typos hit hard!!
