Biodiversity, iNaturalist, and the Arthropod Tree of Life

Every now and then, iNaturalist releases summary statistics on its current progress in observing the life forms of this world. In past threads I have documented iNaturalist's coverage of animals in the phylum Arthropoda. Up to this point, I have used orders as my primary analytical unit. But orders are very broad groups that may obscure a lot of internal variation, so I decided to use families. In this thread, we will see how much of arthropod diversity iNaturalist has covered, in more detail than has ever been done. Unlike last time, I am including the arthropods' closest relatives, the tardigrades and velvet worms. Together these groups form the Panarthropoda, but I will refer to them all as arthropods from here on for the sake of simplicity. But before we get to this, I need to establish how I got my numbers.

Sources of Numbers

The number of species and observations of each family was obtained from iNaturalist, of course. The harder part was getting the total number of known species in each family. Many taxa have taxon-specific taxonomic databases, so I could simply take the number of species from there. These include WoRMS and its subregisters, the World Arachnid Catalogue, the Species File group, Systema Dipterorum, Odonata Central, Chilobase, and several others. But some groups, such as mites, do not.

A few mite families had dedicated databases or species counts given in recent papers and checklists, but for most of them I cross-referenced Catalogue of Life (CoL) and Zhang's 2011 compendium of species richness. Each has benefits and drawbacks. Catalogue of Life is continuously updated but has patchy and inconsistent coverage: it is very up to date and accurate for some groups while being very lacking and poorly maintained for others. Zhang's 2011 compendium was extremely comprehensive for its time but is over a decade out of date. So for almost every mite family (as well as many families of Hymenoptera and beetles), I checked the number of species in each source. If the number of species was higher in Catalogue of Life, I assumed it was up to date and took that as the number of species. If it was lower, I assumed CoL coverage of that family was very poor and fell back on the number given in Zhang. In a few instances, I used the Interim Register of Marine and Nonmarine Genera for groups where I felt both Zhang and CoL were deficient (primarily for families not present or recognized in either source). I also used Wikispecies for a few minor insect families that were absent from Zhang and absent or insufficient in CoL. It is a rather chimeric way of doing this, but I think it works. The total number of extant panarthropod species assembled from this variety of sources is 1,489,343.
For comparison, the total number of extant species for the group from CoL is 1,322,945 and from Zhang is 1,243,379.
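The CoL-versus-Zhang rule described above boils down to keeping the larger of the two counts. A minimal Python sketch of that rule follows; it is an illustration, not the spreadsheet workflow actually used. It assumes each count is an integer and that a family missing from a source is passed as `None`; the function name and the example counts are hypothetical.

```python
from typing import Optional


def reconcile_richness(col: Optional[int], zhang: Optional[int]) -> Optional[int]:
    """Choose one species count for a family from CoL and Zhang (2011).

    A higher CoL count is trusted as up to date; a lower one is taken as a
    sign of poor CoL coverage, so Zhang's figure is used instead. Either way
    the larger count wins. None means the family is absent from that source.
    """
    if col is None:
        return zhang  # may still be None: fall back on IRMNG, Wikispecies, etc.
    if zhang is None:
        return col
    return max(col, zhang)


# Hypothetical counts for illustration only (not real data).
for name, col, zhang in [("family_a", 120, 95), ("family_b", 40, 310), ("family_c", None, 12)]:
    print(name, reconcile_richness(col, zhang))
```

With these inputs it prints 120, 310, and 12: CoL wins for the first family, Zhang for the second, and Zhang is the only source for the third.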

Caveats

Before I go on, I need to make a few things clear. The number of species in each family, and even how many families there are and what counts as a family, will vary with the taxonomic framework one adopts. Taxonomy is an ever-changing science: new species are discovered, and species and families are merged and split as new evidence is found. I split or merged some families recognized by iNaturalist to fit my dataset. But I think that whichever framework you adopt, the overall patterns seen here should remain largely similar.

Another caveat is that I collected the number of species for each family from May 25th to May 30th, and the number of observed species and observations on iNaturalist for each family from May 31st to June 2nd. It is possible that new species were added to databases after I took species richness numbers from them, and it is certain that new observations of arthropods were added during the three days over which I collected iNaturalist data. But I think this should not affect the data much. It's unlikely that rare families got a sizable number of new observations during this time, and super common families that get many observations daily already have so many that new additions do not change the total by a significant proportion.

Finally, the data rests on an assumption of a certain level of identification accuracy. I cannot entirely rule out the possibility that observed species counts in some families are affected by misidentified species, but I think this too should not distort the data much. Species that are observed very few times tend to be rare or obscure, so they are likely to be observed and identified only by experts in the first place, making their identifications likely to be accurate. On the flip side, very common species that are observed hundreds or thousands of times are virtually guaranteed to be identified correctly at least some of the time.
Observation counts for families should be even more robust, since misidentified species are usually mixed up with related species in the same family.

The resulting data covers 3,397 families. Two further families, Rhizoglyphidae and Horstiidae (both astigmatan mites), were omitted from my dataset because I could not find any source for their number of species.

An Introduction to Arthropod Diversity

Before we get to iNaturalist, we need to establish the baseline. Arthropod diversity is very lopsided: there are a few very large families, many medium-sized families, and many more very small families.

As can be seen here, most arthropod families sit at the low-richness end, with fewer and fewer families at higher species counts. The most species-rich family is Curculionidae, the weevils, with at least 77,656 species. There are 259 families with only one species, so there is no single last place.

Here is the species richness of every individual family, side by side, on a log scale.

And here it is without the log scale…

Yeah, there are a lot of beetles.

Here is a summary of how many families in each arthropod group fall within each order of magnitude of species richness. For example, Hexapoda has 23 families with 10,000 or more species.

Group 1-9 Species 10-99 Species 100-999 Species 1,000-9,999 Species 10,000+ Species
Hexapoda 263 429 414 175 23
Crustacea 439 451 197 2 0
Myriapoda 70 66 37 3 0
Chelicerata 210 338 226 23 0
Lobopodia 12 14 5 0 0
Total 994 1298 879 203 23
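The bins in this table are orders of magnitude of a family's species count. A Python sketch of the binning, assuming families come as (group, species count) pairs; the helper names are hypothetical, and the four sample families (Curculionidae, Heloeciidae, Syringophilidae, Cordulegastridae) use counts quoted in this post.

```python
from collections import Counter

# Column labels of the summary table, one per order of magnitude.
BIN_LABELS = ["1-9", "10-99", "100-999", "1000-9999", "10000+"]


def magnitude_bin(n_species: int) -> str:
    """Return the order-of-magnitude bin for a family's species count."""
    if n_species < 1:
        raise ValueError("every family has at least one species")
    # The number of decimal digits picks the bin; 5+ digits all go to "10000+".
    digits = min(len(str(n_species)), len(BIN_LABELS))
    return BIN_LABELS[digits - 1]


def tabulate(families):
    """Count families per (group, bin) from (group, species_count) pairs."""
    return Counter((group, magnitude_bin(n)) for group, n in families)


# Curculionidae, Heloeciidae, Syringophilidae, Cordulegastridae.
sample = [("Hexapoda", 77656), ("Crustacea", 1), ("Chelicerata", 425), ("Hexapoda", 50)]
print(tabulate(sample))
```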

The 30 largest families (all insects) make up around 42% of arthropod species. The bottom 2,000 families make up around 2% of arthropod species.
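Shares like these come from sorting families by species count and summing the head or the tail of that list. A small Python sketch with made-up counts; the function name is hypothetical.

```python
def share_of_species(counts, n, largest=True):
    """Fraction of all species held by the n largest (or n smallest) families."""
    ordered = sorted(counts, reverse=largest)
    return sum(ordered[:n]) / sum(counts)


# Toy species counts for seven families (not real data); they sum to 100.
toy = [50, 30, 10, 5, 3, 1, 1]
print(share_of_species(toy, 2))                 # two largest families
print(share_of_species(toy, 3, largest=False))  # three smallest families
```

Here the two largest toy families hold 80% of species and the three smallest hold 5%, the same kind of head-heavy split as the real data.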

Observations

There are over 92 million observations of arthropods on iNaturalist and counting, but as with species richness, they are not equally distributed. Families of chelicerates and crustaceans frequently have far fewer observations than families of insects, even at similar overall species richness. Overall, the general rule is that families with more species have more observations, but there are plenty of exceptions. Heloeciidae, a family with a single species of mangrove-dwelling crab, has 1,613 observations. Syringophilidae, a family of prostigmatan mites with 425 species, has only one observation.

A clearer picture can be ascertained if we plot it onto a phylogenetic tree of arthropod families. The number of observations is in log scale. (Just for reference, in all the phylogenetic trees that will be posted, Lobopodia is purple, Chelicerata red, Myriapoda brown, Crustacea blue, and Hexapoda green).

The graphs above, which group families by major lineage, may obscure some patterns that become visible on the phylogenetic tree. The long stretches of red (few observations) and black (no observations) fall among the tardigrades, mites, ostracods, amphipods, and copepods. This makes sense, as these groups tend to be very tiny and hard to notice, and are often parasites inside other animals or denizens of the deep sea. The brightest green portion of the band around the tree is near the lepidopterans, which are the most observed arthropods by an enormous margin: six of the ten most observed families are lepidopterans. Outside of the insects, macro-arachnids (arachnids other than mites), especially spiders and scorpions, are observed with high frequency. Among the crustaceans, the most observed order is Decapoda, which makes sense given their large size and often accessible habitats. The strong green streak surrounded by red and black in the bottom right part of the tree, among the Crustacea, is the terrestrial isopods. The larger centipedes and millipedes are also observed a fair amount.

Number of Observed Species

There is a general pattern that families with more known species also have more observed species. But, just like with observations, the bottom of the plot is filled with families of crustaceans and chelicerates that have far fewer observed species than insect families of the same species richness. And just like with observations, the phylogenetic patterns make it clear who the culprits are.

Just like on the tree above, on the chelicerate portion of the tree you can tell almost immediately where the mites are. Large portions of the Crustacea, comprising all manner of rare, obscure creatures, including meiofaunal, cave-dwelling, deep-sea, parasitic, and pelagic species, are devoid of any observed species. And even species that are observed may not be identified, so the number of observed species does not correlate perfectly with the number of observations.

Percent Completion

Major streaks of green, i.e. clusters of families where a high percentage of species are observed, are in the Lepidoptera, Polyneoptera, Odonata, Decapoda, juliform millipedes, scorpions, and mygalomorph spiders. These are among the large, charismatic, and easily noticed arthropods. Random streaks of green outside these clusters are mostly due to monotypic families having their one species observed, or other very small families (fewer than 10 species) being observed in their entirety. Out of 3,397 families of arthropods, 413 have half or more of their species observed: 236 families of hexapods, 107 of crustaceans, 17 of myriapods, 52 of chelicerates, and only 1 of Lobopodia (Peripatopsidae, the southern velvet worms). The largest families with all their species observed are Cordulegastridae (50 species), Hedylidae (35 species), and Gecarcinidae (27 species). Most families with 100% completion are very small, with fewer than 10 species.
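Percent completion is observed species divided by known species. A Python sketch of counting families at or above the halfway mark, assuming (name, observed, known) tuples; the function names are hypothetical, and the last tuple is an invented family included for contrast.

```python
def completion(observed: int, known: int) -> float:
    """Percent of a family's known species that have been observed."""
    return 100.0 * observed / known


def at_least_half(families):
    """Names of families with half or more of their known species observed."""
    # 2 * observed >= known avoids floating-point edge cases at exactly 50%.
    return [name for name, observed, known in families if 2 * observed >= known]


families = [
    ("Cordulegastridae", 50, 50),
    ("Hedylidae", 35, 35),
    ("Gecarcinidae", 27, 27),
    ("hypothetical_family", 4, 10),  # invented, 40% complete
]
print(at_least_half(families))
print(completion(4, 10))
```

The integer comparison is a design choice: it counts a family at exactly 50% as "half or more" without relying on float rounding.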

Notice how the crustaceans, chelicerates, and myriapods have steeper drop-offs than the hexapods.

There seems to be only a vague negative correlation between species richness and the percentage of species observed, if any at all. Smaller families should be easier to complete, but perhaps less speciose families also tend to generate less interest, which would weaken the pattern. And just like in the other scatter plots, crustacean and arachnid families (mostly copepods/amphipods and mites) have a lower percentage of known species observed than insect families of similar species richness.

Interestingly there does seem to be a general correlation between number of observations and percent of species observed.

You can see an upward curve towards the right of the chart, where families with huge numbers of observations are more complete on average than other families. This curve on the right can even be seen when looking at each group in isolation.

This suggests that a group that gets more observations will inevitably also accumulate more identified species at some point. Every group with tons and tons of observations has at least moderate completeness (by arthropod standards, of course; compared to birds, basically all arthropods are very poorly known).

Who got left out?

I think it is pertinent to take a look at some families which have no observations, and thus no observed species and have 0 percent of their species observed. There are 856 families with no observations. While many are monotypic, some are quite a bit more diverse than you would expect.

Class Order Family Number of Species
Ostracoda Podocopida Paradoxostomatidae 387
Arachnida Trombidiiformes Leeuwenhoekiidae 293
Ostracoda Podocopida Xestoleberididae 258
Arachnida Sarcoptiformes Chirodiscidae 230
Ostracoda Halocyprida Polycopidae 221
Ostracoda Podocopida Pontocyprididae 181
Arachnida Sarcoptiformes Oribellidae 175
Insecta Siphonaptera Stivaliidae 173
Ostracoda Platycopida Cytherellidae 168
Arachnida Trombidiiformes Anisitsiellidae 163
Malacostraca Isopoda Desmosomatidae 145
Arachnida Sarcoptiformes Protoribatidae 143
Copepoda Cyclopoida Anchimolgidae 142
Ostracoda Podocopida Thaerocytheridae 134
Malacostraca Isopoda Haploniscidae 132
Copepoda Cyclopoida Botryllophilidae 117
Ostracoda Podocopida Macrocyprididae 115
Malacostraca Isopoda Ischnomesidae 109
Arachnida Trombidiiformes Microdispidae 109
Malacostraca Cumacea Gynodiastylidae 106

The groups most represented among unobserved families are mites, copepods, ostracods, and isopods, along with one family of fleas and one of cumaceans. This is what we would expect: everything here is either very tiny or aquatic, often both.

Totals

Arthropoda (and its nearest relatives Onychophora and Tardigrada) has a total of 1,489,343 species, of which 263,655 have been observed on iNaturalist. That is an overall completion of 17.7%. This is actually lower than the figure of around 19% given in my previous posts. The fact that I am now including velvet worms and tardigrades may be a small part of it, but most of the difference is likely due to my use of species numbers that are as up to date as possible for every group, rather than taking CoL's phylum-level total at face value. This is especially the case for mites, which are poorly catalogued in CoL.
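The 17.7% figure is the observed species count divided by the assembled total. As a quick arithmetic check in Python, the same observed count can also be divided by the CoL and Zhang totals quoted earlier, which shows how much the choice of denominator alone moves the percentage (this holds the observed count fixed, so it is not a reconstruction of any earlier post's calculation).

```python
OBSERVED = 263_655  # panarthropod species observed on iNaturalist

# Total known species under each source used in this post.
totals = {
    "assembled sources": 1_489_343,
    "Catalogue of Life": 1_322_945,
    "Zhang (2011)": 1_243_379,
}

for source, total in totals.items():
    print(f"{source}: {100 * OBSERVED / total:.1f}% observed")
```

This prints 17.7%, 19.9%, and 21.2% respectively.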

Final Insights

So, what does this all mean? What is the takeaway? Well, the first and most obvious takeaway is that we are not even close to recording the vast amount of arthropod biodiversity. The vast majority of families have under half their species observed, and around 25% have no observations at all. But look on the bright side: isn't it impressive that we have observed nearly 75% of all arthropod families? I would say that is a genuine accomplishment.

Group Observed Families Unobserved Families Total Families % of Families Observed
Hexapoda 1211 93 1304 92.9
Crustacea 626 463 1089 57.5
Myriapoda 132 44 176 75.0
Chelicerata 549 248 797 68.9
Lobopodia 23 8 31 74.2
Total 2541 856 3397 74.8
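The percentage column follows directly from the observed and unobserved counts. A Python sketch that recomputes it from the counts in the table above; the function name is hypothetical.

```python
# (observed families, unobserved families) per group, from the table above.
counts = {
    "Hexapoda": (1211, 93),
    "Crustacea": (626, 463),
    "Myriapoda": (132, 44),
    "Chelicerata": (549, 248),
    "Lobopodia": (23, 8),
}


def percent_observed(observed: int, unobserved: int) -> float:
    """Percent of a group's families with at least one observation."""
    return 100 * observed / (observed + unobserved)


for group, (obs, unobs) in counts.items():
    print(f"{group}: {percent_observed(obs, unobs):.1f}%")

total_obs = sum(obs for obs, _ in counts.values())
total_unobs = sum(unobs for _, unobs in counts.values())
print(f"Total: {total_obs} of {total_obs + total_unobs} families, "
      f"{percent_observed(total_obs, total_unobs):.1f}%")
```

The last line prints "Total: 2541 of 3397 families, 74.8%", matching the table's total row.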

Also, despite the persecution complex surrounding the supposed neglect of insects, insects are actually the most observed, most identified, and most catalogued of all arthropods. Insects are in a much better place than the arthropod average: over 90% of all insect families have been observed. Why does no one show pity for the mites and copepods? Perhaps the lesson of all this is to gain a bit of appreciation. The fact that we have come this far at all is impressive. Let's get those numbers up: get out there and observe more bugs!

~~~~~~~~~~~~~~~~~~~

All the data that went into making these charts, as well as the sources for the species numbers of each family, are stored in an Excel spreadsheet, and the phylogenetic trees are all in PDF documents. The data is available on request.

Please let me know of any questions, concerns, or criticisms you may have of any of this.

Super fascinating and insightful analysis @insectilluminatigetshrekt! I enjoyed the read and all your beautiful figures. Have you considered publishing this? I’m sure you could find some journals that would be interested.

Thank you. I have tried publishing the phylogeny on its own, but both Research Square and bioRxiv turned it down. But now that I have actually done something with the phylogenetic tree and generated novel data, perhaps it might be accepted.

Just like with the order level analysis, I think a few alternate ways of viewing this may be useful.

Now, since I didn't feel like manually checking the size distribution of every family, I assigned families to micro or macro based on the general body size of the higher taxon they belong to. I assigned all families in Tardigrada, Parasitiformes, Acariformes, Palpigradi, Pseudoscorpiones, Oligostraca, Copepoda, Tantulocarida, Thermosbaenacea, Amphipoda, Entognatha, and Psocodea as ‘microarthropods’ and everything else as ‘macroarthropods’. This is admittedly a haphazard and imperfect way of doing this, as there are families among ‘micro’ taxa with decently large animals and families among ‘macro’ taxa made up of very small species. But even this rough approximation gives a clear picture. At equal species richness, families of macroarthropods are both more observed and have a far higher percentage of known species observed than families of microarthropods. The gap is much stronger for percentage of species observed than for number of observations, because many microarthropod families do have a moderate or large number of observations, but a far smaller portion of those observations are identified to species. If families were properly assigned to size classes (say, classing every family with a median body size under 1 mm as micro), the pattern would likely remain similar and might be stronger.
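The size-class assignment described above is a lookup by higher taxon. A Python sketch, assuming each family comes with its lineage as a list of higher-taxon names; the lineage lists and the function name are hypothetical illustrations, not the data structure actually used.

```python
# Higher taxa whose families were all treated as microarthropods.
MICRO_TAXA = {
    "Tardigrada", "Parasitiformes", "Acariformes", "Palpigradi",
    "Pseudoscorpiones", "Oligostraca", "Copepoda", "Tantulocarida",
    "Thermosbaenacea", "Amphipoda", "Entognatha", "Psocodea",
}


def size_class(lineage: list[str]) -> str:
    """Return 'micro' if any taxon in the lineage is a micro taxon, else 'macro'."""
    return "micro" if MICRO_TAXA.intersection(lineage) else "macro"


# Simplified, hypothetical lineages for two families mentioned in this post.
print(size_class(["Chelicerata", "Arachnida", "Acariformes", "Trombidiiformes", "Syringophilidae"]))
print(size_class(["Hexapoda", "Insecta", "Coleoptera", "Curculionidae"]))
```

The first call prints "micro" and the second "macro"; any taxon not listed defaults to macro, mirroring "everything else" in the assignment.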

For this, I assigned families as terrestrial or aquatic based on the primary habitat of the adult, since adult arthropods are what is observed and identified most. All Hexapoda, Myriapoda, Oniscidea, Onychophora, most of Arachnida, the amphipod family Talitridae, and the decapod families Gecarcinidae, Gecarcinucidae, and Coenobitidae were assigned to terrestrial. All remaining Crustacea, the mite taxa Halacaridae and Hydrachnida, Pycnogonida, Xiphosura, and Tardigrada were assigned to aquatic. There is also a clear trend here: at a given species richness, terrestrial families are more likely to be observed and more likely to have a larger percentage of their species observed. The difference is not quite as strong as the difference between micro and macro, largely due to the effect of decapods, which are large, popular, and not that difficult to identify.
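Because specific exceptions (such as Talitridae within the otherwise aquatic Crustacea) override the default for their larger group, this assignment can be sketched as a most-specific-match lookup. A Python sketch, assuming lineages are ordered from broadest to narrowest taxon; the rule table mirrors the assignments listed above, while the lineage format and the function name are hypothetical.

```python
# Habitat rules from the post; a more specific taxon overrides a broader one.
HABITAT_RULES = {
    # Terrestrial defaults and exceptions.
    "Hexapoda": "terrestrial", "Myriapoda": "terrestrial",
    "Onychophora": "terrestrial", "Arachnida": "terrestrial",
    "Oniscidea": "terrestrial", "Talitridae": "terrestrial",
    "Gecarcinidae": "terrestrial", "Gecarcinucidae": "terrestrial",
    "Coenobitidae": "terrestrial",
    # Aquatic defaults and exceptions.
    "Crustacea": "aquatic", "Halacaridae": "aquatic", "Hydrachnida": "aquatic",
    "Pycnogonida": "aquatic", "Xiphosura": "aquatic", "Tardigrada": "aquatic",
}


def habitat(lineage: list[str]) -> str:
    """Return the habitat of the most specific taxon in the lineage that has a rule.

    The lineage is ordered from broadest to narrowest, so it is scanned in reverse.
    """
    for taxon in reversed(lineage):
        if taxon in HABITAT_RULES:
            return HABITAT_RULES[taxon]
    raise KeyError(f"no habitat rule covers {lineage}")


print(habitat(["Crustacea", "Malacostraca", "Amphipoda", "Talitridae"]))
print(habitat(["Crustacea", "Malacostraca", "Decapoda", "Heloeciidae"]))
print(habitat(["Chelicerata", "Arachnida", "Trombidiiformes", "Halacaridae"]))
```

These print "terrestrial", "aquatic", and "aquatic": Talitridae overrides Crustacea, Heloeciidae falls through to the Crustacea default, and Halacaridae overrides Arachnida.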

This further affirms the trends identified in my previous posts. Terrestrial arthropods are more likely to be observed and identified than aquatic arthropods, and large arthropods more than small arthropods.