Identification accuracy: Western Australia Melaleuca observations

As part of one of my PhD chapters, I recently (12 February–3 March) organised and ran a three-week IDathon on iNaturalist for observations of plants in three biodiversity hotspots in Western Australia (WA). I recruited ~60 ‘experts’ with knowledge of WA plants. These included lots of professional taxonomists and botanists, but also amateur experts and herbarium volunteers, and both existing iNat users and new recruits. During the event we reviewed almost 12,000 observations (with another 5,000+ from the three focus areas already identified by at least one of these experts before the event began). It will be a few months before I get the chance to crunch all of the data, but I thought it would be interesting to look at a small case study now.

Botanist and taxonomist Brendan Lepschi is an expert in Melaleuca. He made 322 identifications during the event, so I assessed each observation he IDed.

As brief background, Melaleuca is a genus of trees and shrubs in the family Myrtaceae, and is one of Australia’s most diverse genera, with ~250 described species. Western Australia is the centre of diversity for the genus, with ~200 species known from the state (almost all native, plus a small handful of naturalised east coast species). Diversity is generally very high even at smaller spatial scales, with 40–60 (or even more) species often found in a single national park. Broadly, it is also a genus that is difficult to identify, especially from photographs alone, and for some species groups identification is hard even with a specimen in hand if it lacks fertile material. There are certainly plenty of species in the genus that are easily recognised and identifiable to species from photographs, but there are also a great many that are either very difficult to identify from photographs, requiring expert knowledge of the group, or impossible to identify from photographs alone. The few keys that exist are very daunting to use given the huge number of species and couplets, and fertile material is really important for many species IDs. Many of the hardest cases fall into a few species groups/complexes where identification is notoriously difficult, and for these taxa numerous herbarium vouchers are also misidentified or now carry misapplied names, because of delays in redetermining (re-detting) them after taxonomic revisions.

So overall, Melaleuca is at the tough end of the spectrum when it comes to identification, and it would be reasonable to assume that identification accuracy on iNaturalist would not be particularly high, especially for a region like WA with very high diversity (and relatively few identifiers compared to the eastern states).

(As a quick aside: on iNat, Melaleuca is Melaleuca sensu lato. There are a number of segregate genera [Beaufortia, Calothamnus, Regelia, etc.] that were all transferred into Melaleuca about ten years ago. The Western Australian Herbarium and the Australian Plant Census still treat these genera as valid, but POWO and iNat lump them into Melaleuca. Brendan’s expertise is in Melaleuca sensu stricto, and almost all of his 322 IDs were of that group, aside from a very small handful of exceptions.)

So, now to the stats. First of all, of the 322 observations Brendan IDed, 4 were not actually Melaleuca, meaning 99% of the observations were correctly identified at the genus level. Other Myrtaceae genera, e.g. Kunzea, can easily be confused with Melaleuca, so this is a nice statistic, even if it may not seem especially impressive at face value.

220 observations had been identified to species before Brendan’s IDs. He confirmed 175 of these as correct, i.e. 80% of observations identified to species were confirmed as correct. Of the other 20%, around half were corrected from one species to another, and the other half were pushed back to genus because a species ID wasn’t possible from photographs, or at least from the photographs provided.
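The two headline percentages are simple ratios of the counts above. A minimal Python sketch reproducing them from those counts (the variable and function names are hypothetical, chosen only for illustration):

```python
# Counts as reported in the post; the names are illustrative only.
TOTAL = 322              # observations Brendan identified
NOT_MELALEUCA = 4        # turned out to belong to another genus
AT_SPECIES_BEFORE = 220  # already at species before his review
CONFIRMED_SPECIES = 175  # species IDs he confirmed as correct


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


genus_accuracy = pct(TOTAL - NOT_MELALEUCA, TOTAL)            # 318 of 322
species_accuracy = pct(CONFIRMED_SPECIES, AT_SPECIES_BEFORE)  # 175 of 220
not_confirmed = AT_SPECIES_BEFORE - CONFIRMED_SPECIES         # corrected or pushed back to genus

print(f"Correct at genus level: {genus_accuracy}%")
print(f"Species IDs confirmed: {species_accuracy}%")
print(f"Species IDs not confirmed: {not_confirmed}")
```

The unrounded values (98.8% and 79.5%) round to the 99% and 80% quoted above, and the 45 unconfirmed records are the ones split roughly evenly between species-to-species corrections and records pushed back to genus.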

Of the 102 observations initially identified only as Melaleuca, 4 were corrected to a different genus, 15 were confirmed as Melaleuca but were not identifiable any further, and 79 were refined to species by Brendan. The remaining 4 were cases where the observer had identified the record to species X, an identifier had added an ID of species Y and pushed the record back to genus, and Brendan confirmed species Y as the correct ID. Of the 23 observations that were corrected from species X to species Y, 10 involved one of the difficult species groups in which even herbarium specimens are misidentified or have misapplied names, such as this example:

So overall, 80% of the 322 observations were identifiable to species, and before Brendan reviewed the records, 80% of observations already at species were correctly identified. After his review, the observations covered 57 different species, including a few new species for iNat.
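The breakdown of the 102 observations that were initially identified only as Melaleuca can be tallied directly from the counts given in this post. A minimal Python sketch (the dictionary labels are hypothetical paraphrases of the categories, not names used in the analysis):

```python
# Outcomes for the 102 observations initially identified only as Melaleuca,
# using the counts reported in the post (labels paraphrase the categories).
genus_level_outcomes = {
    "moved to a different genus": 4,
    "confirmed as Melaleuca, not identifiable further": 15,
    "refined to species by Brendan": 79,
    "identifier's species Y confirmed over observer's X": 4,
}

total = sum(genus_level_outcomes.values())
assert total == 102, "the four categories should cover every record"

for outcome, count in genus_level_outcomes.items():
    print(f"{outcome}: {count} ({100 * count / total:.1f}%)")

# Records in this group that ended up identified to species
reached_species = (
    genus_level_outcomes["refined to species by Brendan"]
    + genus_level_outcomes["identifier's species Y confirmed over observer's X"]
)
print(f"Reached species: {reached_species} of {total}")
```

The four categories account for all 102 records, which together with the 220 records already at species make up the full 322.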

These are pretty impressive results given the difficulties of identifying this genus that I discussed above (high diversity, lots of sympatry, daunting keys, the importance of fertile material), and they are certainly higher than I expected before the event.

I think you pointed out many aspects of the “power” of iNat.
First of all, the ability to bring together people from different sectors who share an interest and have them participate in a scientific exploration.
Moreover, with the right approach, even difficult taxa can in many cases be worked through and identified to species.
I think this also underlines that collaboration among a number of users is desirable if we want the maximum number of correctly identified observations.

Now I wonder whether you have already planned to promote and/or publish the results of this effort outside iNat. I think this could be a way to encourage knowledge of our natural heritage, together with awareness of the need to protect natural environments.

This is awesome! Can’t wait to hear more results from this.

It’s amazing what having a resource like iNat can do for advancing knowledge, even of difficult-to-ID species. Think of how long it would have taken before iNat existed! And keep up the great work!

I would much prefer cases where the observer had identified the record to species X, an identifier had added an ID of species Y and pushed the record back to genus, and Brendan confirmed species X as being the correct ID. I take it that never happened.

it happened once

also, out of interest, why would you prefer that? I can’t quite figure out why this would be a preferable situation

yes, the results will be published in two papers and also written up as one of my PhD chapters

57 species observed and identified, out of about 200 in WA, is still impressive for Melaleuca, but it does indicate that there are a lot of difficult taxa in that genus alone. Genera with no active experts are going to struggle, and the good results are as much the result of effort by facilitators on iNaturalist, including curators who assist experts when they come up against limitations of the framework.
Great work Thomas.

Mainly because it vindicates the original observer.

That is a very interesting and impressive accuracy. I wonder if the computer vision is getting so good that it can correctly discern species based on characteristics not obvious to a human observer, without access to the “classical” distinguishing features?

Would it be possible to see if the original observer used CV before selecting a species?

for this particular subset of data, no. Out of 322 observations, the computer vision was only used by the original observer 12 times: it was correct 7 times and wrong 5 times. Out of the 57 species in this dataset, only 2 or 3 have enough observations to be in the model.

But remember: lots of identifiers use CV as a shortcut to an ID they already know.
‘Used CV’ does not mean they had no idea and just accepted the suggestion.

Indeed. It is much easier to click on the CV suggestion if it comes up than to type the name in and have to correct the computer’s autocorrect.

I had noticed some IDs and corrections on my observations.
This is a good project and a valid use of iNaturalist.

I do that all the time :+1:

Only if one assumes that the relationship between observer and IDer is basically antagonistic and not cooperative.

If an observer receives an ID they think is incorrect, they are always free to query the person providing the ID about it; unless it happens to be a person who doesn’t follow notifications, I find that most IDers are willing to reconsider their ID or explain their basis for it. So the observer had plenty of opportunity to “vindicate” themselves before a taxon specialist came along to verify the ID.

  1. Have you considered adding “complexes” for those species sets that cannot readily be identified to species from photographs?
    I see only https://www.inaturalist.org/taxa/1491620-Melaleuca-leucadendra, but it is hard to believe that so many species (12!) cannot be told apart.
    We use them a lot for some of our amphibians (and in some cases we are happy to ID to species based on geography in parts of the range, but to complex where the species are known to overlap).

  2. Have you contemplated adding subgenera, sections and subsections to the larger genera? We have done this in the Cape for most genera with over 50 species [e.g. https://www.inaturalist.org/taxa/139725-Aspalathus], even though some are blatantly artificial [e.g. https://www.inaturalist.org/taxa/55776-Erica]; a few are considered unusable [e.g. https://www.inaturalist.org/taxa/83365-Indigofera] and so are only partially done.
    The significance of this is that (1) one can often ID to section without knowing the species (or temporarily bin observations there for later attention), (2) closely related misIDs default to a subgeneric level rather than the full generic level, and (3) the compare and identotron tools default to the next level above the species selected, reducing the possible candidates from hundreds to a handful.
    The same code caters for outstanding identifications above species level whether subgenera are present or not [e.g. https://www.inaturalist.org/observations/identify?reviewed=any&quality_grade=needs_id%2Cresearch%2Ccasual&verifiable=true&taxon_id=129714&lrank=complex&hrank=family], although of course it is much easier to focus on specific subgeneric taxa during review if these ranks are present.
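The example Identify link in point 2 is just a query string. A minimal Python sketch that rebuilds it for an arbitrary taxon, with every parameter value copied from that link. The function name is hypothetical, and reading `lrank`/`hrank` as the lowest and highest rank to include is an assumption based on the parameter names:

```python
from urllib.parse import urlencode

IDENTIFY_URL = "https://www.inaturalist.org/observations/identify"


def outstanding_ids_url(taxon_id: int, lrank: str = "complex", hrank: str = "family") -> str:
    """Build an Identify link for observations under taxon_id.

    Parameter values are copied from the example link; lrank/hrank are
    assumed to bound the ranks of the current IDs that are shown.
    """
    params = {
        "reviewed": "any",
        "quality_grade": "needs_id,research,casual",  # urlencode turns "," into %2C
        "verifiable": "true",
        "taxon_id": taxon_id,
        "lrank": lrank,
        "hrank": hrank,
    }
    return f"{IDENTIFY_URL}?{urlencode(params)}"


# Reproduces the example link for taxon 129714
print(outstanding_ids_url(129714))
```

Swapping in another `taxon_id` (or narrowing `hrank`, e.g. to a section once such ranks exist) gives the equivalent review queue for that group.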
