"Research grade" above species level

I wonder if a paradigm shift might be useful. “Research grade” currently applies to an observation, based on a community ID at species (or subspecies) level. What if “research grade” applied to each level of classification instead of to the observation as a whole? This would involve some significant backend work, because an observation could be “research grade” at one level (e.g. Insecta) but not at another (e.g. American Bumblebee). Nonetheless, it has benefits:

  • People would be encouraged to think about less-precise IDs, helping their knowledge and confirming lower-level classifications.
  • Folks would be incentivized to confirm/deny e.g. Complex Narceus americanus or gray tree frogs even if they couldn’t get it to species level.
  • We’d have stronger validation of low-level classifications for CV training purposes while still highlighting places where species were unclear.

It is already possible for observations to become research grade at ranks all the way up to (but not including) family.


Thanks, I didn’t realize this was already built-in. Is it surfaced in the UI? I remain unclear about API v2 – is it surfaced there?


It just requires the “it’s as good as it can be” box to be ticked in the DQA (Data Quality Assessment), once the observation has a community taxon/ID.

But generally this should only be used if you’re certain that the ID can’t be improved further.
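
As a rough illustration of the rule described in the replies above, here is a minimal Python sketch. It is a simplification for discussion, not iNaturalist's actual implementation: the rank list is an illustrative subset, the function and parameter names are hypothetical, and it ignores every other data-quality criterion (date, location, media, and so on).

```python
from typing import Optional

# Illustrative coarse-to-fine rank order (an assumption; not the site's full rank list).
RANKS = ["kingdom", "phylum", "class", "order", "family",
         "subfamily", "tribe", "genus", "complex", "species", "subspecies"]
RANK_INDEX = {rank: i for i, rank in enumerate(RANKS)}


def can_be_research_grade(community_rank: Optional[str],
                          marked_as_good_as_can_be: bool) -> bool:
    """Hypothetical helper encoding the rule as stated in this thread.

    - No community taxon: never research grade.
    - Community taxon at species or finer: research grade.
    - Community taxon coarser than species but finer than family:
      research grade only if the DQA box has been ticked.
    - Family or coarser: not research grade.
    """
    if community_rank is None:
        return False
    index = RANK_INDEX[community_rank]
    if index >= RANK_INDEX["species"]:
        return True
    return marked_as_good_as_can_be and index > RANK_INDEX["family"]


for rank, ticked in [("species", False), ("genus", False),
                     ("genus", True), ("family", True)]:
    print(rank, ticked, can_be_research_grade(rank, ticked))
```

Under these assumptions, a genus-level community ID only qualifies once the box is ticked, and a family-level one never does.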


That makes sense, but this strikes me as an opportunity for streamlining. As a hobbyist, I’m very hesitant to say what can’t be known!


Yeah, off the top of my head, there are a couple of moth species that can only be ID’d to genus level from photos alone. Those get RG’d at genus to get them out of the Needs ID pool.


For me (in the eastern U.S.), the adult moths in the genera Xanthotype and Halysidota, for example, can only be reliably ID’d to species through dissection.

Thankfully, the caterpillars are easily distinguishable.


What about Sciaridae? It’s nearly impossible to go past family without dissection.


(I assume that by “low-level” you mean “broad”, i.e., less specific)

CV training is not directly dependent on RG status, and non-RG observations may be used to train the CV. The reason it struggles with taxa that often cannot be identified to species from photos also has nothing to do with the way RG is calculated; it comes from the criteria for including taxa in the training. It is only trained on leaf taxa. What this means is that as soon as a more specific taxon is added to the model, the model is no longer separately trained on the parents of that taxon. So if there is one distinctive species within a genus, the model may learn that species but then become unable to recognize the more typical (and difficult to distinguish) members of that genus, because those observations have been left at genus and the training no longer includes genus-level observations. This can happen at any rank – if it is trained on the genus, it will no longer be trained on the parent tribe or family.
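
To make the leaf-taxa effect concrete, here is a small Python sketch. It is only a toy model of the behaviour described above, not the real training pipeline: the taxon names, the `PARENT` map, and both helper functions are hypothetical, and it assumes an observation is labelled with the nearest included leaf taxon at or above its own ID.

```python
from typing import Dict, Optional, Set

# Hypothetical toy taxonomy, child -> parent (placeholder names, not real taxa).
PARENT: Dict[str, str] = {
    "species_a1": "genus_a",
    "species_a2": "genus_a",
    "genus_a": "family_x",
}


def leaf_taxa(included: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Return the included taxa that have no included descendant."""
    covered = set()
    for taxon in included:
        # Walk up the tree, marking every ancestor of an included taxon.
        while taxon in parent:
            taxon = parent[taxon]
            covered.add(taxon)
    return {t for t in included if t not in covered}


def training_label(obs_taxon: str, leaves: Set[str],
                   parent: Dict[str, str]) -> Optional[str]:
    """Return the leaf taxon an observation would train, or None if unused."""
    taxon: Optional[str] = obs_taxon
    while taxon is not None:
        if taxon in leaves:
            return taxon
        taxon = parent.get(taxon)
    return None


before = leaf_taxa({"genus_a"}, PARENT)
after = leaf_taxa({"genus_a", "species_a1"}, PARENT)
print(training_label("genus_a", before, PARENT))  # genus-level obs used for genus_a
print(training_label("genus_a", after, PARENT))   # genus-level obs no longer used
```

In this toy setup, once `species_a1` is included, `genus_a` stops being a leaf, so observations left at genus (and those of the unincluded `species_a2`) no longer contribute to training.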


I think this works well with certain taxa, but in groups like flies or wasps, where there are a lot of undescribed species or where identification requires very detailed images and/or dissection, a great many observations would need to be marked as “it’s as good as it can be” to become research grade, even if a broader-than-species classification is known with relatively high certainty. And it can be quite hard to know when an ID cannot be improved.

Having observations reach research grade can be quite useful for studies, as they get imported into GBIF, which packages them in a neat and citable format. So, while I think this box in the DQA has its uses, I don’t think it’s an effective way to bring good-quality observations with genus-level or broader identifications to research grade.


It would certainly be possible to send any observations that have a CID at any arbitrary level to GBIF, but there might be some issues with this approach. The biggest drawback I can think of is that these CIDs would likely be much less stable than those for RG observations. For instance, if an observation were exported to GBIF under the current “rules” (CID below family) regardless of RG, for many observations, the CID might be refined from export to export, leading to unstable IDs.

I also think that the usage of non-species level observations on GBIF is lower than for those at species level, which is the primary level of many studies. There are certainly going to be exceptions to this (taxa where species level IDs are nearly impossible), but these are also a smaller proportion of observations on iNat. So I think the benefits to this might be limited.

Perhaps this is slightly tangential, but I find a bit of ambiguity here. There are some observations where I am 100% confident they cannot be IDed to species level because multiple publications say no consistent morphological differences have been found and that sequencing is required for ID. So I’m certain “it’s as good as it can be” now. But am I certain it’s as good as it ever could be? Perhaps in the future, with a larger number of sequenced examples, subtle morphological differences will be noticed, or an association with some environmental factor will be found, and currently indistinguishable species will become distinguishable.

The example I have in mind is Morchella eximia and M. sextelata (burn morels). They can’t be distinguished by morphology (yet) and have broadly overlapping ranges and habitat preferences (coniferous burn sites). Relatively recently, sequencing found one of these two species associated with burned coastal redwoods, while the other has not yet been detected there. The most qualified IDers are now using this habitat as diagnostic, and IDing to species in these areas.

How can I know if new diagnostic indicators might be discovered in the future for any currently unidentifiable observation? I’ll typically bump these observations back up to the appropriate level, but still don’t mark them “as good as can be” because of this uncertainty. I can’t speak to the value of higher ID levels to research, but to the extent that there is any, it seems like these observations should still be available to researchers even if there is a small chance they might be further improved some day.

Research Grade observations are still available to identifiers. So, if a useful ID character is discovered in the future, identifiers can run through the RG observations as well as the Needs ID ones. Yes, that will be a pain in the neck, but I think it happens all the time in herbaria and museums.


I think it’s beneficial to mark these “good as can be” as it takes them out of the Needs ID pool. If new characters are found in the future, they can easily be reassessed.

If we went by the criterion that “you must not use this unless you can be certain that no criteria will ever be discovered” we would never use the box at all (which means there would be no point to its existence).


Thanks for the clarification!

That’s more of an issue with the lumper/splitter philosophy itself – as in, how much genetic variation are we willing to acknowledge within a species?

Same question applies: is every ecotype a separate species if its adaptation has underlying genetics?
