Add an "Identified by" field in iNat and link it to the GBIF field "identifiedBy"

Add an “Identified by” field that saves the names of the users who agree on the leading ID of Research Grade observations. This field would appear when downloading data from iNaturalist or uploading data to GBIF.
This would allow better integration of iNaturalist data in GBIF, and let scientists filter GBIF records by identifications made by specialists, increasing the reliability of the occurrences extracted from iNat.

Sorry in advance for my bad spelling, English is my second language.

Not everyone enters their actual name as part of their iNaturalist account - so would you be OK with usernames?

I am aware of this, but still, I think being able to trace some mega-identifiers is quite important. For example, I know of a world expert on Passiflora who was reviewing observations on iNaturalist as part of his work on a monograph of the genus. I think it would be quite valuable to know in GBIF when identifications were made by renowned experts in the field. Most scientists put their real name on their iNaturalist profile anyway, and in GBIF the real name of the observers (if specified on their iNaturalist account) is already linked to their observations instead of their username. The same could be applied to the people who add identifications.

I guess it would be possible to include the identifier's name in GBIF only if their iNaturalist profile has a name set. This could be a nice compromise, I think, and would reduce the number of silly names in the GBIF identifiedBy field.

We currently populate the identifiedByID field, but only with the ORCID of the person who added the first improving identification that is the same as the observation taxon (here’s an example). We could also populate the identifiedBy field, which would be subject to falsification since anyone can change their name to anything on iNat, but the same is true of the recordedBy, so maybe that’s not a very high risk.

Who to list as an “identifier” is a bit more tricky and it depends a lot on how the field will be used. The whole issue of listing identifiers in our DarwinCore Archives came to my attention because there was a desire for attribution, e.g. people should be able to demonstrate that they’ve made useful contributions to biodiversity datasets both in the form of primary records like observations and specimens and in the form of curatorial acts like identifications. I chose the person who made the first improving identification because that’s the only person that we know didn’t just “agree” with a prior identification. If we list all identifiers, it would be very easy to claim that you performed useful work when all you really did was copy the work of others by adding the third or fourth (or 50th) identification to observations that had already been identified.
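
To make the selection rule above concrete, here is a minimal sketch of "first improving identification that matches the observation taxon". It is not iNat's actual code: the `Identification` class, the `attributed_identifier` function, and the sample users are all made up for illustration, and the identifications are assumed to arrive in chronological order with an iNat-style category already assigned.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Identification:
    user: str      # login of the person who added the ID (hypothetical data)
    taxon: str     # taxon name chosen by that person
    category: str  # "improving", "supporting", "leading" or "maverick"


def attributed_identifier(idents: List[Identification],
                          observation_taxon: str) -> Optional[str]:
    """Return the user behind the first improving ID that matches the
    observation taxon exactly, or None if there is no such ID."""
    for ident in idents:  # assumed to be in chronological order
        if ident.category == "improving" and ident.taxon == observation_taxon:
            return ident.user
    return None


idents = [
    Identification("alice", "Vulpes", "improving"),
    Identification("bob", "Vulpes vulpes", "improving"),
    Identification("carol", "Vulpes vulpes", "supporting"),
]
print(attributed_identifier(idents, "Vulpes vulpes"))  # bob
```

Note that carol, who only agreed with bob, is never returned, which is the attribution behavior described above.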

However, if the desire is to use the identifiedBy or identifiedByID fields in the way you describe, i.e. to filter occurrences by who identified them, then it doesn’t matter how many people we list as identifiers, as long as the person you’re searching for is on that list.

The two purposes are kind of at odds and I’m not sure how to support them both. We have no concept of a “renowned expert” on iNat, or indeed of any kind of expert, so we can’t just include “expert” identifiers. We could pick some arbitrary limit, e.g. the person who added the first improving identification and the subsequent 5 support identifiers, but that will almost certainly lead to problems if, say, your Passiflora expert was the 10th supporting identifier.
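
A tiny sketch of why an arbitrary cap fails the filtering use case. The function name, the cap of 5, and the user names are illustrative assumptions only, not anything iNat implements.

```python
def capped_identifiers(users_in_order, max_supporters=5):
    """Keep the first improving identifier plus at most `max_supporters`
    supporting identifiers. `users_in_order` is assumed to start with the
    first improving identifier, followed by supporters in order."""
    return users_in_order[: 1 + max_supporters]


# One first improver followed by twelve supporters (made-up names).
users = ["first_improver"] + [f"supporter_{i}" for i in range(1, 13)]
listed = capped_identifiers(users)

print(listed)
print("supporter_10" in listed)  # False: the 10th supporter is dropped
```

Anyone searching by the 10th supporter's name would never find this record, whatever their expertise.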

I’m curious to hear what others think about these conflicting uses of this field and what we can do to facilitate them both on iNat.

Background discussion at https://github.com/inaturalist/inaturalist/issues/1857

Ok, I asked the powers that be about what the intent of that field was, and they confirmed it was primarily to provide attribution to the person who provided the original identification and not necessarily to track all the people who may have confirmed that identification, so while I will add the identifiedBy field, I won’t populate that and identifiedByID with all the people who added an identification.

@timrobertson100 pointed out that there is an Agent Actions extension in development that might support specifying all the people who added identifications, so maybe if/when that gets supported we can start using it. However, I think iNat will always be a bit of an outlier in the way we handle identifications relative to more controlled systems like physical collections, so it’s possible we may not be able to use such an extension exactly in the way it was intended.

No. Currently identifiedBy is the first improving identification that matches the observation taxon exactly. In your scenario, the subspecies ID would be “leading,” not “improving.” Also, in your scenario, the observation would not be Research Grade so it wouldn’t get shared with GBIF.

Why does GBIF have “identified by” and “interpreted by”? Are they different?

I’ve never heard of “interpreted by” and I don’t see it on the list of supported terms at http://tools.gbif.org/dwca-assistant/, so I don’t know. You’d have to ask GBIF, I guess. If you’re talking about the values in the “Interpreted” columns on a page like https://www.gbif.org/occurrence/3013858291, those represent how GBIF is interpreting the data from the publisher (iNat). For example, in the “Establishment Means” row you’ll generally see the “Verbatim” value from iNat is “wild” while the “Interpreted” value is blank. That means iNat is providing a value for that field, but GBIF is interpreting that value as not meaning anything, so they just ignore it.

Is it correct that the iNat. “Determiner” field does not affect GBIF in any way?

Again, I don’t know what that is. If it’s an Observation Field, those are entirely controlled by iNat users, not iNat staff, and it should not have any effect on the observation as it appears to GBIF.

Discover Life (.org) also includes verified iNaturalist records (whether they are transferred over as for GBIF or added manually, I do not know). However, D.L. only lists the name of the iNat observer (I would prefer that it list the iNat identifier, as GBIF does).

You’d have to take this up with Discover Life. Basically for any third party that is using iNat data, we can control what they have access to, but we can’t control how they use it.

So this is the reason why “improving” is so coveted:

Status on a different website as an incentive for chasing first-to-be-confirmed identifications. I wonder if it wouldn’t be better to just report “identifiedBy” as “iNaturalist.org” for all observations exported to GBIF.

I don’t think that is a fair assumption; I’d doubt the label on GBIF is a major driver of behavior. Anyone can see their ID categories on iNaturalist, e.g. on the IDs page or Year in Review stats: https://www.inaturalist.org/identifications?user_id=schoenitz / https://www.inaturalist.org/stats/2020/schoenitz Probably better discussed elsewhere.

Ok, so your scenario now looks something like

ID 1: Vulpes vulpes
ID 2: Vulpes vulpes ssp. vulpes
ID 3: Vulpes vulpes ssp. vulpes

At this point, the observation taxon will be Vulpes vulpes ssp. vulpes, the Community Taxon will be Vulpes vulpes ssp. vulpes, the observation will be RG, and we will share it with GBIF along with the name Vulpes vulpes ssp. vulpes.

Subspecies do have some weird effects on the Community ID, mostly because of situations like

ID 1: Vulpes vulpes
ID 2: Vulpes vulpes ssp. vulpes

where the community ID is Vulpes vulpes even though ID 2 is leading. On iNat, a species-level ID is generally considered good enough, and since the subspecies ID supports the species-level ID, we just consider that 100% agreement at the species level. The obs can be RG at a subspecific level with more subspecies IDs, but at this point the obs is RG and, we think, good enough to share.
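
The "subspecies ID supports the species-level ID" point can be sketched as follows. This is only an illustration of that one idea, not iNat's Community Taxon algorithm: the `LINEAGE` table and both functions are hypothetical, and the real calculation has additional rules that are not modeled here.

```python
# Hypothetical lineage table: each taxon maps to itself plus its ancestors.
LINEAGE = {
    "Vulpes vulpes": ["Vulpes", "Vulpes vulpes"],
    "Vulpes vulpes ssp. vulpes": [
        "Vulpes", "Vulpes vulpes", "Vulpes vulpes ssp. vulpes",
    ],
}


def supports(id_taxon, target):
    """An ID supports `target` if it is the target or one of its descendants."""
    return target in LINEAGE[id_taxon]


def agreement(id_taxa, target):
    """Fraction of IDs that explicitly support `target`."""
    return sum(supports(t, target) for t in id_taxa) / len(id_taxa)


ids = ["Vulpes vulpes", "Vulpes vulpes ssp. vulpes"]
print(agreement(ids, "Vulpes vulpes"))              # 1.0
print(agreement(ids, "Vulpes vulpes ssp. vulpes"))  # 0.5
```

Both IDs count toward the species, while only one explicitly names the subspecies, which matches the two-ID scenario above.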

Some possibly related questions, but for Observers:

  • Does ORCID get related to “observedBy” in GBIF?
  • Does a collection of Observations by the same ORCID qualify as a “Dataset”?

Yes, the user name shows up as “observedBy” in GBIF. But if that user has an ORCID, that relation is lost.

I’ve since confirmed that GBIF includes an Observer ORCID field, though I don’t recall what it’s called.

I just re-downloaded my GBIF query for NYC Anthophila.

GBIF has “recordedBy”/“identifiedBy” which is the user’s name on iNaturalist.

It also has “recordedByID”/“identifiedByID” with the ORCID, if the user has set that up for themselves in iNaturalist.
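
For anyone who wants to filter a download on those columns, here is a rough sketch. It assumes a tab-delimited export that contains the four columns named above; the sample rows, names, and ORCID values are invented placeholders, and `rows_identified_by` is a hypothetical helper, not part of any GBIF tool.

```python
import csv
import io

# Made-up sample in the shape of a tab-delimited occurrence download.
SAMPLE = (
    "gbifID\tscientificName\trecordedBy\tidentifiedBy\tidentifiedByID\n"
    "1\tBombus impatiens\tObserver One\tExpert A\thttps://orcid.org/0000-0000-0000-0001\n"
    "2\tApis mellifera\tObserver Two\tsomeuser123\t\n"
    "3\tBombus griseocollis\tObserver One\tExpert A\thttps://orcid.org/0000-0000-0000-0001\n"
)


def rows_identified_by(tsv_text, name=None, orcid=None):
    """Return rows whose identifiedByID equals `orcid`, or whose
    identifiedBy equals `name`. The ORCID is preferred when given,
    since display names can be changed by the user."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    matches = []
    for row in reader:
        if orcid is not None and row.get("identifiedByID") == orcid:
            matches.append(row)
        elif name is not None and row.get("identifiedBy") == name:
            matches.append(row)
    return matches


hits = rows_identified_by(SAMPLE, orcid="https://orcid.org/0000-0000-0000-0001")
print([row["gbifID"] for row in hits])  # ['1', '3']
```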

For downstream QA / QC of identifications by data users, the most useful functionality would be to have all identifications provided by all users associated with the observation so that you could, e.g., easily pull out all observations identified as taxon A by user X, or systematically replace community IDs with IDs provided by user X.
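
As a sketch of what that would enable, suppose every identification were available as a flat list of (observation, user, taxon) records. That layout, the `Ident` tuple, and the sample data are assumptions for illustration; no current export is claimed to look like this.

```python
from collections import namedtuple

# Hypothetical flat export: one record per identification.
Ident = namedtuple("Ident", "obs_id user taxon")

IDENTS = [
    Ident(101, "user_x", "Taxon A"),
    Ident(101, "user_y", "Taxon A"),
    Ident(102, "user_y", "Taxon B"),
    Ident(103, "user_x", "Taxon B"),
]


def observations_identified_as(idents, user, taxon):
    """Observation IDs that `user` identified as `taxon`."""
    return sorted({i.obs_id for i in idents
                   if i.user == user and i.taxon == taxon})


print(observations_identified_as(IDENTS, "user_x", "Taxon A"))  # [101]
```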

From my point of view the best practice for data users, at least if species-level ID is important to their research and they are working with species that aren’t toward the easy-to-ID and charismatic end of the spectrum, is to treat the community ID as a tool for pulling out a set of potentially relevant observations but to entirely ignore the community ID otherwise. Check for identifiers known to be reliable; use their IDs when available; otherwise identify all observations yourself.
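
That workflow could look roughly like this. It is a sketch under my own assumptions: `resolve_taxon` and the user names are hypothetical, and returning `None` when trusted identifiers disagree is one possible policy among several.

```python
def resolve_taxon(idents, trusted):
    """idents: list of (user, taxon) pairs for one observation.
    trusted: set of users whose IDs the data user considers reliable.
    Returns the taxon if the trusted identifiers name exactly one,
    otherwise None, meaning the data user should identify it personally."""
    taxa = {taxon for user, taxon in idents if user in trusted}
    return taxa.pop() if len(taxa) == 1 else None


trusted = {"expert_a", "expert_b"}

# A trusted identifier overrides whatever the other IDs say.
print(resolve_taxon([("someone", "Taxon A"), ("expert_a", "Taxon B")], trusted))

# No trusted identifier: flag for manual identification.
print(resolve_taxon([("someone", "Taxon A")], trusted))
```

The community ID plays no part in the result; it would only be used upstream to pull candidate observations.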

With regard to how to identify users, surely the best practice is to provide username, real name (if available), and ORCID (if available).
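
One way to picture that, as a sketch only: the `Identifier` class is hypothetical, `identifierUsername` is an invented column rather than a Darwin Core term, and writing the ORCID as an `https://orcid.org/` URL is an assumption about the preferred form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Identifier:
    username: str                    # always present on an account
    real_name: Optional[str] = None  # only if the user entered one
    orcid: Optional[str] = None      # only if the user linked one

    def to_fields(self):
        """Map the three pieces of identity onto export columns."""
        return {
            # Fall back to the username when no real name is set.
            "identifiedBy": self.real_name or self.username,
            "identifiedByID": (
                f"https://orcid.org/{self.orcid}" if self.orcid else ""
            ),
            # Invented column, not a Darwin Core term.
            "identifierUsername": self.username,
        }


full = Identifier("jdoe", "Jane Doe", "0000-0000-0000-0001")  # placeholder ORCID
bare = Identifier("someuser123")
print(full.to_fields())
print(bare.to_fields())
```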

For what it’s worth, I hadn’t even realized you could connect your iNaturalist account to your ORCID. So I just connected it, but I’m guessing the ORCIDs are not well-populated even for the subset of users who have an ORCID…
