Add the field "Identified by" in iNat and link it to the GBIF field "identifiedBy"

I guess it would be possible to include the identifier’s name in GBIF only if their iNaturalist profile is linked to a name. This could be a nice compromise, I think, and would reduce the number of silly names in the GBIF identified_by section.

We currently populate the identifiedByID field, but only with the ORCID of the person who added the first improving identification that is the same as the observation taxon (here’s an example). We could also populate the identifiedBy field, which would be subject to falsification since anyone can change their name to anything on iNat, but the same is true of the recordedBy, so maybe that’s not a very high risk.

Who to list as an “identifier” is a bit more tricky and it depends a lot on how the field will be used. The whole issue of listing identifiers in our DarwinCore Archives came to my attention because there was a desire for attribution, e.g. people should be able to demonstrate that they’ve made useful contributions to biodiversity datasets both in the form of primary records like observations and specimens and in the form of curatorial acts like identifications. I chose the person who made the first improving identification because that’s the only person that we know didn’t just “agree” with a prior identification. If we list all identifiers, it would be very easy to claim that you performed useful work when all you really did was copy the work of others by adding the third or fourth (or 50th) identification to observations that had already been identified.
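To make the selection rule concrete, here is a minimal sketch of "first improving identification that matches the observation taxon". It is not iNat's actual export code: the dictionary keys (`user`, `taxon`, `category`, `created_at`) and the function name are hypothetical, and the sample identifications are made up for illustration.

```python
from typing import Optional


def pick_identified_by(observation_taxon: str, identifications: list[dict]) -> Optional[dict]:
    """Return the earliest 'improving' identification whose taxon equals the observation taxon.

    Assumes each identification is a dict with hypothetical keys:
    'user', 'taxon', 'category' and 'created_at' (ISO 8601 string).
    """
    # ISO 8601 timestamps in the same format sort chronologically as plain strings.
    for ident in sorted(identifications, key=lambda i: i["created_at"]):
        if ident["category"] == "improving" and ident["taxon"] == observation_taxon:
            return ident
    return None


# Made-up sample data: a genus-level ID, a species-level refinement, then an agreement.
ids = [
    {"user": "alice", "taxon": "Vulpes", "category": "improving", "created_at": "2021-01-01T10:00:00"},
    {"user": "bob", "taxon": "Vulpes vulpes", "category": "improving", "created_at": "2021-01-01T11:00:00"},
    {"user": "carol", "taxon": "Vulpes vulpes", "category": "supporting", "created_at": "2021-01-01T12:00:00"},
]

chosen = pick_identified_by("Vulpes vulpes", ids)
print(chosen["user"] if chosen else None)
```

In this sketch only one person is ever credited, and someone who merely agreed (a supporting ID) is never picked, which is the attribution behavior described above.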

However, if the desire is to use the identifiedBy or identifiedByID fields in the way you describe, i.e. to filter occurrences by who identified them, then it doesn’t matter how many people we list as identifiers, as long as the person you’re searching for is on that list.

The two purposes are kind of at odds and I’m not sure how to support them both. We have no concept of a “renowned expert” on iNat, or indeed of any kind of expert, so we can’t just include “expert” identifiers. We could pick some arbitrary limit, e.g. the person who added the first improving identification and the subsequent 5 support identifiers, but that will almost certainly lead to problems if, say, your Passiflora expert was the 10th supporting identifier.

I’m curious to hear what others think about these conflicting uses of this field and what we can do to facilitate them both on iNat.

Background discussion at https://github.com/inaturalist/inaturalist/issues/1857

Ok, I asked the powers that be about what the intent of that field was, and they confirmed it was primarily to provide attribution to the person who provided the original identification and not necessarily to track all the people who may have confirmed that identification, so while I will add the identifiedBy field, I won’t populate that and identifiedByID with all the people who added an identification.

@timrobertson100 pointed out that there is an Agent Actions extension in development that might support specifying all the people who added identifications, so maybe if/when that gets supported we can start using it. However, I think iNat will always be a bit of an outlier in the way we handle identifications relative to more controlled systems like physical collections, so it’s possible we may not be able to use such an extension exactly in the way it was intended.

I’m commenting on this post late. My taxa of interest are bees and wasps.

I agree with the original way you mentioned for GBIF to display “identified by” (the first person). I do not think multiple users need to be listed in the fields, although I do not mind either way.

I have a few questions and observations (not that you have to answer them all, up to you):

If a species ID is first recorded by one person, and then another specifies subspecies, is the second the new GBIF “identified by”? (I prefer this, assuming subspecies were correct. I noticed some wasp taxonomists do not like using subspecies in general, which I do not agree with).

Why does GBIF have “identified by,” and “interpreted by,” are they different?

Is it correct that the iNat. “Determiner” field does not affect GBIF in any way? (I noticed at least one insect taxonomist does not advise using the Determiner field, which I do not necessarily agree with.) More generally, do any iNat. fields affect GBIF?

Discover Life (.org) also includes verified iNaturalist records (whether transferred over like for GBIF or are manually added, I do not know). However, D.L. only lists the name of the iNat. Observer (I would prefer it be iNat. identifier like for GBIF).

No. Currently identifiedBy is the first improving identification that matches the observation taxon exactly. In your scenario, the subspecies ID would be “leading,” not “improving.” Also, in your scenario, the observation would not be Research Grade so it wouldn’t get shared with GBIF.

Why does GBIF have “identified by,” and “interpreted by,” are they different?

I’ve never heard of “interpreted by” and I don’t see it on the list of supported terms at http://tools.gbif.org/dwca-assistant/, so I don’t know. You’d have to ask GBIF, I guess. If you’re talking about the values in the “Interpreted” columns on a page like https://www.gbif.org/occurrence/3013858291, those represent how GBIF is interpreting the data from the publisher (iNat). For example, in the “Establishment Means” row you’ll generally see the “Verbatim” value from iNat is “wild” while the “Interpreted” value is blank. That means iNat is providing a value for that field, but GBIF is interpreting that value as not meaning anything, so they just ignore it.

Is it correct that the iNat. “Determiner” field does not affect GBIF in any way?

Again, I don’t know what that is. If it’s an Observation Field, those are entirely controlled by iNat users, not iNat staff, and it should not have any effect on the observation as it appears to GBIF.

Discover Life (.org) also includes verified iNaturalist records (whether transferred over like for GBIF or are manually added, I do not know). However, D.L. only lists the name of the iNat. Observer (I would prefer it be iNat. identifier like for GBIF).

You’d have to take this up with Discover Life. Basically for any third party that is using iNat data, we can control what they have access to, but we can’t control how they use it.

So this is the reason why “improving” is so coveted:

Status on a different website as an incentive for chasing first-to-be-confirmed identifications. I wonder if it wouldn’t be better to just report “identifiedBy” as “iNaturalist.org” for all observations exported to GBIF.

I don’t think that is a fair assumption; I’d doubt the label on GBIF is a major driver of behavior. Anyone can see their ID categories on iNaturalist, e.g. on the IDs page or Year in Review stats: https://www.inaturalist.org/identifications?user_id=schoenitz / https://www.inaturalist.org/stats/2020/schoenitz Probably better discussed elsewhere.

Returning to the subspecies scenario (this might just be a basic clarification for me about how the iNat. community vote works): I should have said, suppose Person 1 suggests the species correctly. Person 2 confirms the species and specifies the subspecies correctly. Person 1 is unsure whether the subspecies is correct, then Person 3 confirms the subspecies. So is subspecies never related to Research Grade per se (even if 2 people confirm it)? I think what I’m understanding is that the observation becomes R.G. as soon as the species is sufficiently confirmed. My own input is that I might like subspecies to count toward a more detailed R.G. standard (I’m unsure whether that would work, though, even if there were ever a desire to change the system).

That’s correct, I meant the “Interpreted” columns on a GBIF page like the 2nd link. Maybe I meant the “Identified by” column instead (but at that same link). In the table there, Nathan Odgers is listed for Interpreted (row) x Interpreted (column), and also Interpreted (row) x Original (column). I just meant: is it always the same person in both, and why are there 2 values (probably unimportant)?

That’s correct, Determiner is an iNat. Observation Field (I think we agree it wouldn’t affect GBIF).

I may ask Discover Life at some later time; I haven’t spoken to them before.

Ok, so your scenario now looks something like

ID 1: Vulpes vulpes
ID 2: Vulpes vulpes ssp. vulpes
ID 3: Vulpes vulpes ssp. vulpes

At this point, the observation taxon will be Vulpes vulpes ssp. vulpes, the Community Taxon will be Vulpes vulpes ssp. vulpes, the observation will be RG, and we will share it with GBIF along with the name Vulpes vulpes ssp. vulpes.

Subspecies do have some weird effects on the Community ID, mostly because of situations like

ID 1: Vulpes vulpes
ID 2: Vulpes vulpes ssp. vulpes

where the community ID is Vulpes vulpes even though ID 2 is leading. On iNat, a species-level ID is generally considered good enough, and since the subspecies ID supports the species-level ID, we just consider that 100% agreement at the species level. The obs can be RG at a subspecific level with more subspecies IDs, but at this point the obs is RG and, we think, good enough to share.
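For anyone who wants to play with the two scenarios, here is a simplified sketch of how a community taxon could be computed. It is only an approximation, not the actual iNat implementation: it assumes a taxon needs at least 2 identifications of itself or its descendants and more than 2/3 agreement, it treats a coarser ancestor ID as non-disagreeing, and it ignores explicit disagreements entirely. The function name and the lineage-tuple input format are hypothetical.

```python
from fractions import Fraction


def community_taxon(lineages):
    """Pick the most specific taxon with >= 2 cumulative IDs and > 2/3 agreement.

    lineages: one tuple per identification, root-most taxon first and the
    identified taxon last (hypothetical input format for this sketch).
    """
    # Map every taxon seen to its own lineage (ancestors first, itself last).
    lineage_of = {}
    for lineage in lineages:
        for depth, taxon in enumerate(lineage):
            lineage_of[taxon] = lineage[: depth + 1]

    best = None
    for taxon, own_lineage in lineage_of.items():
        # IDs of this taxon or any of its descendants.
        cumulative = sum(1 for lin in lineages if taxon in lin)
        # IDs of something else entirely (neither a descendant nor an ancestor).
        disagreeing = sum(
            1 for lin in lineages if taxon not in lin and lin[-1] not in own_lineage
        )
        score = Fraction(cumulative, cumulative + disagreeing)
        if cumulative >= 2 and score > Fraction(2, 3):
            if best is None or len(own_lineage) > len(lineage_of[best]):
                best = taxon
    return best


species = ("Vulpes", "Vulpes vulpes")
subspecies = ("Vulpes", "Vulpes vulpes", "Vulpes vulpes ssp. vulpes")

print(community_taxon([species, subspecies, subspecies]))  # Vulpes vulpes ssp. vulpes
print(community_taxon([species, subspecies]))              # Vulpes vulpes
```

Under these assumptions the single subspecies ID in the second scenario never reaches the 2-ID minimum, so the community ID stays at species even though that ID is leading.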

Okay.

Some possibly related questions, but for Observers:

  • Does ORCID get related to “observedBy” in GBIF?
  • Does a collection of Observations by the same ORCID qualify as a “Dataset”?

If we mean the same thing, I think GBIF shows the iNat observer as “observed by,” although I’m not entirely sure what you mean, and others would know more about ORCID.

Yes, the user name shows up as “observedBy” in GBIF. But if that user has an ORCID, that relation is lost.

Do you mean the observer won’t be shown in observedBy? If so, what’s shown instead?

I’ve since confirmed that GBIF includes an Observer ORCID field, though I don’t recall what it’s called.

So, different observer fields are populated depending on whether the user has an ORCID? I’d think it would be more ideal if only one field was populated either way, what do you think? The largest difficulty I noticed with GBIF is when people upload museum specimens other people collected and identified: GBIF’s observer or identifier fields are misleading because they only list the iNat users.

I just re-downloaded my GBIF query for NYC Anthophila.

GBIF has “recordedBy”/“identifiedBy” which is the user’s name on iNaturalist.

It also has “recordedByID”/“identifiedByID” with the ORCID, if the user has set that up for themselves in iNaturalist.
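In case it helps, here is a rough sketch of pulling those four columns out of a tab-separated download. The column names are the Darwin Core terms mentioned above; the sample rows, the placeholder ORCID, and the function name are all made up for illustration, and I’m assuming the download is tab-delimited with a header row.

```python
import csv
import io

# Made-up sample standing in for a downloaded file; the ORCID is a placeholder.
SAMPLE = (
    "gbifID\trecordedBy\trecordedByID\tidentifiedBy\tidentifiedByID\n"
    "1\tObserver One\t\tIdentifier A\thttps://orcid.org/0000-0000-0000-0000\n"
    "2\tObserver Two\t\tIdentifier B\t\n"
    "3\tObserver One\t\tIdentifier A\thttps://orcid.org/0000-0000-0000-0000\n"
)


def identifiers_with_orcid(tsv_text):
    """Map each non-empty identifiedByID to the set of identifiedBy names seen with it."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        orcid = (row.get("identifiedByID") or "").strip()
        if orcid:
            result.setdefault(orcid, set()).add(row["identifiedBy"])
    return result


for orcid, names in identifiers_with_orcid(SAMPLE).items():
    print(orcid, sorted(names))
```

Rows where the identifier has not linked an ORCID simply have an empty identifiedByID, so they drop out of this summary while still carrying a name in identifiedBy.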

Interesting, I may also use ORCID later on.

For downstream QA / QC of identifications by data users, the most useful functionality would be to have all identifications provided by all users associated with the observation so that you could, e.g., easily pull out all observations identified as taxon A by user X, or systematically replace community IDs with IDs provided by user X.
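As a sketch of what that could look like on the data-user side, assuming every identification were available per observation (which is not what the GBIF export currently provides): the record layout, key names, user names, and function names below are hypothetical, and the sample observations are invented.

```python
def observations_identified_as(observations, user, taxon):
    """Return observations where `user` added an identification of `taxon`."""
    return [
        obs
        for obs in observations
        if any(i["user"] == user and i["taxon"] == taxon for i in obs["identifications"])
    ]


def taxon_preferring(observation, trusted_user):
    """Use the trusted user's most recent ID if there is one, else the community taxon.

    Assumes identifications are listed in chronological order.
    """
    trusted = [i for i in observation["identifications"] if i["user"] == trusted_user]
    return trusted[-1]["taxon"] if trusted else observation["community_taxon"]


# Invented sample records for illustration only.
observations = [
    {
        "id": 1,
        "community_taxon": "Bombus impatiens",
        "identifications": [
            {"user": "user_a", "taxon": "Bombus impatiens"},
            {"user": "user_b", "taxon": "Bombus impatiens"},
            {"user": "user_x", "taxon": "Bombus griseocollis"},
        ],
    },
    {
        "id": 2,
        "community_taxon": "Bombus impatiens",
        "identifications": [
            {"user": "user_a", "taxon": "Bombus impatiens"},
            {"user": "user_b", "taxon": "Bombus impatiens"},
        ],
    },
]

matches = observations_identified_as(observations, "user_x", "Bombus griseocollis")
print([obs["id"] for obs in matches])
print([taxon_preferring(obs, "user_x") for obs in observations])
```

The second function is the "systematically replace community IDs" case: where user X weighed in, their ID wins; everywhere else the community ID is kept as a fallback.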

From my point of view the best practice for data users, at least if species-level ID is important to their research and they are working with species that aren’t toward the easy-to-ID and charismatic end of the spectrum, is to treat the community ID as a tool for pulling out a set of potentially relevant observations but to entirely ignore the community ID otherwise. Check for identifiers known to be reliable; use their IDs when available; otherwise identify all observations yourself.

With regard to how to identify users, surely the best practice is to provide username, real name (if available), and ORCID (if available).

For what it’s worth, I hadn’t even realized you could connect your iNaturalist account to your ORCID. So I just connected it, but I’m guessing the ORCIDs are not well-populated even for the subset of users who have an ORCID…
