Data Quality Assessment incorrectly says Community Taxon is "at genus level or lower"

Platform: Website

Browser: Chrome

URLs (aka web addresses) of any relevant observations or pages:
A couple of examples:
https://www.inaturalist.org/observations/30846341 (screenshots are of this one after step 2).
https://www.inaturalist.org/observations/17726517

Screenshots of what you are seeing:


Description of problem (please provide a set of steps we can use to replicate the issue, and make as many as you need.):

Step 1:
Observation 30846341 has 3 IDs (plus one withdrawn). 2 are at the subfamily level (Syrphinae). 1 is at the species complex level (Eupeodes americanus complex). The Observation Taxon is “Complex Eupeodes americanus”, but the Community Taxon is “Subfamily Syrphinae”.

Observation 17726517 also has 3 IDs. 1 is at the family level (Syrphidae), 1 at the tribe level (Syrphini), 1 at the species complex level (Eupeodes americanus complex). The Observation Taxon is “Complex Eupeodes americanus,” and the Community Taxon is “Tribe Syrphini”.

So far so good. With these IDs, neither observation is Research Grade, and both are correctly showing a community taxon above the genus level, with an observation taxon at the complex level. But…

Step 2:
Now I click on the Data Quality checkbox “No, it’s as good as it can be” under “Can the Community Taxon still be confirmed or improved?”. (I know it actually can be improved; this is just to demonstrate the behavior.)

Now both of them are Research Grade and the Data Quality Assessment section has a green checkmark for “Community Taxon at genus level or lower” and a green checkmark for “has ID supported by two or more”.

But the Community Taxon is not at genus level or lower.

The Observation Taxon is, but the Observation Taxon is not supported by two or more IDs. The Community Taxon is.

Is the Data Quality Assessment actually using the Observation Taxon, not the Community Taxon, to determine that this is at “genus level or lower”? Or is it seeing a Community Taxon at the subfamily or tribe level and incorrectly labeling it as “genus level or lower”?

The DQA is using the Community Taxon. The problem is basically that the text “genus level or lower” is wrong – anything at subfamily or lower can be converted to RG with that checkbox.

I suppose I could clarify a bit more:
The red X/green check on the line that says “Community Taxon at genus level or lower” is indeed linked to the Observation Taxon, but whether something is RG/Needs ID is actually dependent on the Community Taxon, which must be at subfamily or lower.
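
To make the mismatch concrete, here is a minimal Python sketch of the two checks as they are described in this thread. The rank numbers and function names are illustrative assumptions, not iNaturalist's actual code, and only the rank condition for Research Grade is modeled (the other DQA requirements are ignored).

```python
# Illustrative rank ordering: a lower number means a finer rank.
# These values are assumptions made for this sketch.
RANK_LEVEL = {
    "species": 10,
    "complex": 11,
    "genus": 20,
    "tribe": 25,
    "subfamily": 27,
    "family": 30,
}


def at_or_below(rank, limit):
    """True if `rank` is the same as `limit` or finer."""
    return RANK_LEVEL[rank] <= RANK_LEVEL[limit]


def checkmark_as_reported(observation_rank):
    """The 'genus level or lower' check, tied to the Observation Taxon."""
    return at_or_below(observation_rank, "genus")


def rank_allows_research_grade(community_rank, voted_cannot_improve):
    """Rank condition for RG, based on the Community Taxon."""
    if at_or_below(community_rank, "species"):
        return True
    return voted_cannot_improve and at_or_below(community_rank, "subfamily")
```

For observation 30846341 after step 2, the Observation Taxon is a complex and the Community Taxon is a subfamily, so `checkmark_as_reported("complex")` and `rank_allows_research_grade("subfamily", True)` both return `True`, even though the Community Taxon is not at genus level or lower.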

That line of text is inaccurate and misleading. Why not say “Community Taxon below the family level” or “Observation Taxon at the genus level or lower” if that’s what is actually required for RG and what is being used in the DQA? Both of those would be accurate and useful statements.

You can’t actually see what the Community Taxon is when in the Identify interface (as far as I know), so how can we expect identifiers to judge accurately whether the Community Taxon can be improved if the DQA misrepresents it?

I think this is confusing. It’s possible that someone could mark something as “No, it’s as good as it can be” because they think the current Observation Taxon is as good as it can be, not realising that the Community Taxon doesn’t match. I’m fairly sure I’ve done that in the past, and will probably forget and do it in future. The current functionality means that the person who adds the ID which makes the Community Taxon match the Observation Taxon is the first person who can add “No, it’s as good as it can be” and push it to research grade. Anyone who added that ID earlier would need to come back (and likely won’t have a notification to do that).

It also seems odd that I can mark an observation as “No, it’s as good as it can be” and that vote will still be valid even if the Community Taxon changes.

Ideally I’d like to be able to vote that I think a specific ID is as good as it can be. For example, I could add “Eupeodes americanus complex” and vote that “Eupeodes americanus complex” is as good as it can be. After I’ve done that, anyone agreeing with “Eupeodes americanus complex” could be asked whether they agree that it’s as good as it can be (which would count as an additional vote to get it to RG), think it could be improved (which would cancel out my vote), or don’t know (which would do nothing). So a bit like adding a genus-level ID to an observation which has a species-level ID.

Why is it not mentioned anywhere? It would help so much if we knew that subfamily works too. As it is, many observations are casual or stuck forever in Needs ID when they could be RG.

I think that a vote of “Yes, it can be improved” should disappear or become inactive if a new ID is added. After seeing / hearing people talk about observations which had two or more agreeing IDs and no disagreeing ones, but were not Research Grade, this seems like an obvious fix.

Sorry if this is the wrong place for this, I’m not sure where the right place is.

It’s on the help page:

See also discussion here: https://forum.inaturalist.org/t/reset-can-the-community-taxon-still-be-confirmed-or-improved-after-taxon-swap/9266

We could change it to “Community Taxon is precise”, which is still probably a bit confusing, but it wouldn’t be inaccurate. And “precise” could be defined in the info tooltip for the DQA.

I don’t think I would have any idea what was meant by that, but you’re right, it wouldn’t be inaccurate.

Based on @jwidness’s two answers, though, the wording that would make it mean what I thought it meant to begin with would just be “Community Taxon is at subfamily or lower”. Just get the Observation Taxon out of the calculation. That would be an accurate (although not very precise) statement of the actual data quality for the observation, and also an accurate statement of what is required for RG, so the green checkmark would make sense.

It wouldn’t be much help to identifiers trying to figure out whether the Community Taxon can be improved, but at least it wouldn’t lead them astray with false precision as it did me.

… except it’s currently based on the Observation Taxon, not the Community Taxon.

It seems to me there are two pretty reasonable options for that DQA item: either it should turn green when the CID is at species or lower (the rank at which it will automatically convert to RG), or it should turn green when the CID is at subfamily or lower (the rank at which it can be converted to RG).

Both options still leave cases where the red/green DQA display won’t match the quality grade – in the first case, converted below-family observations will still have a red x when RG, and in the second case, observations below family but without the box checked will show all green in the DQA despite being Needs ID.

Alternatively, if the goal is to have quality grade = research if and only if everything in the DQA is green, you could have it change dynamically with the status of the check box. So it would say “species or lower” until someone checks no, it can’t be improved, at which point it would change to “subfamily or lower”. The logic is certainly more complex, but at least all greens is RG, and a red x is not RG.
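
The three options can be compared with a small Python sketch that lists the cases where each one disagrees with the quality grade. The rank numbers, the function names, and the rank-only model of Research Grade are all assumptions made for illustration, not iNaturalist's implementation.

```python
# Illustrative rank ordering: a lower number means a finer rank (assumed values).
RANK_LEVEL = {
    "species": 10,
    "complex": 11,
    "genus": 20,
    "tribe": 25,
    "subfamily": 27,
    "family": 30,
}


def at_or_below(rank, limit):
    return RANK_LEVEL[rank] <= RANK_LEVEL[limit]


def is_research_grade(community_rank, voted_no):
    """Rank-only model of RG: species or finer, or subfamily or finer with the vote."""
    limit = "subfamily" if voted_no else "species"
    return at_or_below(community_rank, limit)


def option_species(community_rank, voted_no):
    """Green only when the Community Taxon is at species or lower."""
    return at_or_below(community_rank, "species")


def option_subfamily(community_rank, voted_no):
    """Green whenever the Community Taxon is at subfamily or lower."""
    return at_or_below(community_rank, "subfamily")


def option_dynamic(community_rank, voted_no):
    """Threshold moves from species to subfamily once the box is checked."""
    limit = "subfamily" if voted_no else "species"
    return at_or_below(community_rank, limit)


def mismatches(option):
    """Every (rank, voted_no) case where the DQA item and the grade disagree."""
    cases = [(rank, voted) for rank in RANK_LEVEL for voted in (False, True)]
    return [(r, v) for r, v in cases if option(r, v) != is_research_grade(r, v)]
```

Under these assumptions, `mismatches(option_species)` returns the voted-on cases from complex through subfamily, `mismatches(option_subfamily)` returns the same ranks without the vote, and `mismatches(option_dynamic)` is empty.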

Correct, I apologize and it will be fixed, which leaves the issue of text accuracy.

On our test server right now we have “Community Taxon is precise”.

And in the pop-up for the “i” button next to Data Quality Assessment, it now says

Observations become “research grade” when

  • the iNat community agrees on species-level ID or lower, i.e. when more than 2/3 of identifiers agree on a taxon (if the community has voted that the Community Taxon can no longer be improved, this reverts to subfamily-level ID or lower)

That should cover all cases (I think) without introducing more complexity into the display or code.

OK, slight amendment to what I wrote above.

  • If the Community Taxon is at species or lower, it should still say “Community Taxon at species level or lower”

  • If the Community Taxon is at subfamily or lower and the community votes “no” for “Based on the evidence, can the Community Taxon still be confirmed or improved?”, the DQA will now say “Community Taxon is precise”. This is on the site now; you can check out the two observations @trinaroberts shared in the original post.
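
The two cases above can be sketched in Python as a single function that returns both the label and the check state. This is only an illustration: the rank numbers and names are assumptions, and the label shown in the failing state is a guess, since the thread does not say what text appears there.

```python
from typing import NamedTuple

# Illustrative rank ordering: a lower number means a finer rank (assumed values).
RANK_LEVEL = {
    "species": 10,
    "complex": 11,
    "genus": 20,
    "tribe": 25,
    "subfamily": 27,
    "family": 30,
}


class DqaItem(NamedTuple):
    label: str
    passed: bool


def community_taxon_item(community_rank: str, voted_cannot_improve: bool) -> DqaItem:
    """Label and check state for the Community Taxon line of the DQA."""
    level = RANK_LEVEL[community_rank]
    if level <= RANK_LEVEL["species"]:
        return DqaItem("Community Taxon at species level or lower", True)
    if voted_cannot_improve and level <= RANK_LEVEL["subfamily"]:
        return DqaItem("Community Taxon is precise", True)
    # Assumption: the failing state keeps the species-level wording.
    return DqaItem("Community Taxon at species level or lower", False)
```

With the box checked, a Community Taxon at subfamily level gives `DqaItem("Community Taxon is precise", True)`; without the vote, the same taxon fails the check.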

So it did end up dynamic, yay!

My one quibble is that “above family” should be “family or above” in the explanation text:

Hooray, thanks! This seems to me like a more accurate and consistent assessment of data quality.

Note that the two observations I shared as examples are currently not at research grade – I moved them back to Needs ID because the community taxon actually can be improved, at least as long as they’re still at subfamily/tribe level.
