Change how IDs work when there are disagreements

Proposal:

Imagine all the identifications given to an observation. Make a tree out of them, and navigate it by picking the branch with the most support until no branch is supported by more than [insert cutoff]. Stop here for the Community ID (CID). For the Observation ID, keep going as long as the branches don’t actually bifurcate after that point (because there are no alternatives or disagreements); that is, continue until there is a split, and return the group containing all the options at that split. Thus the Observation ID will always be nested within the Community ID.
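
A rough sketch of that walk in Python, in case it helps to see it spelled out. Everything here is an assumption made for illustration, not how iNat actually does it: each ID is a tuple of names from the root down (with simplified lineages), the cutoff is set to 2/3 because the proposal leaves it open, support is measured as a share of all IDs on the observation, and the names `children_support` and `walk` are made up.

```python
from collections import Counter

CUTOFF = 2 / 3  # assumed value; the proposal leaves the cutoff open


def children_support(ids, node):
    """Count the IDs that pass through each child branch of `node`."""
    depth = len(node)
    return Counter(
        lineage[: depth + 1]
        for lineage in ids
        if len(lineage) > depth and lineage[:depth] == node
    )


def walk(ids, cutoff=CUTOFF):
    """Return (community_id, observation_id) as lineage tuples."""
    total = len(ids)
    node = ()  # start at the root of the tree
    # Step 1: follow the best-supported branch while it clears the cutoff.
    while True:
        branches = children_support(ids, node)
        if not branches:
            break
        best, count = branches.most_common(1)[0]
        if count / total <= cutoff:
            break
        node = best
    community_id = node
    # Step 2: keep descending while there is exactly one branch (no split).
    while True:
        branches = children_support(ids, node)
        if len(branches) != 1:
            break
        node = next(iter(branches))
    return community_id, node


AVES = ("Animalia", "Aves")
DUCK = AVES + ("Dendrocygna",)
ids = [("Plantae", "Strelitzia"), DUCK, AVES, AVES]
community_id, observation_id = walk(ids)
```

With three bird IDs against one plant ID, `community_id` stops at Aves and `observation_id` continues down to the whistling duck genus, because nothing inside Aves contests it. Add a second, different bird genus and the observation ID stays at Aves, since that is where the split is.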

Cause:

The IDs are 3:1 bird vs plant, so it goes to Aves, but the leading ID on top (not the Community ID) should then also go to whistling duck, which is uncontested among the ‘Aves’ IDs. If the plant ID were not there (and outvoting it is what the ‘Aves’ IDs were meant to accomplish), it would be at whistling duck.
https://www.inaturalist.org/observations/203214238#activity_identification_057cb136-e08e-4f62-af63-86c3d462611b
In general, once the IDs settle the ‘higher-level’ / ‘official’ ID of the observation, finer IDs should be processed ‘given’ this higher-level ID, and IDs that disagree with it should be ignored unless the higher-level ID for the whole observation changes.

Also: When an observation has a conflict where someone has bumped an id a level up, explicitly disagreeing with the earlier fine id, and one clicks ‘agree’ with this higher-level id, it should have the option to be an explicit disagreement to the finer id as well, and not be marked as just a generic agreement with the high-level id.

I don’t know if this should be an official feature request, and there have been so many discussions about how id’s work that I’m not sure where this belongs, but I think that this is an outstanding problem after all that discussion that’s fixable.

this was previously discussed in https://forum.inaturalist.org/t/community-taxon-algorithm-tweaks/28583.

i think this is effectively what happens now when you explicitly disagree. see: https://forum.inaturalist.org/t/change-to-ancestor-disagreement-implementation/49276/6.

I recently agreed (yesterday) and was not offered the option to make it an explicit disagreement with the finer level. I had to add a fresh id to get the question.

That obs has the CID at 2 birds vs the birds-of-paradise Strelitzia.
I often add a ‘supporting taxon specialist’ ID, which I later withdraw. In this case, that would be when the next birder chimes in.

Aves instead of Life at least gets it to birders.

If you open the previous discussion I will comment there.

P(A|B) (probability of A given B) is how we should calculate finer ids (such as the organism id given at the top of the page), where B is the community id and A is any finer id.

The example with 1000:2 ids at a high level and yet only 2:1 at the species level because most are high-level ids captures the intuition very well. Since many of those high-level ids are themselves incompatible with a certain species level id they should be given weight at this level.

Id’s should be calculated based on ‘Given x, what is it most likely’ at each level. I like the algorithm Matthew described. I only wanted this where the organism id is calculated ‘given’ the community ID, but his algorithm does it at every level, which is maximally intuitive, and leads to more specific id’s that are also reasonable.
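
To make the ‘given’ idea concrete, here is a minimal Python sketch. It is only one possible reading of P(A|B): among the IDs that fall inside B, count those inside A as supporting, those on a different branch of B as conflicting, and treat anything that is merely an ancestor of A (including B itself) as neutral. The lineage tuples are simplified and the function names are made up for this example.

```python
def is_within(lineage, ancestor):
    """True if `lineage` is `ancestor` itself or one of its descendants."""
    return lineage[: len(ancestor)] == ancestor


def conditional_support(ids, a, b):
    """Estimate P(a | b): support for `a` among IDs that accept `b` and take a side on `a`."""
    if len(a) <= len(b) or not is_within(a, b):
        raise ValueError("a must be a strict descendant of b")
    inside_b = [lin for lin in ids if is_within(lin, b)]
    supporting = sum(is_within(lin, a) for lin in inside_b)
    # IDs on a different branch inside b conflict with a; ancestors of a are neutral.
    conflicting = sum(
        not is_within(lin, a) and not is_within(a, lin) for lin in inside_b
    )
    decided = supporting + conflicting
    return supporting / decided if decided else None


AVES = ("Animalia", "Aves")
DUCK = AVES + ("Dendrocygna",)
ids = [AVES, AVES, AVES, DUCK, ("Plantae", "Strelitzia")]

# Share of all IDs that name the duck, versus its support given Aves.
overall = sum(is_within(lin, DUCK) for lin in ids) / len(ids)
given_aves = conditional_support(ids, DUCK, AVES)
```

Here `overall` is 0.2 (one ID out of five), while `given_aves` is 1.0, because once Aves is accepted the plant ID drops out and no bird ID contests the duck.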

In terms of your blackbird vs Sparrow example, where you say ‘some use cases should not be prioritized over others’, if it’s 3:1 sparrow to blackbird, the iNat system would put the id as sparrow no matter what. It is also not counterintuitive behaviour; I don’t therefore think it has to do with this specific case.

If you want to bump observations out of sparrow and into bat based on your specific knowledge, expert weighting - which iNat has disagreed on (but I see we now have # of id’s by our name each time we id) - or weighting disagreements more highly - which has been discussed - are the tools you would need.

It shouldn’t impact solving this, which is weird behaviour according to the normal human logic in this specific instance.

The CID is fine, but it should have the id at the top of the observation (which usually gets more specific) at the whistling duck, and it should also show up in searches of that order/family/genus/species.

that is a Chrome extension - not an integral part of iNat.
But very useful.

It is confusing for us that the display ID up top and the CID often don’t agree.
I have withdrawn a couple of bee IDs today, to force iNat to show the taxon specialist’s ssp.
https://www.inaturalist.org/observations/203090341

in cases where you had sparrow identifications of different taxon ranks, that proposed algorithm would put it at sparrow species, possibly at research grade, whereas the current algorithm would leave it at a higher-level sparrow taxon and not research grade. that’s the difference.

the current algorithm gives priority to disagreeing IDs over trying to force a consensus to a low rank. i think that’s fine.

I think that’s the proper behaviour if there are enough specific id’s for sparrow: it’s the same case as if the initial id was bat and was wrong, and more people have to supply ids to get it overturned to sparrow. That’s where the 2/3 law for CID comes in and I’ve nothing really to say about it. If sparrow is not outvoted at this level, it should absolutely go on being a sparrow id at the finer ‘observation id’ level.

It will also not go to research grade unless there are confirming sparrow ids at the species level.
That is the normal behaviour: the obs id goes beyond the CID, nothing gets to research grade w/o confirmation, and it takes a lot of id’s to overturn anything - BUT - you don’t get the weird behaviour where a high level id does not contribute to disagreeing with a finer level id it is incompatible with.

If enough sparrow id’s are present, both algorithms will end up at a higher-level sparrow taxon, especially for the CID. None of them go to research grade unless there are at least two species-level id’s which agree.

I propose the species-level id’s be judged on agreement with the CID, but not on an overall level w/ every other species ID.

You can let there be a higher threshold to get to RG if there is some higher-level disagreement on the obs, idk. You are essentially wanting disagreements to be weighted higher than neutral, or for there to be a higher threshold for RG on obs with disagreement at a higher level (e.g. at CID level), I believe.

I don’t think this is fine behaviour as it is - it’s a bit ridiculous how it doesn’t count the implicit disagreements, and doesn’t understand that A is a subset of B, where A is some species and B is some higher-level taxon, with C being some other species ID which conflicts with both.

That makes the identification/disagreement-tallying model unphysical, i.e. not suited to the real thing it is trying to model.

Maybe I’m just thinking about this differently, but the current system seems pretty intuitive to me. If 5 people put IDs on something, 3 only feel confident that it’s some sort of bird, one thinks it’s a sparrow, and another thinks it’s a fruit bat, I think “bird” is a perfectly reasonable summary of the community’s ID opinion. Sure, one person thinks they know what particular bird it is, but one person doesn’t even think it’s a bird at all. Average those two opinions out and add in that three others think it’s a bird but don’t know what kind, and “well, we think it’s a bird anyway” seems like a good conclusion. To zoom the ID all the way down to “sparrow” based on 1 suggestion out of 5, when not all 5 can even agree it’s a bird at all, seems to me to put way too much emphasis on that one particular ID suggestion at the expense of the suggestion that it’s not even a bird.

If I’m understanding your suggestion, you’re saying that because more people say “that’s a bird I don’t recognize” than say “that’s not a bird”, the community has implicitly agreed with whatever particular bird species was suggested by one user. My issue with this is that it essentially “erases” the IDs that disagree with it being a bird. 3 generic “bird” IDs and a “sparrow” ID results in a “sparrow” CID. That makes sense to me. You’re suggesting that 3 generic “bird” IDs, a “sparrow” ID, and a “that’s not even a bird” ID should result in the exact same “sparrow” CID, essentially ignoring that dissenting opinion entirely. That is, if not everyone is even convinced it’s a bird at all, saying “the community thinks this is a sparrow” as soon as the scales tip in favor of “bird” seems misleading to me.

It’s easy to point to an observation where that dissenting opinion is clearly wrong and be frustrated that it’s “messing up” the CID that one would like, but that dissenting ID is still part of the community, and a CID system that entirely ignores it isn’t in the spirit of what CID is meant to express, IMO.

I don’t want the community ID to change from being ‘birds’. I want the ‘observation ID’ to update because it is reasonable to do so (there is a more specific ID under birds available, uncontested by another bird ID).

At the top of the observation is what I will call an ‘observation ID’. This determines which searches it shows up in, is what is displayed on the tile, and is the finest ID applicable to the entire observation. That is usually a few levels further than what is called the ‘Community ID’, which is the finest level supported by more than 2/3 agreement amongst all the identifications (the exact definition of how that agreement is calculated is what I’m quibbling about).

The purpose of the community ID is to mark the place of consensus, while the purpose of the observation id is to move the observation forward, allow it to be found in searches, put it in front of the right eyes, etc. In this case, it fails to catch up to where it would be w/o the initial disagreement.

(I think the 2/3 agreement used for the CID, and all of the agree/disagree calculations on iNat, should take into account the idea of subsets, or conditional logic, however you want to word it.)

I can see your point, but I think sometimes the original ID is mistaken, like typing ‘bird’ and getting ‘bird-of-paradise-flower’, or is just way off base. In a disagreement between moss and lichen there’s not really a compromise - an obs has to be one or the other. But if someone who knows lichens knows which type of lichen it is, that should still count for finding out what type of lichen it is, provided it is not a moss. Needing someone else who knows what type of lichen it is is a bit much, considering you can find people who can tell that it is a lichen, perhaps by some trait characteristic of lichens but specific to this particular type of lichen alone.

Strictly speaking, no more than if the original conflicting ID had not been there - we all implicitly ‘don’t disagree’ to anything finer when we offer an initial broad ID - that’s how it works in general. So, a more specific ID is never actually seen as a contrast to earlier ones, but it’s not like the earlier ones explicitly support it.

What might be helpful, that I’ve seen mentioned, is an ability to explicitly disagree to finer id’s (bird, but definitely not enough info to make out if it’s a sparrow or not) even when making the first ID. At present, that’s not behaviour that’s supported anywhere across the site, but it would be cool to think like that about all ID’s - ‘this point on the tree but no further’, vs ‘this and all descendants’.
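
As a sketch of what that distinction could look like as data (purely hypothetical, nothing like this exists on the site as far as I know): give each ID a flag saying whether it stops at its own taxon or leaves the descendants open. The `Identification` class, the `no_finer` flag and the `stance` function are all invented for this example, and the lineages are simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    lineage: tuple          # path from the root down to the identified taxon
    no_finer: bool = False  # hypothetical flag: "this point on the tree but no further"


def stance(ident, taxon):
    """Return 'agree', 'neutral' or 'disagree' for `ident` toward `taxon`."""
    lin = ident.lineage
    if lin[: len(taxon)] == taxon:
        return "agree"  # the ID is at `taxon` or somewhere below it
    if taxon[: len(lin)] == lin:
        # The ID is a strict ancestor of `taxon`: open by default, capped if flagged.
        return "disagree" if ident.no_finer else "neutral"
    return "disagree"  # the ID sits on a different branch


AVES = ("Animalia", "Aves")
SPARROW = AVES + ("Passer",)

open_bird = Identification(AVES)                   # 'this and all descendants'
capped_bird = Identification(AVES, no_finer=True)  # 'bird, but not enough to say sparrow'

stances = [stance(open_bird, SPARROW), stance(capped_bird, SPARROW)]
```

With this, a plain ‘bird’ ID stays neutral toward ‘sparrow’, while a capped ‘bird’ ID counts as disagreeing with it, and both still agree with ‘bird’ itself.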

The CID should be bird if that’s all everybody can agree on.

But there’s another ID, observation ID, that should go all the way down to ‘sparrow’.

I think the point of the agreement process is to find the consensus ID. In the case of “bat, bird, sparrow, bird, bird” there’s no consensus that sparrow is right. We do agree that bird is right. That should be the ID, in my opinion.

Personally, I like your suggestion and think such a scheme would be reasonable. I also think the existing scheme is reasonable. In either case, there really is no clear consensus. I guess it just comes down to whether you prefer the IDs to be conservative or not. I lean towards making them more specific so they’re easier to find (and correct if they’re wrong). I know other folks might disagree with that, however.

This is not prompted by the community ID, which tracks consensus, it is prompted by the observation not moving forward when there is information there to put it forward (and it otherwise would, if it were id’d as: bird, bird, sparrow).

There are two id’s on iNat - it is not the Community ID that determines which searches an obs shows up in, or that is displayed on top of it.

We have two better ways to handle that.

when the wrong ID is withdrawn or deleted - iNat will graciously revert to what iNatters expect
Plant + sp + same sp = RG

And the Pre-Maverick project to retrieve obs where that ‘bat’ ID is blocking consensus to a finer ID.

Otherwise I am stuck with @mention the ‘wrong’ identifier (if they are still active) or pulling in a (willing) taxon specialist.
Here rpillon had to work HARD 50 IDs to convince the CID algorithm!

This is not better because the person might not withdraw their ID. I think the other users should have the power to move the label forward. It takes a lot of input to do so currently, and that seems like a duplication of effort that can be avoided.

I’m not sure looking in a separate project to resolve cases like this is the least clunky way of approaching it.