How to exclude observations with disagreeing IDs?

Is there any way to search for observations of a taxon and exclude observations where the community ID is the result of disagreeing IDs? In other words, I only want the observations where all IDs are for that taxon (or at least, no disagreements).

Example: A search for the subfamily Asilinae where the high and low ranks are set to subfamily.

This returns a mix of observations from both categories.

I only want these:

And not these:

Am I missing something obvious?

There’s the search term “num_identification_disagreements”, but it seems like that only works with the API search and not on the site directly.

Here’s the link to an API search I did restricting observations to Asilinae and with num_identification_disagreements=0 …seems to have worked.

https://jumear.github.io/stirfry/iNatAPIv1_identifications.html?verifiable=true&taxon_id=326683&lrank=subfamily&num_identification_disagreements=0&hrank=subfamily

Ah, I had tried num_identification_disagreements on the website with no luck, but I have never used the API search. Thanks! I wonder why num_identification_disagreements is not implemented on the website?

Using num_identification_disagreements on the API, I get some odd behavior with the same parameters as before (subfamily low and high rank, place=United States):
https://jumear.github.io/stirfry/iNatAPIv1_identifications.html?verifiable=true&taxon_id=326683&lrank=subfamily&num_identification_disagreements=0&hrank=subfamily&place_id=1

The number of observations has almost doubled in the API search. I would expect this value to decrease as I filter out observations with disagreeing IDs, not increase!

Some of these new observations appear to be due to a bug where the community ID has not updated:

But others, I’m flabbergasted:


The community ID is species-level, so why is it not getting filtered by the high/low rank? Is this some bug with the API search?

You’re making a call to the identifications endpoint, not the observations endpoint. Each row is an ID, not an observation, so if an observation has 3 IDs of your taxon on it, it’ll add 3 results.
If you compare API calls with and without num_identification_disagreements, you can see it’s not actually doing anything. It also doesn’t work for the observations endpoint, so I don’t think there’s an easy way to do what you want.
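
To illustrate the rows-versus-observations point, here is a minimal Python sketch that collapses identification rows into distinct observation IDs. It assumes each row returned by the identifications endpoint carries a nested `observation` object with an `id` field; the sample rows are made up for illustration and are not real records.

```python
def distinct_observation_ids(identification_rows):
    """Return observation IDs in first-seen order, one entry per observation."""
    seen = set()
    ordered = []
    for row in identification_rows:
        obs_id = row["observation"]["id"]  # assumed field layout
        if obs_id not in seen:
            seen.add(obs_id)
            ordered.append(obs_id)
    return ordered


# Hypothetical sample: three IDs on one observation, one ID on another.
sample_rows = [
    {"id": 1, "observation": {"id": 100}},
    {"id": 2, "observation": {"id": 100}},
    {"id": 3, "observation": {"id": 100}},
    {"id": 4, "observation": {"id": 200}},
]
observation_ids = distinct_observation_ids(sample_rows)
print(len(sample_rows), "identification rows ->", len(observation_ids), "observations")
```

Counting rows from the identifications endpoint therefore overstates the number of observations whenever an observation has more than one matching ID.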

i can think of a few ways to do what you’re asking for, though they all involve multiple steps. i’ll provide details on 2 possible methods below. the first method is better if you plan to do this kind of query often and need to do it mostly within the constraints of the system. the second method involves fewer steps, but it’s done partly outside of the system.

METHOD 1
(this assumes that you haven’t yet reviewed any of the observations in the set that you want to return)

  1. start with your base query: https://www.inaturalist.org/observations?hrank=subfamily&lrank=subfamily&place_id=1&subview=grid&taxon_id=326683
  2. get a list of child taxa for your taxon of interest: https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?parent_id=326683&per_page=200
  3. use the above results to string together a list of taxon ids. use your base query and add ident_taxon_id=[the string of child taxa]. this will give you a list of observations that have been identified below your taxon of interest (which should only happen if there is a disagreeing id). i’m going to call this the exclusion query:
    https://www.inaturalist.org/observations?hrank=subfamily&lrank=subfamily&place_id=1&subview=grid&taxon_id=326683&ident_taxon_id=61610,639522,639525,61639,61657,61664,371727,481518,61663,205629,639868,639867,203274,577799,325259,461635,205349,557403,623325,780384,641823,603924,416297,948301,472892,366847,341672,1098278,787371,248273,568411,248254,780383,788065,877345,780187,341898,467163,925061,568875,708195,1090179,911688,1097337,741030,882756,733111,752398,468453,780256,856424,510813,854759,641816,785664,369052,248264,857572,1067810,780499,854441,855576,870501,894454,641852,545710,1095656,1097572,175613,552063,641815,641817,641818,641819,641820,641821,641822,641824,641825,641842,641843,641844,641845,641846,641847,641848,641849,641850,641851,568919,715503,248259,248260,248261,248263,248272,1009483,1009480,1009481,1015001,1067814,1067817,1067815,1067808,1067805,1067813,1067811,1067809,1067812,1067800,715518,962474,888705,1079904
  4. turn the above into an Identify screen query by adding “/identify” just after “observations” but before “?” in the URL above: https://www.inaturalist.org/observations/identify?hrank=subfamily&lrank=subfamily&place_id=1&subview=grid&taxon_id=326683&ident_taxon_id=61610,639522,639525,61639,61657,61664,371727,481518,61663,205629,639868,639867,203274,577799,325259,461635,205349,557403,623325,780384,641823,603924,416297,948301,472892,366847,341672,1098278,787371,248273,568411,248254,780383,788065,877345,780187,341898,467163,925061,568875,708195,1090179,911688,1097337,741030,882756,733111,752398,468453,780256,856424,510813,854759,641816,785664,369052,248264,857572,1067810,780499,854441,855576,870501,894454,641852,545710,1095656,1097572,175613,552063,641815,641817,641818,641819,641820,641821,641822,641824,641825,641842,641843,641844,641845,641846,641847,641848,641849,641850,641851,568919,715503,248259,248260,248261,248263,248272,1009483,1009480,1009481,1015001,1067814,1067817,1067815,1067808,1067805,1067813,1067811,1067809,1067812,1067800,715518,962474,888705,1079904
  5. use the above Identify URL to go through and mark all of these as Reviewed
  6. since the observations returned from the exclusion query are now all reviewed, you should be able to add reviewed=false to the base query so that it will return only the observations that you still care about: https://www.inaturalist.org/observations?hrank=subfamily&lrank=subfamily&place_id=1&subview=grid&taxon_id=326683&reviewed=false

(in the future, if you want to repeat the query, you would just need to start from #5, assuming the taxonomy hasn’t changed.)
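
for anyone who would rather not paste the taxon ids together by hand, here is a rough Python sketch of how the URLs in Method 1 could be assembled. the helper name `build_query` is made up for illustration, and only the first three child taxon ids from the list above are used to keep the example short; the parameter names are the ones already shown in the URLs.

```python
from urllib.parse import urlencode

BASE = "https://www.inaturalist.org/observations"


def build_query(params, identify=False):
    """Build an observations (or Identify) URL; commas are left unescaped."""
    path = BASE + ("/identify" if identify else "")
    return path + "?" + urlencode(params, safe=",")


base_params = {"hrank": "subfamily", "lrank": "subfamily", "place_id": 1, "taxon_id": 326683}
child_taxon_ids = [61610, 639522, 639525]  # shortened; use the full list from step 2

# Steps 3 and 4: the exclusion query and its Identify-screen equivalent.
exclusion_params = dict(base_params, ident_taxon_id=",".join(str(t) for t in child_taxon_ids))
exclusion_url = build_query(exclusion_params)
identify_url = build_query(exclusion_params, identify=True)

# Step 6: the base query restricted to observations not yet reviewed.
remaining_url = build_query(dict(base_params, reviewed="false"))

print(exclusion_url)
print(identify_url)
print(remaining_url)
```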

METHOD 2

  1. start with your base query: https://www.inaturalist.org/observations?hrank=subfamily&lrank=subfamily&place_id=1&subview=grid&taxon_id=326683
  2. export the results as CSV, and load them into a tool (a spreadsheet, a database, etc.) that will allow you to compare 2 datasets.
  3. find observations at your taxon of interest but with identifications at descendant taxa. this is the exclusion set: https://jumear.github.io/stirfry/iNatAPIv1_identifications.html?observation_taxon_id=326683&observation_rank=subfamily&hrank=supertribe&place_id=1
  4. load a list of observation ids from the exclusion set (#3) into the tool from #2, and filter out the records in #2 whose observation ids have matches in the exclusion set.
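
if the tool of choice is a script rather than a spreadsheet, the comparison in step 4 of Method 2 might look something like the Python sketch below. it assumes the exported CSV has an `id` column holding the observation id; the CSV text and the exclusion ids are made-up stand-ins, not real data.

```python
import csv
import io


def filter_out_excluded(base_rows, excluded_ids):
    """Keep only the base rows whose observation id is not in the exclusion set."""
    excluded = {str(i) for i in excluded_ids}  # CSV values are strings
    return [row for row in base_rows if row["id"] not in excluded]


# Stand-in for the CSV exported from the base query (step 2).
base_csv = io.StringIO("id,scientific_name\n101,Asilinae\n102,Asilinae\n103,Asilinae\n")
base_rows = list(csv.DictReader(base_csv))

# Stand-in for the observation ids taken from the exclusion set (step 3).
exclusion_ids = [102]

kept = filter_out_excluded(base_rows, exclusion_ids)
print([row["id"] for row in kept])
```

with a real export, `io.StringIO(...)` would be replaced by an opened file.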


regarding “bugs”, i assume you’re referring to “num_identification_disagreements”. i don’t see that documented as a valid parameter for either the observation or identification endpoints, so the fact that it doesn’t work as a filter parameter doesn’t seem like a bug to me. num_identification_disagreements is a value that is returned by the observation API endpoint. so you could use the API to return all records from the base query, and once you had all the records back, you could look in the result set for records with num_identification_disagreements=0 and programmatically keep only those.

EDIT: note that i’ve crossed out part of my earlier response above based on further discussion below.

Wow, thanks for the thorough explanation! I think I’ll try method 2, as I’ve already reviewed most of the observations I’m interested in “re-reviewing”.

I wonder: with such a convoluted process required to achieve this functionality by users, is it worth making a feature request? If they can implement num_identification_disagreements=0 so that it actually works, surely that would solve all my problems?

it doesn’t hurt to ask for a new filter parameter. i would just note that there is an existing parameter called “identifications” available for the observations API endpoint, whose valid options are most_agree, some_agree, and most_disagree. so you might be able to leverage that existing parameter and ask for an additional all_agree option. (you’d also have to ask to allow the observations and identify screens to have access to that parameter, too, i think.)

the process is complex because what you’re asking for is complex. (just look at how many words and photos you needed to describe what you were looking for in the original post.) i suspect the reason that they don’t make it too easy to query for what you’re asking for is because such a query would take up a lot of processing power (relative to simpler queries). but maybe the system is a lot more powerful or optimized these days, and it might not be as bad to make this kind of filter available to the masses…

EDIT: note that i’ve crossed out part of my earlier response based on further discussion below. i still don’t think it would hurt to ask for a new parameter to accomplish the kind of query that Myelaphus is asking for, but i think it would have to be based on something other than the existing num_identification_disagreements and identifications (most_agree, some_agree, most_disagree) structures, since as they appear to work now, they would not be useful in this context.

@pisum what if you added another column to your observations tool for num_identification_disagreements? Then I think a single call to https://api.inaturalist.org/v1/observations?exact_taxon_id=326683&place_id=1 would be sufficient – just look for only the observations with 0 in the disagreements column.

It does add a small amount of clutter, and maybe you don’t want to add something every time someone has a new request, but it’s doable.

hmmm… let me go for a run and have some dinner, and i’ll think about the best approach. i’m thinking number of active IDs would not be a terrible thing to have, but then the question is how to best represent agreements (if at all).

maybe something like:

(num_identification_disagreements===0)?"all agree"
:identifications_most_agree?"most agree"
:identifications_some_agree?"some agree"
:identifications_most_disagree?"most disagree"
:""
or "id agreement level":
(num_identification_disagreements===0)?"all"
:(num_identification_agreements===0)?"none"
:identifications_most_agree?"most"
:identifications_some_agree?"some"
:identifications_most_disagree?"few"
:""

EDIT: scratch the above. i don’t think the num_identification_disagreements value is actually set like i would expect it to be set (ex. it’s 0 in https://api.inaturalist.org/v1/observations?id=54972306, even though there are disagreements). so it would be a more complicated process to actually go through and compare the identifications for each observation. jwidness – do you want to write something for that? it probably wouldn’t be terribly difficult, but i’m not inspired enough right now to go through the trouble… i can merge in your changes when you’re done if you want to update the page you referenced above.

i looked at the above issue related to num_identification_disagreements with fresh eyes this morning, and here’s what i see when i click on “About” in the community taxon summary for the observation mentioned above (https://www.inaturalist.org/observations/54972306):

note that at the subfamily level there are no disagreements because all the IDs are for Asilinae. in other words, even though the IDs disagree with each other, they don’t disagree with the community taxon, and so num_identification_disagreements probably represents disagreements with the community taxon. (that’s why it’s zero.)

i think the most_agree, some_agree, etc. also operate the same way. so none of these seem very useful in the context of what you’re asking for.

it still would be possible to loop through the active identifications, counting (distinct) taxa that, compared to the observation taxon, are:

  • (the same as) the observation taxon
  • ancestor taxa
  • descendant taxa
  • other taxa

… and some sort of metric based on those counts might be the most informative in this context. i’m still trying to figure out a way to represent that, ideally without taking up 4 columns. i also don’t want to get too far away from the basic information that the API response provides, since https://jumear.github.io/stirfry/iNatAPIv1_observations.html is supposed to be a simple page that reformats the endpoint response to be more human-readable, not a page that does a lot of extra stuff on top of the response.
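
to make the counting idea concrete, here is a rough Python sketch. it assumes each observation record has a `taxon` with `id` and `ancestor_ids`, and an `identifications` list whose items have a `current` flag and their own `taxon`; those field names are my assumption about the response layout, and the taxon ids in the sample are invented.

```python
def count_id_taxa(observation):
    """Count distinct active-ID taxa by their relation to the observation taxon."""
    obs_taxon = observation["taxon"]
    obs_id = obs_taxon["id"]
    # Drop the taxon's own id in case ancestor_ids includes it.
    obs_ancestors = set(obs_taxon.get("ancestor_ids", [])) - {obs_id}

    buckets = {"same": set(), "ancestor": set(), "descendant": set(), "other": set()}
    for ident in observation.get("identifications", []):
        if not ident.get("current", True):
            continue  # skip withdrawn or superseded IDs
        taxon = ident["taxon"]
        tid = taxon["id"]
        if tid == obs_id:
            buckets["same"].add(tid)
        elif tid in obs_ancestors:
            buckets["ancestor"].add(tid)
        elif obs_id in taxon.get("ancestor_ids", []):
            buckets["descendant"].add(tid)
        else:
            buckets["other"].add(tid)
    return {name: len(ids) for name, ids in buckets.items()}


# Invented example: observation taxon 30, two active IDs at different child taxa.
sample_observation = {
    "taxon": {"id": 30, "ancestor_ids": [10, 20, 30]},
    "identifications": [
        {"current": True, "taxon": {"id": 41, "ancestor_ids": [10, 20, 30, 41]}},
        {"current": True, "taxon": {"id": 42, "ancestor_ids": [10, 20, 30, 42]}},
        {"current": False, "taxon": {"id": 30, "ancestor_ids": [10, 20, 30]}},
    ],
}
counts = count_id_taxa(sample_observation)
print(counts)
```

an observation where every ID agrees would have 0 in both the descendant and other buckets.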

so if anyone has any suggestions for best approach, feel free to share.

UPDATE:

no one offered any suggestions, and i couldn’t think of a better way. so I went ahead and updated the page noted above to add some extra columns for ID count and (count of) ID taxa in various categories. so https://jumear.github.io/stirfry/iNatAPIv1_observations.html?exact_taxon_id=326683&place_id=1&per_page=200 should pull back all US Asilinae, and then to exclude observations whose IDs don’t agree, you can skip past the ones that have values > 0 in either the “id taxa @ desc”(endant) or “id taxa @ other” columns.
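
if you copy the table out of that page, the skip rule could be applied with a few lines of Python like the sketch below. the key names `id_taxa_desc` and `id_taxa_other` are hypothetical stand-ins for the two columns, and the rows are invented.

```python
def ids_agree(row):
    """True when no ID taxa fall below or outside the observation taxon."""
    return row["id_taxa_desc"] == 0 and row["id_taxa_other"] == 0


# Invented rows standing in for values copied from the page.
rows = [
    {"observation_id": 1, "id_taxa_desc": 0, "id_taxa_other": 0},
    {"observation_id": 2, "id_taxa_desc": 2, "id_taxa_other": 0},
    {"observation_id": 3, "id_taxa_desc": 0, "id_taxa_other": 1},
]
agreeing_ids = [row["observation_id"] for row in rows if ids_agree(row)]
print(agreeing_ids)
```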

so, for example, here’s what i see in that page for US Asilinae right now:

note that observation 55154044 (https://www.inaturalist.org/observations/55154044) is shown as having 2 identification taxa that are descendants of the observation taxon. since there are also 0 ID taxa at the observation taxon, you can interpret that as meaning there are lower-level IDs that disagree with each other, and that’s in fact what we see in the observation itself:

hope that all makes sense… if anyone else has any additional thoughts or suggestions, let me know.

That seems to get the job done. Thanks so much!
