Community taxon algorithm tweaks

If it’s a straight fight between a heavily misidentified species A (e.g. Red-winged Blackbird) and an expert who correctly knows that it is in fact species B (e.g. a sparrow), then I don’t see that these changes would have any effect, positive or negative.

On the assumption that observations come into contact with more specific experts as the CID reaches finer ranks (which is what is currently inhibited by Mavericks) I can’t think of a use case where the ‘correct’ answer would be disadvantaged.

But that may be a failure of my imagination as I quickly dip in and out of this thread in my coffee break :)

I guess the suggestion (however exactly it is composed) makes the CID a bit more agile, so that it doesn’t get stuck at higher ranks, needing increasingly massive numbers of IDs just to shift it from, say, class to order.


Yes, this is a good question!
This echoes @bdagley’s comment above too.


Let’s take Diptera as an example.
There are about 25000 maverick Diptera IDs out of 2000000 or so obs (1.25%) :
Taking a random page :
https://www.inaturalist.org/identifications?category=maverick&taxon_id=47822&page=50

You can see the following :
29 out of 50 observations are Needs ID.
21 out of 50 observations are RG.

As far as I can see, all Needs ID observations appear inhibited by the use-case I mention and would benefit from change. None of the observations appear as if they would be negatively impacted by the change I am suggesting. None of the observations with maverick IDs appear impacted by the use-case you mentioned with regard to the Blackbird.

So weighing up the two use-cases in maverick Diptera we might have something like :

58% visibly impacted by my use-case
42% neutral
0% visibly impacted by your use-case
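
As a rough sketch of how these shares fall out of the page counts (the label names and the `shares` helper are invented here purely for illustration, and the labels themselves were assigned by eye):

```python
from collections import Counter


def shares(labels):
    """Return each label's share of the sample as a whole-number percentage."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * n / total) for label, n in counts.items()}


# The 50 maverick IDs on the sampled Diptera page:
# 29 Needs ID (counted as affected) and 21 RG (counted as neutral).
diptera_page = ["affected"] * 29 + ["neutral"] * 21
diptera_shares = shares(diptera_page)
print(diptera_shares)
```

With a single page of 50 records, each observation moves a share by 2 percentage points, so these figures are only a coarse indication.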


BUT
Checking bird maverick IDs, it’s clearly a very different state of affairs.
There are about 170000 maverick bird IDs out of 12700000 obs so 1.3% maverick.
Taking a random page:
https://www.inaturalist.org/identifications?category=maverick&taxon_id=3&page=50

Only 3 of the 50 on this page are Needs ID.

None of the 47 RG IDs would appear to be negatively impacted by the change I suggest, because :

  • the maverick ID is up against 3 x species level IDs so already powerless
  • the majority of the disputed IDs are at genus level so don’t suffer from being trapped in the tree
  • the vast majority of the disputed IDs appear to be incorrect initial IDs since corrected by community

In the three Needs ID observations that are present, we have

  • 2 x affected by my initial use-case
  • 1 x affected by your blackbird use-case

In summary, for maverick Bird IDs, there might be something like :

4% impacted by my use-case
94% neutral
2% impacted by your use-case



So, I agree that if you only wish to contribute to bird IDs and observations, then the choice between use-cases might be somewhat arbitrary. But if you wish to contribute to Diptera ID, this does not seem to be the case.

I assume similar issues across most invertebrates, and I imagine a broad correlation between this issue and your % of Needs ID per iconic taxon. Maybe that’s a broad brushstroke, but doubtless there is some sort of spectrum. Birds and Diptera are likely two extremes.

I disagree. Expertise in invertebrates is hard to come by and resolving IDs in these taxa is often significantly more difficult as a result. This is not the case in birds. We do not need to fight to retain or gain expertise in birds in the same way.

I think we need to try our utmost to create a welcoming space for expertise in lesser observed and more complex taxa. To do that we do need to offset existing taxonomic bias in the system where possible. This issue with the algorithm warrants fixing in that respect, as it seems to be significantly weighted against more complex taxa.


i’m not going to attempt a full-scale analysis at the moment, but as a quick sanity check, i just looked at the first 3 needs ID items from the page you referenced, and i don’t see that any of these are currently “inhibited”.

at best, these are “neutral”, as they currently exist. so i’m not sure i can agree with your comment in bold (and it kind of makes me question some of the other conclusions you based on this).

i’m going to have to think of a way to actually analyze this effectively, hopefully without having to manually look at the details of each one. i’m not sure if there’s even an effective automated way to differentiate a case where an observation is being inhibited from reaching research grade vs an observation that would be inhibited from being pushed out of research grade. if you looked at the raw vote counts and taxon levels, they would look exactly the same, except possibly for whether the outliers come earlier or later in the chain of identifications, or possibly by looking for trusted identifiers.

for anyone wanting to look at things manually for now, if you want to use mavericks as a starting point for analysis, i would recommend using the API’s /identifications route rather than the identifications page, since you can filter for things like quality grade (and get some other useful info). ex. https://jumear.github.io/stirfry/iNatAPIv1_identifications.html?quality_grade=needs_id&category=maverick&taxon_id=47822
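
for illustration, here is a minimal sketch of building such a filtered request. the base URL `https://api.inaturalist.org/v1/identifications` and the `maverick_ids_url` helper are assumptions added for this example; the three filter parameters are the ones from the link above. the sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed base URL for the iNaturalist API v1 identifications route.
API_BASE = "https://api.inaturalist.org/v1/identifications"


def maverick_ids_url(taxon_id, quality_grade="needs_id", per_page=50, page=1):
    """Build a query URL for maverick IDs of a taxon (hypothetical helper)."""
    params = {
        "category": "maverick",
        "taxon_id": taxon_id,
        "quality_grade": quality_grade,
        "per_page": per_page,
        "page": page,
    }
    return f"{API_BASE}?{urlencode(params)}"


# 47822 is the Diptera taxon id used in the links in this thread.
url = maverick_ids_url(47822)
print(url)
```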

i didn’t necessarily want to turn this into a battle over which taxa are better or which are less represented, etc. i chose blackbirds because that’s the quickest example that i could think of to demonstrate a counterexample to your case. if you want to generalize two opposite cases, it would be observations being inhibited from reaching research grade vs observations that would be inhibited from being pushed out of research grade.

and remember that this is just part of the entire pro vs con analysis. (for example, it’s quite possible the scale of the changes required is a dealbreaker anyway, rendering this whole conversation moot.)

i also still think it’s worth understanding how you are thinking about the couple of “variations” i mentioned earlier. please comment:


I mean this comes down to semantics… but no, not in my book.
All the ones you mention are Needs ID with a maverick ID so are inhibited from moving to a lower level even if expertise adds a finer ID. That’s the crux of this thread…
…and the point of that broader comment was to weigh up your bird use-case against my original sawflies use-case. In that context, these observations are not neutral - they are all affected by the issue I mention to some extent…but very unlikely to be affected by your use-case as far as I can see.

I’m not sure what workflow looks like in the taxa where you are active as an identifier(?), but in Diptera, three people might overcome a maverick autosuggest to take an observation to family, but it might not be until an expert in that family comes along that it will become “actively inhibited”. That doesn’t make the issue with the algorithm “neutral” in the meantime.

But yes, sure… I could add more granularity if you like.
I can split the %s into something like “potentially limited” and “actively inhibited”, as well as “neutral” if you like.

Pulling out obscure use-cases is pointless imo. I’m not sure I expect any algorithm to successfully cover all bases…but for me your example is a million miles from the absurdity of what I see happening in examples like the one I originally posted. In your use-case, whether it sits at section or species is a minor issue imo and not something I can imagine stumbling across more than once in a blue moon.

The use-case you mention with the blackbird is far more relevant and interesting imo. This is actively visible in some obs from the mavericks I checked - we previously discussed a similar issue earlier in the thread. This would be a valid use-case to weigh against imo - but from what I can see, as mentioned, it’s still extremely rare in comparison with the issue I am talking about.

Research Grade is not my goal.
I’m not sure where you get this from.

RG is not a factor at all in consideration of the problem here as far as I can see.

This is about observations reaching their optimal level possible, whatever rank that might be, without unnecessary barriers to expert input. That rank might be RG, it might not be.


if you don’t care about RG, then i’m not sure why any of this matters. nothing prevents experts from adding IDs as they see fit. if the concern is that they won’t be able to find a particular taxon because they can’t filter by observation taxon, then i would think the more direct way to address that problem is to educate or update the filter UI to allow folks to use the ident_taxon_id filter more easily. if the concern is that their IDs are being overridden by other IDs, then welcome to the harsh reality of community ID. if it’s your observation, you can always opt out of community ID, and if it’s not your ID, then you have things like projects where you can curate your IDs (and you can export observations with a field indicating the taxa IDed last by one of the project curators). if you want to be able to search for taxa based on certain IDs, there are ways to search for IDs, and there are also existing feature requests to be able to search for observations by a particular user’s IDs taking priority.

there are so many ways to address problems other than the proposal being discussed in this thread which will have more clear benefits across the board…

so i’ll leave the thread with some data that i gathered. i pulled a random set of all needs ID observations (n=2000) based on https://jumear.github.io/stirfry/iNatAPIv1_observations.html?quality_grade=needs_id&per_page=200&page=50&order_by=random&options=idextra, going from pages 41 to 50, and i pulled a similar set of needs ID observations where any of the IDs were Diptera based on https://jumear.github.io/stirfry/iNatAPIv1_observations.html?quality_grade=needs_id&per_page=200&page=50&order_by=random&ident_taxon_id=47822&options=idextra
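
as a sketch of how the ten page requests behind each n=2000 sample line up (the `sample_page_urls` helper is made up for this example; the page URL and parameters are the ones in the links above):

```python
from urllib.parse import urlencode

STIRFRY_OBS = "https://jumear.github.io/stirfry/iNatAPIv1_observations.html"


def sample_page_urls(first_page, last_page, per_page=200, ident_taxon_id=None):
    """List one URL per page of randomly ordered needs ID observations."""
    urls = []
    for page in range(first_page, last_page + 1):
        params = {
            "quality_grade": "needs_id",
            "per_page": per_page,
            "page": page,
            "order_by": "random",
            "options": "idextra",
        }
        if ident_taxon_id is not None:
            # Restrict to observations where any ID is of this taxon.
            params["ident_taxon_id"] = ident_taxon_id
        urls.append(f"{STIRFRY_OBS}?{urlencode(params)}")
    return urls


all_urls = sample_page_urls(41, 50)
diptera_urls = sample_page_urls(41, 50, ident_taxon_id=47822)
sample_size = len(all_urls) * 200
print(sample_size)
```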

assuming mavericks are at the crux of the issue, and IDs at a descendant taxon to the observation taxon are also needed to potentially constitute a case where taxon refinement was inhibited, then i see roughly 0.6% of all needs ID observations being affected by this issue, and roughly 1.3% of needs ID observations which have at least one Diptera ID.
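
a minimal sketch of that criterion follows. the field names `desc` and `other` are shorthand invented here for the “ID Taxa @ Desc” and “ID Taxa @ Other” columns, the helper names are hypothetical, and the five rows are copied from the All set listed further down.

```python
def refinement_possibly_inhibited(record):
    """True if the record has a maverick ID and an ID at a descendant taxon."""
    return record["other"] >= 1 and record["desc"] >= 1


def percent(part, whole):
    """Share of `whole` represented by `part`, as a percentage."""
    return 100 * part / whole


# A few rows from the All set (obs id, descendant taxa count, maverick count).
rows = [
    {"obs_id": 21377307, "desc": 0, "other": 1},
    {"obs_id": 24703707, "desc": 1, "other": 1},
    {"obs_id": 102202286, "desc": 1, "other": 1},
    {"obs_id": 73031816, "desc": 0, "other": 1},
    {"obs_id": 1395222, "desc": 2, "other": 1},
]
flagged = [r["obs_id"] for r in rows if refinement_possibly_inhibited(r)]
print(flagged)

# Sample-level rates from the counts reported for each set of 2000.
all_rate = percent(11, 2000)
diptera_rate = percent(26, 2000)
print(round(all_rate, 2), round(diptera_rate, 2))
```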

you’re welcome to look through the stuff below, but if we don’t care about RG, then frankly, i’m not seeing a lot of stuff where it looks to me like the observations are being inhibited or that an identifier’s work is for naught. leave the community algorithm alone, and focus on other ways to help and recruit identifiers.

the All set:

  • n=2000 (out of 34,513,125 needs ID records)
  • 21 of 2000 (1.1%) included a maverick ID (“ID Taxa @ Other” in the details below)
  • 11 of 21 (52%) had IDs which were descendants of the observation ID
  • records including a maverick:
| Obs ID | Taxon | Common | Rank | Grade | ID Count | ID Count @ Obs | ID Taxa @ Obs | ID Taxa @ Ansc | ID Taxa @ Desc | ID Taxa @ Other | Obs Taxon = Community Taxon | Obs Date | Sub Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21377307 | Baeolophus | Titmice | genus | needs_id | 5 | 4 | 1 | 0 | 0 | 1 | TRUE | 2019-03-18 15:39:18 (-05:00) | 2019-03-18 22:08:49 (-05:00) |
| 24703707 | Tettigoniidae | Katydids | family | needs_id | 5 | 3 | 1 | 0 | 1 | 1 | TRUE | 2019-05-06 17:53:57 (-07:00) | 2019-05-06 17:59:07 (-07:00) |
| 102202286 | Gastropoda | Gastropods | class | needs_id | 5 | 3 | 1 | 0 | 1 | 1 | TRUE | 2021-11-28 14:52:00 (-05:00) | 2021-11-29 17:55:35 (-05:00) |
| 38810724 | Pseudacris | Chorus Frogs | genus | needs_id | 6 | 3 | 1 | 0 | 1 | 1 | TRUE | 2020-02-18 14:22:52 (±00:00) | 2020-02-18 22:25:19 (±00:00) |
| 73031816 | Croton | Crotons | genus | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2021-04-04 18:21:43 (-05:00) | 2021-04-05 13:13:37 (-05:00) |
| 43830560 | Gynoxys | | genus | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2020-02-27 11:56:00 (-05:00) | 2020-04-26 23:27:32 (-05:00) |
| 39279835 | Arecaceae | palms | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2020-02-27 10:23:37 (-05:00) | 2020-02-27 10:32:06 (-05:00) |
| 22399503 | Echium | Viper’s-buglosses | genus | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2019-04-12 15:17:24 (-07:00) | 2019-04-12 21:41:04 (-07:00) |
| 64577543 | Coleoptera | Beetles | order | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2020-11-04 09:30:00 (-05:00) | 2020-11-09 23:40:02 (-05:00) |
| 13324353 | Ichneumonidae | Ichneumonid Wasps | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 6/8/2018 | 2018-06-11 04:10:11 (±00:00) |
| 45146223 | Portuninae | | subfamily | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2020-05-07 00:19:19 (-07:00) | 2020-05-07 00:19:38 (-07:00) |
| 97039258 | Anthracinae | | subfamily | needs_id | 5 | 2 | 1 | 1 | 1 | 1 | TRUE | 2021-10-03 10:22:56 (-07:00) | 2021-10-03 10:23:52 (-07:00) |
| 3732667 | Digrammia | | genus | needs_id | 4 | 2 | 1 | 0 | 1 | 1 | TRUE | 2016-07-22 20:55:00 (-07:00) | 2016-07-23 14:33:03 (-07:00) |
| 10751406 | Scolopendromorpha | Tropical Centipedes | order | needs_id | 4 | 2 | 1 | 0 | 1 | 1 | TRUE | 2018-03-09 14:28:57 (-06:00) | 2018-04-10 09:26:35 (-05:00) |
| 6966725 | Syrphini | | tribe | needs_id | 4 | 2 | 1 | 0 | 1 | 1 | TRUE | 2017-07-08 10:37:41 (-04:00) | 2017-07-08 10:40:26 (-04:00) |
| 25617974 | Vespula | Ground Yellowjackets | genus | needs_id | 4 | 2 | 1 | 0 | 1 | 1 | TRUE | 2019-05-03 16:43:00 (+02:00) | 2019-05-23 17:00:53 (+02:00) |
| 70481743 | Plegadis | Plegadis Ibises | genus | needs_id | 4 | 1 | 1 | 0 | 1 | 1 | TRUE | 2021-01-22 16:28:00 (-06:00) | 2021-03-02 21:30:57 (-06:00) |
| 90050263 | Coleoptera | Beetles | order | needs_id | 4 | 1 | 1 | 0 | 1 | 1 | TRUE | 2021-08-05 11:30:21 (-05:00) | 2021-08-05 12:33:58 (-05:00) |
| 59606737 | Micropezidae | Stilt-legged Flies | family | needs_id | 2 | 1 | 1 | 0 | 0 | 1 | FALSE | 2020-09-14 14:32:21 (-04:00) | 2020-09-14 15:40:21 (-04:00) |
| 3757424 | Neotamias umbrinus | Uinta Chipmunk | species | needs_id | 2 | 1 | 1 | 0 | 0 | 1 | FALSE | 2016-07-26 09:40:00 (-06:00) | 2016-07-27 20:42:29 (-06:00) |
| 1395222 | Scaphiopus | Southern Spadefoot Toads | genus | needs_id | 5 | 0 | 0 | 0 | 2 | 1 | TRUE | 2015-04-14 21:14:48 (-05:00) | 2015-04-15 00:25:13 (-05:00) |

the Diptera set:

  • n=2000 (out of 1,350,568 needs ID records)
  • 48 of 2000 (2.4%) included a maverick ID (“ID Taxa @ Other” in the details below)
  • 26 of 48 (54%) had IDs which were descendants of the observation ID
  • records including a maverick:
| Obs ID | Taxon | Common | Rank | Grade | ID Count | ID Count @ Obs | ID Taxa @ Obs | ID Taxa @ Ansc | ID Taxa @ Desc | ID Taxa @ Other | Obs Taxon = Community Taxon | Obs Date | Sub Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5769839 | Chrysididae | Cuckoo Wasps | family | needs_id | 8 | 5 | 1 | 0 | 1 | 1 | TRUE | 2017-04-16 18:34:10 (-04:00) | 2017-04-16 18:34:54 (-04:00) |
| 22990893 | Ptecticus | | genus | needs_id | 7 | 2 | 1 | 1 | 1 | 1 | TRUE | 2019-04-22 12:34:00 (+10:00) | 2019-04-24 21:12:22 (+10:00) |
| 15796766 | Milesiini | | tribe | needs_id | 6 | 1 | 1 | 1 | 1 | 1 | TRUE | 2018-08-21 22:46:40 (-04:00) | 2018-08-22 19:20:43 (-04:00) |
| 11843368 | Agapostemon | | subgenus | needs_id | 6 | 1 | 1 | 1 | 1 | 1 | TRUE | 2018-04-27 10:37:00 (-07:00) | 2018-04-29 20:51:14 (-07:00) |
| 89461525 | Syrphidae | Hover Flies | family | needs_id | 6 | 3 | 1 | 0 | 1 | 1 | TRUE | 2021-07-31 12:10:36 (-03:00) | 2021-08-01 09:02:04 (-03:00) |
| 15221997 | Eristalis | Drone Flies | genus | needs_id | 6 | 4 | 1 | 0 | 1 | 1 | TRUE | 2018-08-08 10:32:45 (+02:00) | 2018-08-08 12:33:16 (+02:00) |
| 46145706 | Sepsidae | Black Scavenger Flies | family | needs_id | 6 | 4 | 1 | 1 | 0 | 1 | TRUE | 2020-05-09 17:51:43 (-07:00) | 2020-05-16 14:26:17 (-07:00) |
| 12440678 | Diptera | Flies | order | needs_id | 5 | 2 | 1 | 0 | 2 | 1 | TRUE | 2018-05-07 15:55:47 (+07:00) | 2018-05-14 11:11:46 (+07:00) |
| 46469121 | Apoidea | Bees and Apoid Wasps | superfamily | needs_id | 5 | 2 | 1 | 0 | 2 | 1 | TRUE | 2020-05-19 11:38:39 (+02:00) | 2020-05-19 11:38:53 (+02:00) |
| 84868384 | Crabronidae | Square-headed Wasps, Sand Wasps, and Allies | family | needs_id | 5 | 2 | 1 | 0 | 2 | 1 | TRUE | 2021-06-28 12:58:37 (-04:00) | 2021-06-28 13:00:11 (-04:00) |
| 69434955 | Brachycera | Brachyceran Flies | suborder | needs_id | 5 | 2 | 1 | 1 | 1 | 1 | TRUE | 2021-02-11 08:15:00 (-05:00) | 2021-02-11 19:32:49 (-05:00) |
| 81901787 | Sesiidae | Clearwing Moths | family | needs_id | 5 | 3 | 1 | 0 | 1 | 1 | TRUE | 2021-06-06 12:16:43 (-06:00) | 2021-06-06 12:17:33 (-06:00) |
| 107131624 | Asilidae | Robber Flies | family | needs_id | 5 | 3 | 1 | 0 | 1 | 1 | TRUE | 2022-02-09 16:20:00 (+02:00) | 2022-02-20 22:22:38 (+02:00) |
| 23740884 | Hymenoptera | Ants, Bees, Wasps, and Sawflies | order | needs_id | 5 | 2 | 1 | 0 | 1 | 1 | TRUE | 2019-04-28 14:18:48 (-07:00) | 2019-04-28 14:20:44 (-07:00) |
| 42515086 | Cerambycidae | Longhorn Beetles | family | needs_id | 5 | 3 | 1 | 0 | 1 | 1 | TRUE | 2020-04-18 13:55:12 (-07:00) | 2020-04-18 14:52:10 (-07:00) |
| 75588288 | Anthophila | Bees | epifamily | needs_id | 5 | 3 | 1 | 0 | 1 | 1 | TRUE | 2021-04-28 17:35:02 (-04:00) | 2021-04-28 17:38:59 (-04:00) |
| 44644576 | Syrphidae | Hover Flies | family | needs_id | 5 | 2 | 1 | 0 | 1 | 1 | TRUE | 2020-05-02 14:40:00 (-04:00) | 2020-05-02 17:15:03 (-04:00) |
| 56510559 | Syrphidae | Hover Flies | family | needs_id | 5 | 3 | 1 | 0 | 1 | 1 | TRUE | 2020-08-15 09:42:48 (-04:00) | 2020-08-15 09:45:13 (-04:00) |
| 26254986 | Elateridae | Click Beetles | family | needs_id | 5 | 3 | 1 | 0 | 1 | 1 | TRUE | 2019-06-02 11:26:36 (+02:00) | 2019-06-02 17:50:11 (+02:00) |
| 50899373 | Coleoptera | Beetles | order | needs_id | 5 | 3 | 1 | 0 | 1 | 1 | TRUE | 2020-06-14 16:07:41 (+03:00) | 2020-06-25 20:00:08 (+03:00) |
| 32210438 | Choerades | | genus | needs_id | 5 | 3 | 1 | 1 | 0 | 1 | TRUE | 2019-09-06 14:29:55 (±00:00) | 2019-09-06 12:31:33 (±00:00) |
| 53274930 | Cuterebra | Glire Bot Flies | genus | needs_id | 5 | 3 | 1 | 1 | 0 | 1 | TRUE | 2020-07-07 19:21:22 (-07:00) | 2020-07-16 10:13:19 (-07:00) |
| 97332930 | Velia | | genus | needs_id | 5 | 3 | 1 | 1 | 0 | 1 | TRUE | 2021-10-06 12:19:23 (+03:00) | 2021-10-06 14:19:53 (+03:00) |
| 51657766 | Diogmites | Hanging-thieves | genus | needs_id | 5 | 4 | 1 | 0 | 0 | 1 | TRUE | 2020-07-01 13:35:13 (±00:00) | 2020-07-02 02:59:46 (±00:00) |
| 84151464 | Syrphidae | Hover Flies | family | needs_id | 4 | 2 | 1 | 0 | 1 | 1 | TRUE | 2021-06-23 09:09:49 (+02:00) | 2021-06-23 09:10:36 (+02:00) |
| 104152983 | Lepidoptera | Butterflies and Moths | order | needs_id | 4 | 1 | 1 | 0 | 1 | 1 | TRUE | 2021-12-31 16:27:00 (+11:00) | 2022-01-02 13:38:21 (+11:00) |
| 59455355 | Diptera | Flies | order | needs_id | 4 | 1 | 1 | 0 | 1 | 1 | TRUE | 2020-09-13 10:50:05 (±00:00) | 2020-09-13 15:50:59 (±00:00) |
| 13018728 | Sialidae | Modern and Ancestral Alderflies | family | needs_id | 4 | 2 | 1 | 0 | 1 | 1 | TRUE | 2018-06-01 15:01:00 (-05:00) | 2018-06-01 15:02:10 (-05:00) |
| 31934957 | Diptera | Flies | order | needs_id | 4 | 1 | 1 | 0 | 1 | 1 | TRUE | 2019-08-06 16:20:48 (-05:00) | 2019-09-01 13:56:11 (-05:00) |
| 45682525 | Vespidae | Hornets, Paper Wasps, Potter Wasps, and Allies | family | needs_id | 4 | 2 | 1 | 0 | 1 | 1 | TRUE | 2020-05-12 16:40:57 (+02:00) | 2020-05-12 16:41:06 (+02:00) |
| 24094825 | Bacchini | | tribe | needs_id | 4 | 1 | 1 | 0 | 1 | 1 | TRUE | 2019-04-29 18:20:41 (-07:00) | 2019-04-29 18:21:06 (-07:00) |
| 30436218 | Opomyza | | genus | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 8/8/2019 | 2019-08-08 18:00:57 (±00:00) |
| 79136611 | Psychodidae | Moth Flies and Sand Flies | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2021-05-15 12:26:00 (±00:00) | 2021-05-17 11:45:33 (±00:00) |
| 16929391 | Plantae | plants | kingdom | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2018-09-25 12:30:27 (-04:00) | 2018-09-26 08:52:06 (-04:00) |
| 26815207 | Coleoptera | Beetles | order | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2019-06-11 13:28:42 (-04:00) | 2019-06-11 13:43:26 (-04:00) |
| 40303466 | Syrphinae | Typical Hover Flies | subfamily | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 8/20/2016 | 2020-03-20 18:22:28 (±00:00) |
| 27979589 | Sarcophagidae | Flesh Flies | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2019-06-28 17:36:00 (+02:00) | 2019-07-01 06:09:29 (+02:00) |
| 50224862 | Chironomidae | Non-biting Midges | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2020-06-19 23:30:57 (-04:00) | 2020-06-19 23:31:15 (-04:00) |
| 82109181 | Chironomidae | Non-biting Midges | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2021-05-31 11:28:13 (±00:00) | 2021-06-08 02:22:31 (±00:00) |
| 37702544 | Comptosia | | genus | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2020-01-18 18:04:24 (+11:00) | 2020-01-18 18:18:03 (+11:00) |
| 24701595 | Ephemeroptera | Mayflies | order | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2019-05-06 11:44:10 (-07:00) | 2019-05-06 17:16:35 (-07:00) |
| 17959215 | Miridae | Plant Bugs | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2018-10-30 08:56:24 (-05:00) | 2018-10-30 08:56:36 (-05:00) |
| 12519002 | Tipulomorpha | Crane Flies | infraorder | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2018-05-16 20:09:08 (+02:00) | 2018-05-16 20:09:28 (+02:00) |
| 14566460 | Sarcophagidae | Flesh Flies | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2018-07-21 10:46:28 (+02:00) | 2018-07-21 12:35:19 (+02:00) |
| 59308587 | Stratiomyidae | Soldier Flies | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2020-09-12 12:17:30 (+02:00) | 2020-09-12 14:28:42 (+02:00) |
| 43218230 | Aphididae | Aphids | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2020-04-24 12:20:13 (-07:00) | 2020-04-24 17:24:08 (-07:00) |
| 22768309 | Bombyliinae | | subfamily | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2019-04-19 15:52:00 (-04:00) | 2019-04-20 14:24:56 (-04:00) |
| 80672236 | Aphididae | Aphids | family | needs_id | 4 | 3 | 1 | 0 | 0 | 1 | TRUE | 2021-05-27 17:44:29 (±00:00) | 2021-05-28 23:49:18 (±00:00) |

Digging deeper into the numbers @pisum presents, to try and quantify more fully :


Let’s say we have a maverick observation with

1 x bee ID
3 x fly IDs

This type of example is not included in @pisum’s stats above (as they only include those with an existing ID at a descendant taxon). However, it is clearly inhibited by the issue this thread discusses, as any image of a bee mimic will be able to go lower than order once the right set of eyes adds an ID.

I accept that taking a total of all Needs ID maverick observations as I did in my last post is equally problematic as some have already been resolved to an optimum rank.

So as a compromise I suggest using a figure somewhere between 0.6% and 1.1%, say 0.85% of Needs ID observations affected.

In total this would give us something like:
0.85% x 34000000 Needs ID = 289000 observations impacted


BUT crucially,

Counting numbers of observations alone does not equate to number of IDs required.
Which is the fundamental point of the issue being raised about the algorithm.
It’s about the amount of identifier power required to place an observation to optimal rank.

The discrepancy here could be seen as a 3 to 1 ratio, since shifting rank with a maverick in play requires 3 times the number of identifiers.
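
A minimal sketch of where a 3 to 1 ratio can come from, assuming a simplified rule in which a finer taxon only becomes the community ID once MORE than two thirds of the relevant IDs support it. That threshold, and the `ids_needed` helper, are assumptions for illustration; the real algorithm has further conditions (minimum ID counts, explicit disagreements with ancestors) that this ignores.

```python
from fractions import Fraction

# Assumed threshold: support must be strictly greater than 2/3.
THRESHOLD = Fraction(2, 3)


def ids_needed(disagreeing_ids):
    """Smallest n such that n / (n + disagreeing_ids) exceeds the threshold."""
    n = 1
    while Fraction(n, n + disagreeing_ids) <= THRESHOLD:
        n += 1
    return n


no_maverick = ids_needed(0)
one_maverick = ids_needed(1)
print(no_maverick, one_maverick)
```

Under this simplified model, one supporting ID is enough with no maverick in play, while a single maverick pushes the requirement to three, because two supporting IDs only reach exactly two thirds.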

So for example,

Two rank changes would take:

Current algorithm
6 x 289000 = 1.7 million IDs required
vs
Corrected algorithm
2 x 289000 = 580000 IDs required

In total, this is 1.7 million - 580000 = about 1.1 million IDs extra

One rank shift would take

Current algorithm
3 x 289000 = 867000 IDs required
vs
Corrected algorithm
1 x 289000 = 289000 IDs required

In total, this is 867000 - 289000 = 578000 IDs extra

So in terms of impact of the bug, we might say between 578000 and 1120000 excess/dummy IDs are required by leaving this aspect of the algorithm in play

Again taking the midpoint here … let’s guesstimate about

850000 IDs extra in total

If one does 300 IDs an hour, this would take about 2830 hrs.
Equivalent of about:

1.5 years worth of working days
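
The arithmetic above can be re-run as a short sketch. The 8-hour day and 250 working days per year are my own assumptions for converting hours into years; everything else uses the figures from this post. Note that the exact two-shift difference is 1,156,000, which the post rounds to about 1.1 million.

```python
NEEDS_ID_TOTAL = 34_000_000
AFFECTED_PER_10K = 85  # 0.85% expressed per 10,000 to keep integer arithmetic

affected = NEEDS_ID_TOTAL * AFFECTED_PER_10K // 10_000


def extra_ids(rank_shifts, ids_with_maverick=3, ids_without=1):
    """Extra IDs needed across all affected observations for the given shifts."""
    return affected * rank_shifts * (ids_with_maverick - ids_without)


one_shift = extra_ids(1)
two_shifts = extra_ids(2)

# The post's rounded guesstimate, converted to identifier time.
guesstimate = 850_000
hours = guesstimate / 300          # 300 IDs per hour
working_years = hours / 8 / 250    # assumed: 8-hour days, 250 working days/year

print(affected, one_shift, two_shifts)
print(round(hours), round(working_years, 2))
```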

If there is a cost benefit, I would like to see it, but I don’t see a significant one in the counter examples provided thus far which would offset this.

Again, few of these trapped obs are N. American birds. They will be heavily weighted to more complex, less observed taxa and less covered geographies, where we have fewer identifiers active. So whilst nearly a million IDs might be no big deal for N. American bird identifiers… in European inverts this is a good chunk of resources which would simply be better applied to other observations.

Moreover, we are usually lucky to have 1 specialist in an invert taxon, let alone 3! … so with the current algorithm many of these observations simply won’t resolve to rank for many many years, if ever (without blind agreement).


i just wanted to come back for a moment to share a Power Automate Desktop flow that can be used to get the random set of observations that i noted before. as defined, the flow uses Edge as the browser, but it can be adapted for Firefox or Chrome. it can also be adapted to use different filter parameters, request different numbers of records/pages, or get data from other jumear…/stirfry/iNatAPIv1_xxx.html pages. the flow dumps the data into Excel, but it can be adapted to export to CSV or some other format.

to use the flow, simply copy the code below and paste it into a new flow in Power Automate Desktop.

# This flow contains the basic structure to extract data from most /stirfry/iNatAPIv1_xxx.html pages and then open the data in an Excel spreadsheet. In an ideal world, a data extraction flow would just need 2 steps -- one to open the browser, and another to extract the data, handle pagination, and export to Excel. However, the /stirfry pages load a basic skeleton first and then add data based on the response from an API request. This delay between initial load and API response can cause issues for the standard data extraction step, since there's no mechanism to force it to wait for the API request to complete. So this flow's structure allows for such a wait, in part, by handling pagination and data export separately from the data extraction action.
SET urlBase TO $'''https://jumear.github.io/stirfry/iNatAPIv1_observations.html?quality_grade=needs_id&options=idextra&order_by=random'''
SET pageFirst TO 1
SET pageLast TO 10
SET perPage TO 200
SET delayBeforeExtract TO 1
SET delayMaxRetry TO 20
# dataExtracted is a data table variable that will store the combined results extracted from each page. It is initialized first with column header labels. These labels need to be set to match the fields defined in the main data extraction step.
SET dataExtracted TO { ^['Obs ID', 'Obs URL', 'Taxon', 'Common', 'Rank', 'Grade', 'ID Count', 'ID Count @ Obs', 'ID Taxa @ Obs', 'ID Taxa @ Ansc', 'ID Taxa @ Desc', 'ID Taxa @ Other', 'Obs Taxon = Community Taxon', 'Obs Date', 'Sub Date'] }
Variables.CreateNewList List=> pagesNotExtracted
WebAutomation.LaunchEdge.LaunchEdge Url: $'''%urlBase%&per_page=%perPage%&page=%pageFirst%''' WindowState: WebAutomation.BrowserWindowState.Normal ClearCache: False ClearCookies: False Timeout: 60 BrowserInstance=> Browser
LOOP pageCurr FROM pageFirst TO pageLast STEP 1
    LOOP LoopIndex FROM 1 TO delayMaxRetry STEP 1
        WAIT delayBeforeExtract
        # When the page gets a response from the API, it will add a paragraph <p> to the body of the page which displays either error messages returned from the API or some summary information about the data returned. So if this <p> is found, data extraction can begin. Otherwise, wait again before retrying (up to the maximum number of retries).
        IF (WebAutomation.IfWebPageContains.WebPageContainsElement BrowserInstance: Browser Control: appmask['WebPage']['pInfo']) THEN
            # This is the main data extraction step. Note that it is not set up to handle pagination, since pagination is handled by the rest of the flow.
            WebAutomation.ExtractData.ExtractTable BrowserInstance: Browser Control: $'''html > body > table > tbody > tr''' ExtractionParameters: {[$'''td:eq(1) > a''', $'''Own Text''', $'''%''%''', $'''Value #1'''], [$'''td:eq(1) > a''', $'''Href''', $'''%''%''', $'''Value #2'''], [$'''td:eq(3) > a''', $'''Own Text''', $'''%''%''', $'''Value #3'''], [$'''td:eq(4)''', $'''Own Text''', $'''%''%''', $'''Value #4'''], [$'''td:eq(5)''', $'''Own Text''', $'''%''%''', $'''Value #5'''], [$'''td:eq(6)''', $'''Own Text''', $'''%''%''', $'''Value #6'''], [$'''td:eq(7)''', $'''Own Text''', $'''%''%''', $'''Value #7'''], [$'''td:eq(8)''', $'''Own Text''', $'''%''%''', $'''Value #8'''], [$'''td:eq(9)''', $'''Own Text''', $'''%''%''', $'''Value #9'''], [$'''td:eq(10)''', $'''Own Text''', $'''%''%''', $'''Value #10'''], [$'''td:eq(11)''', $'''Own Text''', $'''%''%''', $'''Value #11'''], [$'''td:eq(12)''', $'''Own Text''', $'''%''%''', $'''Value #12'''], [$'''td:eq(13)''', $'''Own Text''', $'''%''%''', $'''Value #13'''], [$'''td:eq(16)''', $'''Own Text''', $'''%''%''', $'''Value #14'''], [$'''td:eq(17)''', $'''Own Text''', $'''%''%''', $'''Value #15'''] } ExtractedData=> DataFromWebPage
            IF DataFromWebPage.RowsCount = 0 THEN
                Variables.AddItemToList Item: pageCurr List: pagesNotExtracted NewList=> pagesNotExtracted
            ELSE
                LOOP FOREACH CurrentItem IN DataFromWebPage
                    SET dataExtracted TO dataExtracted + CurrentItem
                END
            END
            EXIT LOOP
        ELSE IF LoopIndex = delayMaxRetry THEN
            Variables.AddItemToList Item: pageCurr List: pagesNotExtracted NewList=> pagesNotExtracted
        END
    END
    IF (WebAutomation.IfWebPageContains.WebPageDoesNotContainElement BrowserInstance: Browser Control: appmask['WebPage']['nextPage']) THEN
        EXIT LOOP
    END
    WebAutomation.Click.Click BrowserInstance: Browser Control: appmask['WebPage']['nextPage']
END
IF dataExtracted.RowsCount > 0 THEN
    Excel.LaunchExcel.LaunchUnderExistingProcess Visible: True Instance=> ExcelInstance
    Excel.WriteToExcel.WriteCell Instance: ExcelInstance Value: dataExtracted.ColumnHeadersRow Column: $'''A''' Row: 1
    Excel.WriteToExcel.WriteCell Instance: ExcelInstance Value: dataExtracted Column: $'''A''' Row: 2
END
IF pagesNotExtracted.Count > 0 THEN
    Text.JoinText.JoinWithCustomDelimiter List: pagesNotExtracted CustomDelimiter: $''', ''' Result=> pagesNotExtracted_CommaSeparated
    Display.ShowMessageDialog.ShowMessage Title: $'''Extraction Issues''' Message: $'''No data extracted from these page numbers: %pagesNotExtracted_CommaSeparated%

Check the browser window to see if data exists or error messages were returned from the API.''' Icon: Display.Icon.ErrorIcon Buttons: Display.Buttons.OK DefaultButton: Display.DefaultButton.Button1 IsTopMost: True
END

# [ControlRepository][PowerAutomateDesktop]
{
  "ApplicationInfo": {
    "Name": "ClipboardControlRepository",
    "Version": "1.0"
  },
  "Screens": [
    {
      "Controls": [
        {
          "AutomationProtocol": "uia3",
          "ScreenShot": null,
          "ElementTypeName": "<p>",
          "Name": "pInfo",
          "SelectorCount": 1,
          "Selectors": [
            {
              "CustomSelector": " > body > p:eq(1)",
              "Elements": [
                {
                  "Attributes": [
                    {
                      "Ignore": true,
                      "IsOrdinal": false,
                      "Name": "Class",
                      "Operation": "EqualTo",
                      "Value": null
                    },
                    {
                      "Ignore": true,
                      "IsOrdinal": false,
                      "Name": "Id",
                      "Operation": "EqualTo",
                      "Value": null
                    },
                    {
                      "Ignore": true,
                      "IsOrdinal": true,
                      "Name": "Ordinal",
                      "Operation": "EqualTo",
                      "Value": "-1"
                    },
                    {
                      "Ignore": true,
                      "IsOrdinal": false,
                      "Name": "Title",
                      "Operation": "EqualTo",
                      "Value": null
                    }
                  ],
                  "CustomValue": null,
                  "Ignore": false,
                  "Name": "<body>",
                  "Tag": "body"
                },
                {
                  "Attributes": [
                    {
                      "Ignore": true,
                      "IsOrdinal": false,
                      "Name": "Class",
                      "Operation": "EqualTo",
                      "Value": null
                    },
                    {
                      "Ignore": true,
                      "IsOrdinal": false,
                      "Name": "Id",
                      "Operation": "EqualTo",
                      "Value": null
                    },
                    {
                      "Ignore": false,
                      "IsOrdinal": true,
                      "Name": "Ordinal",
                      "Operation": "EqualTo",
                      "Value": "1"
                    },
                    {
                      "Ignore": true,
                      "IsOrdinal": false,
                      "Name": "Title",
                      "Operation": "EqualTo",
                      "Value": null
                    }
                  ],
                  "CustomValue": null,
                  "Ignore": false,
                  "Name": "<p>",
                  "Tag": "p"
                }
              ],
              "Ignore": false,
              "IsCustom": true,
              "IsWindowsInstance": false,
              "Order": 0
            }
          ],
          "Tag": "p"
        },
        {
          "AutomationProtocol": null,
          "ScreenShot": null,
          "ElementTypeName": "a",
          "Name": "nextPage",
          "SelectorCount": 1,
          "Selectors": [
            {
              "CustomSelector": "a[Id=\"button_next\"]",
              "Elements": [],
              "Ignore": false,
              "IsCustom": true,
              "IsWindowsInstance": false,
              "Order": 0
            }
          ],
          "Tag": "a"
        }
      ],
      "ScreenShot": null,
      "ElementTypeName": "Web Page",
      "Name": "WebPage",
      "SelectorCount": 1,
      "Selectors": [
        {
          "CustomSelector": ":desktop > domcontainer",
          "Elements": [
            {
              "Attributes": [],
              "CustomValue": "domcontainer",
              "Ignore": false,
              "Name": "Web Page",
              "Tag": "domcontainer"
            }
          ],
          "Ignore": false,
          "IsCustom": false,
          "IsWindowsInstance": false,
          "Order": 0
        }
      ],
      "Tag": "domcontainer"
    }
  ],
  "Version": 1
}
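
for anyone adapting this to another tool, the wait-and-retry idea at the heart of the flow above can be sketched in a few lines. this is only an illustration of the pattern, not a port of the flow: the `wait_until` helper and the stand-in readiness check are hypothetical, and the defaults simply mirror delayBeforeExtract (1) and delayMaxRetry (20).

```python
import time


def wait_until(is_ready, delay_seconds=1.0, max_retries=20, sleep=time.sleep):
    """Poll is_ready() up to max_retries times, pausing before each check.

    Returns the attempt number on success, or None if every attempt failed.
    """
    for attempt in range(1, max_retries + 1):
        sleep(delay_seconds)
        if is_ready():
            return attempt
    return None


# Stand-in for "the page now contains the info paragraph": ready on the 3rd check.
checks = iter([False, False, True])
attempts_used = wait_until(lambda: next(checks), delay_seconds=0)
print(attempts_used)
```

a page that never becomes ready returns None, which corresponds to the flow adding the page number to pagesNotExtracted.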

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.