Automatic iNat suggestion for "unknown" observations that reach a certain age

glmory · September 21, 2019, 3:23am

"By “surfacing old unknown observations,” what I mean is, when someone filters their identification stream for all needs-ID vascular plants from Nevada (as an example), I want them to see two kinds of observations in that stream:

all needs-ID Nevada observations identified by a human as Tracheophyta or a descendant
all old Unknown Nevada observations identified by CV as Tracheophyta or descendant"

This seems like the right solution. The problem is not that observations do not have an ID, the problem is that experts in a particular taxon cannot find the observations which no one has yet identified to that level. For example, I have reviewed all the aphid observations, but I will never see the observations of aphids which are currently stuck at an identification of insects, arthropods, or unknown.

The computer vision should be good enough to add at least a few hundred obvious aphids to the search. And even things which are not aphids, but look similar, are most likely to be correctly identified if added to that search.
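
As a rough illustration of the stream described in the quote, here is a minimal sketch. Everything in it is an assumption made for illustration: the `PARENT` table is a toy, heavily simplified taxonomy, `cv_taxon` is a hypothetical stored CV suggestion (iNat does not necessarily keep one), the one-year cutoff is arbitrary, and the needs-ID check is left out for brevity.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Toy ancestry table (taxon -> parent). Heavily simplified, illustration only.
PARENT = {
    "Life": None,
    "Plantae": "Life",
    "Tracheophyta": "Plantae",
    "Fabaceae": "Tracheophyta",
    "Animalia": "Life",
    "Aphididae": "Animalia",
}


def is_self_or_descendant(taxon: Optional[str], ancestor: str) -> bool:
    """Walk up the toy tree from `taxon`; True if `ancestor` is reached."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)
    return False


@dataclass
class Observation:
    id: int
    observed_on: date
    human_taxon: Optional[str]  # None means the observation is "Unknown"
    cv_taxon: Optional[str]     # hypothetical stored CV suggestion


def in_stream(obs: Observation, target: str, today: date,
              min_age_days: int = 365) -> bool:
    """Human-identified within `target`, or an old Unknown the CV places there."""
    if obs.human_taxon is not None:
        return is_self_or_descendant(obs.human_taxon, target)
    old_enough = (today - obs.observed_on) >= timedelta(days=min_age_days)
    return old_enough and is_self_or_descendant(obs.cv_taxon, target)
```

A recent Unknown stays out of the stream even if the CV thinks it is a plant; only age brings it in.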

julie_sf · September 21, 2019, 4:14am

Wow, thank you for tackling those Indian plant observations stuck in Unknown! I found them too exhausting and frustrating, and had to step away a while ago. There were indeed many fantastic observations there just waiting for some attention.

matthias55 · September 22, 2019, 7:32pm

I have been following this discussion with interest. When I created this feature request, I was mostly spending my time IDing “unknowns.” I was concerned that the number of “unknowns” was increasing, but I was really interested in increasing the efficiency of putting the correct label on the observation.

I think both of those premises are somewhat faulty:

It turns out that the number of unknown observations is not increasing. Bouteloua posted a nice graph that shows that the number of unknowns spikes in April in the Northern Hemisphere but then gets worked off during the winter months. It has been roughly stable for several years.
Choess pointed out that the “unknown” observations are not the “rate-limiting step” in IDing observations, which I think is a good way of putting it. There are way more observations stuck at higher-level taxonomic classifications (kingdom, phylum, subphylum, etc.) in plants, for example, than unknowns.

So from an efficiency standpoint, the proposed feature request might not really help that much, especially if it is just used to get an unknown to a kingdom or phylum-level classification.

From the above, I’m less convinced now that this feature should be implemented.

If we could somehow just get the observations that are suitable for CV to be the ones that CV is applied to, and these observations could be IDed to genus, family or order with an acceptable error rate, I would be all for that. But there are a bunch of unknowns that have blurry images, or the focus of the observation is unclear, or have any number of other issues that can confuse the CV; if these kinds of observations can’t be screened out somehow, I imagine that the error rate would be unacceptable.

If I were proposing a new feature request, I think I’d suggest that it be a sort of experiment, tested on a limited geographic region or maybe even a single taxonomic group (or maybe on a “low risk” group like observations marked “captive/cultivated”) to see how well it works. Obviously it will only work well in areas that have lots of observations already, and I can imagine it working better on some taxa than others. I wouldn’t limit it to unknowns, either; maybe go down to phylum and see if it could help address some of the backlog at higher classifications.

That’s my 2 cents, thanks for all the thoughtful comments.

pisum · September 22, 2019, 8:50pm

i suggested above that the cv taxon should be captured in a separate field. i’m thinking if it could be expanded to capture confidence of the cv suggestion (see https://forum.inaturalist.org/t/computer-vision-should-tell-us-how-sure-it-is-of-its-suggestions/1230), then i think that gets you what you want, and it could also get what i was describing here:

Computer Vision should tell us how sure it is of its suggestions

in that future world, i can see a new filter option in Explore and Identify where an expert could pull back, say, only the observations that iNat vision has less confidence about, with the assumption that observations with high vision confidence can easily be IDed by less experienced community members, with or without the vision assistance. let the novices ID the blue jays. (it’s sort of like the concept of comparative advantage in economics. even if an expert can ID both the easy and hard stuff way faster than a novice, the novices will never be able to ID the hard stuff. so let them do the easy stuff so that the experts can move the needle on the hard stuff. and if experts want to take a break and do the easy stuff once in a while, that’s okay, too.)
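
A minimal sketch of that filter idea, assuming each observation carried a stored CV confidence. The `cv_confidence` field and the 0.8 threshold are hypothetical; nothing here reflects an existing iNat field or API.

```python
def split_by_confidence(observations, threshold=0.8):
    """Split observations into (easy, hard) piles by a hypothetical CV confidence."""
    easy, hard = [], []
    for obs in observations:
        # High-confidence suggestions go to the "novice" pile, the rest to experts.
        (easy if obs["cv_confidence"] >= threshold else hard).append(obs)
    return easy, hard
```

An expert-facing filter would then show only the `hard` pile.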

tiwane · September 25, 2019, 3:44am

We’ve discussed this idea as a team, and while it’s certainly interesting, it’s unlikely to be something we implement for at least a year, if we implement it at all.

sgene · September 25, 2019, 4:28am

I always appreciate it when you let us know. Thank you.

jeanphilippeb · January 29, 2021, 8:24pm

There is a discussion thread about this:
https://forum.inaturalist.org/t/easy-way-to-mark-multiple-species-observations/278

Although there is no definite solution defined, I think a new DQA entry would not hurt and would be useful, whatever is done later with these observations:
https://forum.inaturalist.org/t/easy-way-to-mark-multiple-species-observations/278/49

sbushes · November 10, 2021, 12:24am

There used to be iconic taxa buttons?!
I’ve wondered about proposing this before.
Why & when were they cut out? I’d imagine these to be helpful in encouraging users not to use species-level autosuggests blindly so much.

tiwane · November 10, 2021, 7:07pm

They were removed when computer vision was incorporated into iNat. If we were to add them in again, I think we’d have to figure out a flow that wasn’t confusing. Currently many people don’t know they can search for taxa and aren’t just limited to the CV suggestions. Anyway, this is getting a bit far afield from the requested feature and maybe something to think about for the app redesign.

halvandenhjerne · December 2, 2021, 12:55pm

Simple experimental solution:

Create a bot user that IDs old, neglected observations.
Instruct it to never suggest species or subspecies, only genus or subgenus. That way, it won’t move anything towards Research Grade.
You could even have it automatically retract its suggestions when enough human users provide theirs.
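
The bot rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `RANK_LEVEL` numbers are an illustrative ordering (bigger means coarser), the `lineage` input stands in for whatever the CV would return, the retraction threshold of two human IDs is arbitrary, and falling back to a coarser rank when no genus is available is my own reading of the rule.

```python
# Illustrative ordering of ranks: a larger number means a coarser rank.
RANK_LEVEL = {
    "subspecies": 5,
    "species": 10,
    "subgenus": 15,
    "genus": 20,
    "family": 30,
    "order": 40,
}


def bot_suggestion(lineage):
    """Return the finest (rank, name) in `lineage` that is no finer than subgenus.

    `lineage` is a list of (rank, name) pairs for one CV suggestion.
    Returns None when only species-level or finer taxa are available.
    """
    allowed = [
        (rank, name)
        for rank, name in lineage
        if RANK_LEVEL[rank] >= RANK_LEVEL["subgenus"]
    ]
    if not allowed:
        return None
    return min(allowed, key=lambda item: RANK_LEVEL[item[0]])


def should_retract(human_id_count, threshold=2):
    """Withdraw the bot's ID once enough humans have weighed in."""
    return human_id_count >= threshold
```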

And if said bot turns out to be more annoying than helpful, just ban it for trolling!

charlie · December 2, 2021, 4:28pm

i like this idea, i feel like others might not, but i like it.

richyfourtytwo · December 2, 2021, 5:03pm

My perspective on this is similar to that of @matthias55 (edit: in his 2nd post) above. When looking at observations older than say 18 months, I doubt there are that many observations in the unknown pile that can be IDed to genus easily (be it via CV or manually).

Personally I’m more concerned about blurry images, or nice images showing a landscape with probably 200 species visible, but no focus on any of them and none IDable in any reasonable sense. What I find frustrating about this is that if I leave it unIDed, generations of IDers to come will look at this image again and waste their time too. I know, this has been discussed on another thread (or multiple).

charlie · December 2, 2021, 5:13pm

i wonder how hard it is to track how many people click ‘reviewed’ on each observation. And maybe after a certain number, an observation could get ‘demoted’ maybe not quite to casual but to a lower priority, or something. Though i will sometimes click reviewed on something i’d have no chance at identifying so maybe it’s a bad idea. Maybe separate them out as ‘extra challenges’ and create a way to mark them more easily as needing casual grade. Something like that

dianastuder · December 2, 2021, 8:28pm

How many Reviewed clicks - is definitely info available to iNat.
Set a barrier - 6 clicks is Unidentifiable.
Then the suckers for punishment can still try and work out What Are We Looking At. If they want to …

On second thoughts: if that was visible to us (“This obs has already been reviewed six times and is still UNKNOWN”), it would be a kind warning to scroll on by.
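
A tiny sketch of what such a warning could look like, assuming a per-observation count of Reviewed clicks were exposed. The `review_count` input and the function name are hypothetical; the limit of six comes from the suggestion above.

```python
def review_warning(review_count, taxon, limit=6):
    """Return a warning string for much-reviewed Unknowns, else None.

    `taxon` is None when the observation has no identification at all.
    """
    if taxon is None and review_count >= limit:
        return (
            f"This obs has already been reviewed {review_count} times "
            "and is still UNKNOWN"
        )
    return None
```

The same count could just as well feed a filter instead of a warning label.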

richyfourtytwo · December 3, 2021, 8:44am

Visibility would be good. Being able to filter on this would be perfect.

jeanphilippeb · December 4, 2022, 8:38am

Yes, you can!

Observations of Aphididae currently without identification at all:
https://www.inaturalist.org/observations/identify?quality_grade=casual%2Cneeds_id&verifiable=any&identified=false&project_id=152104

Similarly, observations of Lepidoptera currently without identification at all:
https://www.inaturalist.org/observations/identify?quality_grade=casual%2Cneeds_id&verifiable=any&identified=false&project_id=152106
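
For anyone who wants to generate links like these for other projects, here is a small sketch that rebuilds them. The query parameters are copied from the two links above; the function name is made up, and it only builds a URL without contacting iNat.

```python
from urllib.parse import urlencode

IDENTIFY_URL = "https://www.inaturalist.org/observations/identify"


def unidentified_in_project_url(project_id):
    """Build an Identify link for project observations that have no ID at all."""
    params = {
        "quality_grade": "casual,needs_id",
        "verifiable": "any",
        "identified": "false",
        "project_id": project_id,
    }
    # urlencode percent-encodes the comma, matching the links above.
    return f"{IDENTIFY_URL}?{urlencode(params)}"
```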

tonyrebelo · December 4, 2022, 11:32am

For plants, most useful is trees:
e.g.: https://www.inaturalist.org/posts/36472-using-trees-of-southern-africa-to-get-trees-identified

I detest identifications to Plants, Vascular Plants, Flowering Plants, most especially for easy to ID groups like daisies, peas and trees.
Especially when even a novice can easily identify a tree, and trees are such a well-known subset of the flora of any region, often with multiple field guides and abundant enthusiasts. If novices interested in helping with identification could just be informed about these projects for lower groups, it would really help.

But I don’t understand why butterflies and moths are not just identified to Lepidoptera, and aphids as Aphididae? Why do they need a special project?
It is polyphyletic groups such as Trees, Succulents, Bulbs, Seaweeds, Aquatic Plants, etc. that need projects to assist with IDs.
Now if there could be icons for these polyphyletic groups for novices to easily post unidentified observations, that would be supercool!

fffffffff · December 4, 2022, 11:48am

Stop, what’s the point of adding aphids to a project instead of IDing them? And the first obs I saw was a collembola, then 2 plants that somebody marked as planted when they’re not. But if the photo is of an aphid, ID it as such.

jeanphilippeb · December 4, 2022, 1:02pm

The point is to help knowledgeable people find observations they are interested in, by “surfacing unknown observations” as mentioned above. I started with aphids because I had in mind @glmory’s comment above. Then I did the same for butterflies, just because this is more popular.

I grabbed about 300,000 observations (of all kinds) without identification. I don’t intend to ID them myself (though I did so for the taxa I am most interested in). And I don’t intend to ID them automatically either, to avoid mistakes and because it is not iNat policy (else iNat would already ID all observations automatically).

For now, I am just illustrating what is possible for unidentified observations, using computer vision as a filter when searching for observations. Previous discussion here.

The new point is that I make a “surfacing unknown observations” feature available to you without providing a separate software tool as I did before, simply by using an existing feature of iNat, namely the traditional projects. (Very few people have used the separate tool I made available earlier).

Future perspective: I could also automate the selection of taxa (like aphids or butterflies) for which other similar projects could be created, and then populate all these projects automatically (potentially with hundreds of thousands of observations, distributed over different projects). The choice of these taxa should take into account how many observations would fall in the different projects, so that there is no project with too few, or too many, observations. But for the moment it is urgent to do nothing…

jeanphilippeb · December 4, 2022, 2:25pm

If getting them just identified as such is what is wished, then one should just ask iNat to automatically identify all observations (the old ones and every new one) based on computer vision. The ID would be a taxon of coarse enough rank to avoid mistakes.

The choice of the rank would likely depend on the taxon itself. The rank would not always be family, for all observations. It should be chosen so that neither too many nor too few observations fall in the taxon. A family or order rank may be fine for families or orders with few observations, but not for others.

For Fabaceae, for instance, I would imagine:

a separate project for Tribe Fabeae (many observations and tendrils are easy to see),
a separate project for all other Subfamily Faboideae (many observations and similar flower shapes),
a separate project for Subfamily Caesalpinioideae,
a separate project for all other Fabaceae.

I think it should be feasible to generate automatically a reasonable choice of taxa/projects, covering the whole taxonomy, with an algorithm working on a large enough set of “unknown” observations.
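
To make the idea concrete, here is one possible sketch of such an algorithm; it is not an existing tool. It walks a taxonomy top-down, gives a child taxon its own project when it holds at least `min_size` observations, splits it further when it holds more than `max_size`, and folds the rest into an “other …” project. The tree fragment is simplified, and the counts and size bounds are invented for illustration.

```python
def subtree_total(tree, counts, taxon):
    """Observations placed directly at `taxon` plus those in all descendants."""
    return counts.get(taxon, 0) + sum(
        subtree_total(tree, counts, child) for child in tree.get(taxon, [])
    )


def plan_projects(tree, counts, taxon, min_size, max_size):
    """Return {project_label: observation_count} covering `taxon`'s subtree."""
    total = subtree_total(tree, counts, taxon)
    children = tree.get(taxon, [])
    if total <= max_size or not children:
        return {taxon: total}

    projects = {}
    remainder = counts.get(taxon, 0)  # observations sitting at `taxon` itself
    for child in children:
        child_total = subtree_total(tree, counts, child)
        if child_total >= min_size:
            # Big enough for its own project (split again if it is too big).
            projects.update(plan_projects(tree, counts, child, min_size, max_size))
        else:
            remainder += child_total
    if remainder:
        projects[f"other {taxon}"] = remainder
    return projects


# Simplified tree fragment with invented counts, illustration only.
TREE = {
    "Fabaceae": ["Faboideae", "Caesalpinioideae", "Cercidoideae"],
    "Faboideae": ["Fabeae", "Trifolieae"],
}
COUNTS = {
    "Fabaceae": 300, "Faboideae": 900, "Fabeae": 1200,
    "Trifolieae": 150, "Caesalpinioideae": 700, "Cercidoideae": 40,
}
PLAN = plan_projects(TREE, COUNTS, "Fabaceae", min_size=500, max_size=1500)
```

With these invented numbers the plan has four projects: Fabeae, other Faboideae, Caesalpinioideae and other Fabaceae. One known limitation of the sketch is that an “other …” project can still end up smaller than `min_size` or larger than `max_size`.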
