Identify the observations most in need of identification

matgerke · July 6, 2021, 1:36am

I would like to be able in Explore to sort by the need for identification. This would help me to contribute my efforts where they are most needed.

I think ideally, these would be ranked by (1) the AI’s need for the data, (2) controversial identifications where members disagree on the proper identification, (3) the need to confirm a statistically “unlikely” identification (for instance, an observation that is way outside of the normal geographical range), and finally (4) observations that require another identification to be Research Grade.

If we had this, I think the community could concentrate on the identifications that make the biggest contribution to science. I, for one, would be much more motivated to spend time ID’ing if I knew those IDs mattered a great deal.

kevinfaccenda · July 6, 2021, 3:20am

There are some URL tricks you can use, including identifications=most_disagree or identifications=some_disagree, e.g.
https://www.inaturalist.org/observations/identify?per_page=60&taxon_id=47126&place_id=11&identifications=most_disagree
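For anyone who would rather build links like this in a script, here is a minimal Python sketch that assembles the same URL from its query parameters. The parameter names and values are copied from the link above; the helper name `identify_url` is made up for illustration.

```python
from urllib.parse import urlencode

IDENTIFY_BASE = "https://www.inaturalist.org/observations/identify"

def identify_url(**params):
    """Return an Identify-page URL carrying the given query parameters."""
    # Skip parameters left as None so they do not appear in the URL.
    query = {key: value for key, value in params.items() if value is not None}
    return f"{IDENTIFY_BASE}?{urlencode(query)}" if query else IDENTIFY_BASE

url = identify_url(
    per_page=60,
    taxon_id=47126,
    place_id=11,
    identifications="most_disagree",
)
print(url)
```

Swapping `"most_disagree"` for `"some_disagree"` gives the other variant mentioned above.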

They don’t seem to work very well, however. I’d like to see them made more useful, because right now they don’t really work.

What works best for me, however, is to go to the filters on the Identify page and set the low rank to family, order, phylum, etc. That way you’re only ID’ing things where the AI couldn’t figure it out, or where conflicting IDs pushed the community ID up to a higher taxon.
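The rank filter can also be written straight into the URL. The sketch below assumes the filter is stored in a query parameter named `lrank`, which is what appears to show up in the address bar after setting the low rank; treat that name as an assumption and check it against your own browser. The helper `coarse_rank_urls` is hypothetical.

```python
from urllib.parse import urlencode

IDENTIFY_BASE = "https://www.inaturalist.org/observations/identify"
COARSE_RANKS = ["family", "order", "phylum"]

def coarse_rank_urls(place_id=None):
    """Build one Identify URL per coarse rank.

    Assumption: `lrank` is the query parameter behind the low-rank filter.
    """
    urls = {}
    for rank in COARSE_RANKS:
        params = {"lrank": rank}
        if place_id is not None:
            params["place_id"] = place_id
        urls[rank] = f"{IDENTIFY_BASE}?{urlencode(params)}"
    return urls

for rank, url in coarse_rank_urls(place_id=11).items():
    print(rank, url)
```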

I would really like to see some sort of feature which can highlight observations where somebody has disagreed with an ID.

The out of range feature would probably be really tricky to implement however, especially for things which have a single digit number of observations or an unknown range.

Also, as for number 4, that is the default. If you use the identifications page you will never see observations which are research grade unless you specifically change the settings to show those.

arboretum_amy · July 6, 2021, 3:37am

From his wording, I think perhaps @matgerke is not using the Identify page, just the Explore page. This is more common than you might expect–I have met people with thousands of identifications who don’t use the Identify page at all.

Or perhaps Mat is saying he wants to see observations that only need one identification (and not more than one) to reach research grade. There’s no way to search for that but it would be quite useful.

arboretum_amy · July 6, 2021, 3:40am

If you have a specific taxon in mind, try the method linked here. However you do have to specify a taxon to work with; I don’t know a way to generalize the search to catch any observations with disagreements. I agree that a new feature allowing me to search for disagreements would be very handy.

jasonhernandez74 · July 6, 2021, 4:52am

I find that the observations most in need of identification are the ones that require a specialist. When I am IDing by place, I set my filters to show me the oldest ones first (the ones which have gone the longest without an ID), and I can usually only do a few per page.

murphyslab · July 6, 2021, 5:37am

As far as I can tell, there isn’t an easy way to find which images would be most useful for the computer vision. Essentially you need to get a species or genus over the threshold of a certain number of observations:

This means that we do include images from observations of captive and cultivated organisms. Lastly, in recent models, a taxon must have at least 100 verifiable observations and at least 50 with a community ID to be included in training (actually, that’s really verifiable + would-be-verifiable-if-not-captive, because we want to train on images of captive/cultivated records too).
https://www.inaturalist.org/blog/31806-a-new-vision-model
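As a rough illustration of that threshold, the Python sketch below checks whether a taxon clears both numbers from the quote and how many more observations it would need. The function names and the example counts are invented; only the 100 and 50 come from the blog post.

```python
MIN_VERIFIABLE = 100     # verifiable observations, from the quoted blog post
MIN_COMMUNITY_ID = 50    # observations with a community ID, from the same quote

def meets_training_threshold(verifiable, with_community_id):
    """True if a taxon clears both thresholds quoted above."""
    return verifiable >= MIN_VERIFIABLE and with_community_id >= MIN_COMMUNITY_ID

def shortfall(verifiable, with_community_id):
    """How many more observations of each kind the taxon still needs."""
    return (
        max(0, MIN_VERIFIABLE - verifiable),
        max(0, MIN_COMMUNITY_ID - with_community_id),
    )

# Made-up counts for two hypothetical taxa.
print(meets_training_threshold(120, 64))  # clears both thresholds
print(shortfall(85, 30))                  # still short on both counts
```

Taxa with a small shortfall are the ones where a handful of extra IDs could plausibly tip them into the next training run.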

A manual approach might be to look for taxa that you’re familiar with and able to ID, then to create a list (or set of lists) that includes those taxa within your domain of expertise that have fewer than 100 observations.

You can use your list of taxa to create a custom search URL that shows only observations identified as members of that list, so you will see any matching taxa that require an ID. It can work quite well: I keep lists of plants in British Columbia that I can readily identify to use in my own custom search URL.

e.g. https://www.inaturalist.org/observations/identify?list_id=1139967

See the wiki for further examples and instructions on custom search URLs.

The disadvantage of this approach is that many users rely heavily on iNat’s initial, computer vision ID suggestions. So if you filter by exact matches for a taxon with zero or few observations, then there is a good chance you won’t find many which match the list. One way around that is to go up one or two levels and spend time filtering through a genus.

earthknight · July 6, 2021, 7:02am

In principle I like that idea, but I very much disagree with the ranking order. Different users, different regions, different aspects of the iNat platform all have very different identification priorities.

Providing sorting options is good, but I’d leave the subjective ranking entirely out of the picture as different users have very different needs.

marina_gorbunova · July 6, 2021, 11:24am

How would you really find this out? Sure, there are many disagreeing IDs from new members who choose random taxa, but most disagreeing IDs are in fact the right ones, or closer to the right one than the initial (and agreeing) IDs. I like ideas (3) and (4), but (1) and (2) sound like lower priorities, especially the AI’s needs.

DianaStuder · July 6, 2021, 1:55pm

Periodically I get a batch of notifications - where a scientist has gone thru the Research Grade obs for a taxon to sort out, yes it is, no actually it is … That is also important and useful.

matgerke · July 6, 2021, 2:55pm

Yes, that’s exactly it. There are a lot of unidentifiable photos out there. I’d like to be able to see examples where someone has already tried to identify it.

matgerke · July 6, 2021, 2:59pm

I suspect there’s a simple mathematical formula, but just as a rough approximation, I might say that a count of the number of users on the “losing” side of the identification would be a good, rough measurement of the amount of controversy in an identification. If two people disagree, then the “wrong count” is 1. If there are five people on the winning side and three people on the losing side, then the “wrong count” is 3.
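A minimal Python sketch of that “wrong count”, assuming each identification is reduced to a plain taxon label. It ignores the taxonomic tree entirely and is not how iNaturalist computes the community ID; the function name is made up.

```python
from collections import Counter

def wrong_count(identifications):
    """Count the identifications that are not on the winning (most common) side."""
    if not identifications:
        return 0
    tally = Counter(identifications)
    winning_votes = max(tally.values())
    # Everything outside the single most common label counts as "losing".
    return len(identifications) - winning_votes

print(wrong_count(["taxon A", "taxon B"]))              # two people disagree
print(wrong_count(["taxon A"] * 5 + ["taxon B"] * 3))   # five against three
```

With a three-way split, every ID outside the leading taxon counts toward the total, so the number grows with both the size and the breadth of the disagreement.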

marina_gorbunova · July 6, 2021, 3:03pm

I’d say it will still be too mixed: in a 1:1 split the second ID is usually the right one, though far from always. So it’s easier to look up observations with any disagreement; they all need attention if that’s the goal.
In the end every ID matters for science. Identifiers usually focus on observers and their effort, and there are easily identified records from 3 years ago that nobody has reviewed yet, so I’d say that for science the number of IDs you make matters more than which observations you choose.

DianaStuder · July 6, 2021, 7:33pm

This thread came up today: Research Grade, but still wrong.
https://forum.inaturalist.org/t/brassica-nigra-and-hirschfeldia-incana-why-so-many-mistakes/24528

matgerke · July 6, 2021, 7:33pm

Yeah, I like the idea of providing sorting options and letting the user decide. But maybe make the default something like what I suggested.

matgerke · July 6, 2021, 7:38pm

My day job is as a data analyst. Whatever machine learning algorithm they are using for image classification will have a straightforward way to identify the identifications that are most important in training the model. Bottom line is that these will be the images that the model is least “certain” about.
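One common way to put a number on “least certain” is the entropy of the model’s predicted class probabilities, an approach usually called uncertainty sampling. The Python sketch below is only an illustration of that general idea, with invented photo names and probabilities; nothing here is taken from iNaturalist’s actual model or training pipeline.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Made-up predicted probabilities over three candidate taxa for three photos.
predictions = {
    "photo_a": [0.98, 0.01, 0.01],   # model is confident
    "photo_b": [0.40, 0.35, 0.25],   # model is torn between all three
    "photo_c": [0.70, 0.20, 0.10],   # somewhere in between
}

# Most uncertain photos first.
ranked = sorted(predictions, key=lambda name: entropy(predictions[name]), reverse=True)
print(ranked)
```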

marina_gorbunova · July 6, 2021, 7:50pm

Those are species with only a few observations, or only a few correctly identified ones. If you personally can ID difficult (or just unreviewed) groups, do it now: it will be greatly appreciated, both because those species can be included in the next model and because they get recognized in the database at all. Logically, what the algorithm needs is as many good photos of a taxon as possible. I’m currently spending time catching Nephrotoma, and it really feels like the model could guess many species correctly from a dorsal shot alone: their patterns are similar-looking but still distinct, so the CV suggestion should at least be close to the actual species. But nobody IDs them, or people ID them wrongly, so the species have 1-10 observations each and CV suggests all sorts of weird things, from Tipula paludosa (which apparently looks right for every cranefly in existence) to Ctenophora. That clearly shows that having a lot of genus-level observations is not enough to teach the machine what would be very easy for a human.

murphyslab · July 6, 2021, 8:37pm

However, not all images get fed back into the model. Hence the images that the model is presently least “certain” about may well not make it back in, depending on the subject matter. But images of previously unincluded species will almost certainly make it into the model, because of the cap of 1000 images per taxon.

tiwane · July 6, 2021, 10:40pm

When it comes to the computer vision model, I’m not sure there is a “best” photo type to train it on, and as others have said, right now if there are over 1000 eligible photos for a taxon we choose 1000 at random so there’s no guarantee that photo will be included in the training set.
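To make the effect of that cap concrete, here is a small Python sketch. It assumes the 1000 photos are drawn uniformly at random, which is a guess about what “at random” means here; the function names are made up and this is not the actual training code.

```python
import random

CAP = 1000  # photos per taxon, as stated in the post above

def inclusion_probability(eligible):
    """Chance that any one photo is picked, assuming a uniform random draw."""
    if eligible <= 0:
        return 0.0
    return min(1.0, CAP / eligible)

def pick_training_photos(photo_ids, cap=CAP, seed=None):
    """Return every photo if at or under the cap, else a uniform random sample of `cap`."""
    rng = random.Random(seed)
    if len(photo_ids) <= cap:
        return list(photo_ids)
    return rng.sample(photo_ids, cap)

print(inclusion_probability(500))    # under the cap: every photo is used
print(inclusion_probability(4000))   # over the cap: one photo in four
print(len(pick_training_photos(list(range(2500)), seed=1)))
```

Under that assumption, a new photo of a heavily observed taxon has only a small chance of being used, while a photo of a taxon below the cap is always used once the taxon is included.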

It’s also important to remember that the iNat CV model is not being trained to recognize taxa, it’s being trained to recognize iNaturalist users’ photos of those taxa. So we could train on the most beautiful focus-stacked images showing all the diagnostic features, but those photos won’t look like the vast majority of iNat users’ photos, which are often in situ using natural light and a cell phone camera, and thus probably won’t be helpful in training the iNat model.

What I think would help the model the most would be to, as others have said, bring more taxa past the threshold to be included in the next model, especially in places where iNat is observation-deficient. For example, a year or two ago someone in Kenya said that Seek wasn’t very useful there, and that was because the model hasn’t been trained on many of the common taxa in the region.

lotteryd · July 6, 2021, 10:57pm

Looks like (1) in the original points is well covered by tiwane, plus here are some summary links for the other bullets:

(2) try this filtered version of Life identify: it’s sorted by most recently “active”. Just skip through the blurry or weird things, and look for the records that have a major disagreement, often at Kingdom level, and add your vote either way if you like

(3) many id issues are collected at https://forum.inaturalist.org/t/computer-vision-clean-up-wiki/7281 including some wrong locality “regulars”

(4) default view of https://www.inaturalist.org/observations/identify

earthknight · July 6, 2021, 11:21pm

That’s a widespread issue once you get out of the areas with lots of users. Same here in SE Asia.
