Automatic iNat suggestion for "unknown" observations that reach a certain age

jwidness · September 13, 2019, 9:12pm

The Ident-o-tron is not the same thing as computer vision. You used to be able to get to the Ident-o-tron by clicking Compare on an observation page (if I’m remembering correctly). Now, if you right-click Compare and open it in a new window, you’ll get to it. For example: https://www.inaturalist.org/observations/identotron?observation_id=32662793&taxon=85035#establishment_means=&order=&place=691&taxon=85035

From the help text: “The Identotron is a tool to help you identify observations based on iNat’s check list data. Given a higher level taxon and a place (e.g. mammals of India), iNat will look for check lists entries matching that taxon in that place, kind of like a dynamic field guide to the entire planet.”

There’s no computer vision in it at all – it’s basically the equivalent of using the Suggestions tab with Source set to Checklist.

Edit: or maybe you weren’t suggesting Identotron is equivalent to the Computer Vision and I mis-read your statement?

NancyinSunnyvale · September 13, 2019, 9:17pm

Maybe wait at least a day for the observer or the community to take care of the problem and then have the system convey a msg to the observer explaining why at least a coarse ID is needed.

Sometimes, when I upload a lot of observations–especially ones from a cellphone–I accidentally leave out an ID.

Currently, I have no way to locate these observations. I have tried to do a search, but my options seemed to be searching for a taxon or searching for all taxa (“Life”).

BTW, should I assume any automatic IDs would be at the family level or higher?

jdmore · September 13, 2019, 9:26pm

Thanks, I added an appropriate glossary entry. I had never been super clear about the Ident-o-tron, not having used it much, but it is certainly a helpful tool! Left-clicking on Compare does bring up a pop-up window version of the same thing, by the way.

jdmore · September 13, 2019, 9:29pm

So, acknowledging my own contribution to the problem, let’s try to get back on topic here about the possibility of automatically applying Computer Vision to old Unknown observations. Feel free to open a separate topic to discuss how Compare or CV works. Thanks all.

schoenitz · September 13, 2019, 10:57pm

And presumably you got something out of that activity, or else you would not have identified these old observations. If there were no unidentified observations, you would have to find something else to do that motivates you. And by ‘you’ I mean you, me, and everybody who does coarse identification for their own individual reasons.

I simply disagree that the existence of old, unidentified, stranded observations is inherently a problem that needs to be solved, much less with the help of automation.

pisum · September 14, 2019, 1:10am

Another reason for keeping it separate is that it’s a lot cleaner. for example, if you put computer vision ids in the main id stream, i can see people getting annoyed by the inevitable notifications this would generate, and someone would probably ask for the ids by the computer vision user to not generate notifications. or i can see the computer vision becoming a top identifier, so someone would probably ask at some point for the computer vision user to be removed from the top identifier list. or someone would ask for a way to see ids excluding those from a computer vision user. all of these would require additional system changes, so it would be better just to keep things separate in the first place and avoid causing, and then having to address, a lot of unwanted downstream effects.

jdmore · September 14, 2019, 4:59am

As long as it still has the OP’s intended effect of getting the observation out of the Unknown pile and into (at least) a broadly identified category where it’s more likely to be encountered by human identifiers, that seems fine to me. Once a regular ID has been added, that would become the controlling ID for filtering purposes. If all human IDs got withdrawn for some reason, then the current CV ID would again become the controlling taxon for filters.
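A minimal Python sketch of the fallback described above, assuming a hypothetical model in which each identification is flagged as human or CV and can be withdrawn. None of this is an iNaturalist API: `Identification`, `Observation`, and `controlling_taxon` are illustrative names, and taking the most recent active human ID is a stand-in for iNat's real community-taxon calculation, which is more involved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Identification:
    taxon: str
    is_human: bool = True     # False marks the hypothetical CV identification
    withdrawn: bool = False


@dataclass
class Observation:
    idents: List[Identification] = field(default_factory=list)


def controlling_taxon(obs: Observation) -> Optional[str]:
    """Taxon used for filtering under the proposal (hypothetical).

    Active human IDs always win; the CV ID only controls when no
    active human ID remains. None means the observation is Unknown.
    """
    human = [i for i in obs.idents if i.is_human and not i.withdrawn]
    if human:
        # Simplification: most recent human ID, not a real community taxon.
        return human[-1].taxon
    cv = [i for i in obs.idents if not i.is_human and not i.withdrawn]
    if cv:
        return cv[-1].taxon
    return None
```

Usage: an observation with only a CV ID of Plantae filters as Plantae; once a human adds Asteraceae, that controls; if the human ID is withdrawn, filtering falls back to Plantae again.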

pisum · September 14, 2019, 5:07am

the way i envision it, you would have 1 filter per field. they would be separate. if you wanted to see observations that had no human id and had been identified as animal by the cv, you would filter for unknown observation taxon and animal cv taxon. no need to wait a year or some other arbitrary time period.

jdmore · September 14, 2019, 5:19am

Not sure that would address the intent of the feature request, since it would still require an initial choice to filter for Unknowns, as is currently the case. Those who do make that choice would then have the option to also filter on CV taxon, which would be a good thing. But it doesn’t remove the need to make that initial choice, or increase the number of identifiers making it.

pisum · September 14, 2019, 5:34am

i’m not sure why that matters, as long as you can still arrive at the same desired effect.

not sure what you’re getting at here. yes, ultimately humans will still need to make ids. i think if you have a separate cv taxon, you might actually get lower level human ids faster because you would reduce the need for experts to either wade through unknowns blindly or rely on others to provide a high level id.

jdmore · September 14, 2019, 6:01am

Maybe I’m misinterpreting the intent of the original ask:

…but I’m pretty sure it didn’t envision still needing to filter for these observations as Unknowns. Adding CV suggestion as a new filter option in any category would definitely be great, but it doesn’t improve the initial surfacing of old unknown observations to more users over what we have now.

lera · September 14, 2019, 6:56pm

My motivation is to increase the number of identified angiosperms in some target countries - I’d just as happily be going through at a lower level, but this is a quick thing I can do to put more into the pool for the botanists who are involved to look at.

pisum · September 14, 2019, 8:45pm

i’ve re-read your comments and the original post several times, and i still don’t understand why a separate cv taxon field wouldn’t fully address the original problem.

it seems to me like the core intent of the original request is to

if i’m interpreting this correctly, a new cv taxon field would provide that functionality by allowing someone to search for observation taxon = unknown and cv taxon = [whatever taxon the identifier is looking for].
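A sketch of that two-field filter, assuming a hypothetical `cv_taxon` field stored alongside the regular observation taxon (iNat has no such field today). The tiny `PARENT` map is made-up illustrative taxonomy so that filtering on a higher taxon also matches its descendants; `unknowns_with_cv` is likewise a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, illustrative taxonomy: child -> parent.
PARENT = {
    "Magnoliopsida": "Tracheophyta",
    "Tracheophyta": "Plantae",
    "Insecta": "Animalia",
}


def is_within(taxon: Optional[str], ancestor: str) -> bool:
    """True if taxon equals ancestor or descends from it in PARENT."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)
    return False


@dataclass
class Observation:
    obs_id: int
    taxon: Optional[str]     # regular observation taxon; None means Unknown
    cv_taxon: Optional[str]  # hypothetical separate CV field


def unknowns_with_cv(observations, wanted: str):
    """One filter per field: taxon is Unknown AND cv_taxon is within wanted."""
    return [o for o in observations
            if o.taxon is None and is_within(o.cv_taxon, wanted)]
```

Keeping the two fields independent means a human ID never has to overwrite or compete with the CV value; the CV field only narrows down the Unknown pile.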

maybe the concern is that people might not realize there’s a new field available? that’s a reasonable concern, but it seems like that could be resolved with minimal education to let people know that new functionality was available and how to use it.

or maybe when you talk about “surfacing old unknown observations”, you’re saying that adding cv taxon only at the creation of an observation would mean that observations created before such a change was implemented would still not get a cv taxon? this is true, but to address that, you could go through one time only and add cv taxon for all those old observations (at the same time the change was deployed). once that was done, you would be able to search for observation taxon = unknown and cv taxon = [desired taxon] for any observation in the system, including old observations.

if the concern is that you want to filter by just one field rather than two, then i would say the one field approach is just going to end up messier and create undesired downstream effects. so going with a separate cv taxon will be better in the long run, even if you have to filter by 2 fields to find unknowns that might be in your desired taxon for identification.

i’m not sure if that addresses your concerns, but hopefully it does…

sgene · September 14, 2019, 11:28pm

Additional advantages of a separate cv taxon field while leaving the observation in Unknown that I can think of are: it leaves the placeholder untouched until a human looks at the observation, and it would probably leave the observation closer in the Identify stream to the observer’s other observations of that date (making it easier for an eventual human to figure out what the observer might have intended as the subject).

jdmore · September 15, 2019, 4:05am

I totally understand and support the desirability of functionality like this.

By “surfacing old unknown observations,” what I mean is, when someone filters their identification stream for all needs-ID vascular plants from Nevada (as an example), I want them to see two kinds of observations in that stream:

all needs-ID Nevada observations identified by a human as Tracheophyta or a descendant
all old Unknown Nevada observations identified by CV as Tracheophyta or descendant
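Those two kinds of observations could be combined into one stream with a single filter that accepts either. This sketch assumes the same hypothetical `cv_taxon` field, a made-up `PARENT` taxonomy, and an arbitrary one-year `MIN_AGE` for “old”; none of these names or values come from iNaturalist.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical, illustrative taxonomy: child -> parent.
PARENT = {"Magnoliopsida": "Tracheophyta", "Tracheophyta": "Plantae",
          "Insecta": "Animalia"}
MIN_AGE = timedelta(days=365)  # hypothetical threshold for an "old" Unknown


def is_within(taxon: Optional[str], ancestor: str) -> bool:
    """True if taxon equals ancestor or descends from it in PARENT."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)
    return False


@dataclass
class Observation:
    obs_id: int
    place: str
    created: date
    taxon: Optional[str]     # human/community taxon; None means Unknown
    cv_taxon: Optional[str]  # hypothetical CV suggestion
    needs_id: bool = True


def identify_stream(observations, wanted: str, place: str, today: date):
    stream = []
    for o in observations:
        if o.place != place:
            continue
        # Kind 1: needs-ID observations a human placed within `wanted`.
        human_match = (o.needs_id and o.taxon is not None
                       and is_within(o.taxon, wanted))
        # Kind 2: old Unknowns whose CV taxon falls within `wanted`.
        old_unknown_match = (o.taxon is None
                             and today - o.created >= MIN_AGE
                             and is_within(o.cv_taxon, wanted))
        if human_match or old_unknown_match:
            stream.append(o)
    return stream
```

For example, filtering Nevada for Tracheophyta would pick up a needs-ID observation a human called Magnoliopsida plus a two-year-old Unknown the CV called Tracheophyta, while skipping recent Unknowns and CV-identified insects.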

Whether the CV ID is stored separately or not is immaterial for purposes of that outcome, though I agree with the other advantages of separate storage.

Yes, I know that I can also select the Unknown iconic taxon button and get close to the same thing.

The point of a topic named “Automatic iNat suggestion for ‘unknown’ observations that reach a certain age”, by my reading, is that most people don’t do this, resulting in long-languishing Unknown observations, and we need a better way of automatically surfacing such observations for identification.

This is becoming a dead horse at this point, and I don’t know how to be any clearer. So I will stop and let the original poster speak to their true intentions, which I may well have wrong.

pisum · September 15, 2019, 12:53pm

ok. thanks. this seems to be at the heart of our differing perspectives. i think i finally see what you’re saying. my assumption is that even if what you’re saying here is true now (presumably because they don’t want to go through everything in the unknown pile), there would be a behavior shift as people realized they had a new way to go through the unknown pile that gets them mostly only the things they’re interested in.

choess · September 15, 2019, 3:06pm

I’ve looked through a couple of the “Unknown” pools, including Pennsylvania (presumably curated by the original poster) and India (presumably curated by yourself). It looks like the Pennsylvania pool stretches back about 6 months before hitting things that are realistically not identifiable, even broadly; the India pool is maybe about 30% larger and goes back 9 months or so. Maybe the sizes have recently increased sharply, but it’s not clear that large numbers of observations are languishing for years. I’m also concerned that this is not necessarily the rate-limiting step for useful IDs; for instance, the pool of “plants” unidentified below class level in Pennsylvania is about twice the size of the “unknown” pool. (In other words, speeding up removing things from the unknown pool won’t increase our overall rate of identification, although it may change which observations get ID’d.)

I’m concerned that automating high-level CV identifications is solving the wrong problem: instead of having a large pool of unknowns which are easily identified as such and which can be given high-level IDs relatively quickly by both expert identifiers and new people, we’re going to wind up with a large pool of observations with high-level IDs which are still sitting idle…and some of those high-level IDs will be wrong, and they’ll now be much harder to find. I think I’d rather encourage recruiting more people to do high-level ID: even non-experts can usually distinguish between a vascular plant, an insect, and an amphibian, and it can be done without thinking very hard (unlike some of the more difficult lower-level IDs).

lera · September 16, 2019, 10:29am

Hi - I’m with you on recruiting more identifiers. & I do like the idea above of distinguishing a high level CV id from a user id, so that it doesn’t carry the weight of a human opinion but does come up in filters for plants.

For India, lots of observations HAD been languishing for years in Unknown - the reason that Unknowns in India only stretch back 9 months is that I and others have spent the hours classifying them (starting at the oldest) because we’re motivated by a particular project effort. These included some highly identifiable observations from a particular user that had just been ignored due to age. We are limited by the amount of available identifier effort - and there are many other ways to improve the data resource (in this case on wild flowering plants), like marking up pot-plants and street trees and finding / drawing attention to good observations with missing metadata (usually an upload problem with the app)…

jdmore · September 16, 2019, 6:44pm

My perception is that in most cases identifiers just don’t realize or remember that the Unknowns are there to be gone through, and that there are easy ways to include them.

A concerted education campaign would be one way to address that, but if the technology can help too, then why not?

schizoform · September 17, 2019, 3:24am

I’m a non-expert at everything, and I spend a lot of time going through unknowns. I learn a lot from this! But I wonder if I wouldn’t learn more having cv automatically applied, and then going through local guides / examples and trying to one-up cv in more restricted domains.

I don’t know a good way to implement this, but I can imagine I’d be more useful going through a series of unknowns where “computer vision result was marked very wrong”. We all know this happens, and that sometimes humans can quickly figure out where things went wrong and get the observation back on the rails.

Related Topics

Automatic computer vision IDs? (General) · 7 replies · 1335 views · last activity September 24, 2021
Automatically Suggest ID (General) · 4 replies · 274 views · last activity September 23, 2021
IDs getting auto-filled (instead of just suggesting) (Bug Reports) · 4 replies · 603 views · last activity January 28, 2020
Is there an easy way to see if a submitter just went along with the iNat ID suggestion? (General question) · 7 replies · 372 views · last activity December 7, 2021
Offer similar observations to help confirm older identifications (Feature Requests) · 7 replies · 735 views · last activity February 4, 2021