Automatic iNat suggestion for "unknown" observations that reach a certain age

the way i envision it, you would have 1 filter per field. they would be separate. if you wanted to see observations that had no human id and had been identified as animal by the cv, you would filter for observation taxon = unknown and cv taxon = animal. no need to wait a year or some other arbitrary time period.

Not sure that would address the intent of the feature request, since it would still require an initial choice to filter for Unknowns, as is currently the case. Those who do make that choice would then have the option to also filter on CV taxon, which would be a good thing. But it doesn’t remove the need to make that initial choice, or increase the number of identifiers making it.

i’m not sure why that matters, as long as you can still arrive at the same desired effect.

not sure what you’re getting at here. yes, ultimately humans will still need to make ids. i think if you have a separate cv taxon, you might actually get lower-level human ids faster, because you would reduce the need for experts to either wade through unknowns blindly or rely on others to provide a high-level id.

Maybe I’m misinterpreting the intent of the original ask:

…but I’m pretty sure it didn’t envision still needing to filter for these observations as Unknowns. Adding the CV suggestion as a new filter option in any category would definitely be great, but it doesn’t improve on what we have now for the initial surfacing of old unknown observations to more users.

My motivation is to increase the number of identified angiosperms in some target countries - I’d just as happily be going through at a lower level, but this is a quick thing I can do to put more into the pool for the botanists who are involved to look at.

:thinking:

i’ve re-read your comments and the original post several times, and i still don’t understand why a separate cv taxon field wouldn’t fully address the original problem.

it seems to me like the core intent of the original request is to

if i’m interpreting this correctly, a new cv taxon field would provide that functionality by allowing someone to search for observation taxon = unknown and cv taxon = [whatever taxon the identifier is looking for].
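a minimal python sketch of that two-field filter, assuming a hypothetical separate `cv_taxon` field stored next to the observation taxon. the `Observation` class, the `PARENT` lookup, and `unknowns_suggested_as` are made-up names for illustration, not iNaturalist's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical child -> parent map standing in for the real taxonomy.
PARENT = {
    "Aphididae": "Insecta",
    "Insecta": "Arthropoda",
    "Arthropoda": "Animalia",
    "Magnoliopsida": "Tracheophyta",
    "Tracheophyta": "Plantae",
}


def is_within(taxon: Optional[str], ancestor: str) -> bool:
    """True if `taxon` is `ancestor` or one of its descendants."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)
    return False


@dataclass
class Observation:
    id: int
    observation_taxon: Optional[str]  # None means "Unknown" (no human ID yet)
    cv_taxon: Optional[str]           # hypothetical separate field for the CV suggestion


def unknowns_suggested_as(observations, target):
    """observation taxon = unknown AND cv taxon within `target`."""
    return [o for o in observations
            if o.observation_taxon is None and is_within(o.cv_taxon, target)]


obs = [
    Observation(1, None, "Aphididae"),
    Observation(2, None, "Magnoliopsida"),
    Observation(3, "Insecta", "Aphididae"),  # already has a human ID, so skipped
]
print([o.id for o in unknowns_suggested_as(obs, "Insecta")])  # [1]
```

because the cv guess lives in its own field here, it never touches the observation taxon; the observation stays Unknown until a human identifies it.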

maybe the concern is that people might not realize there’s a new field available? that’s a reasonable concern, but it seems like that could be resolved with minimal education to let people know that new functionality was available and how to use it.

or maybe when you talk about “surfacing old unknown observations”, you’re saying that adding cv taxon only at the creation of an observation would mean that observations created before such a change was implemented would still not get a cv taxon? this is true, but to address that, you could go through one time only and add cv taxon for all those old observations (at the same time the change was deployed). once that was done, you would be able to search for observation taxon = unknown and cv taxon = [desired taxon] for any observation in the system, including old observations.

if the concern is that you want to filter by just one field rather than two, then i would say the one-field approach is just going to end up messier and create undesired downstream effects. so going with a separate cv taxon will be better in the long run, even if you have to filter by 2 fields to find unknowns that might be in your desired taxon.

i’m not sure if that addresses your concerns, but hopefully it does…

Two additional advantages I can think of for a separate cv taxon field, with the observation left in Unknown: it leaves the placeholder untouched until a human looks at the observation, and it would probably keep the observation closer in the Identify stream to the observer’s other observations from that date (making it easier for an eventual human identifier to figure out what the observer intended as the subject).

I totally understand and support the desirability of functionality like this.

By “surfacing old unknown observations,” what I mean is, when someone filters their identification stream for all needs-ID vascular plants from Nevada (as an example), I want them to see two kinds of observations in that stream:

  • all needs-ID Nevada observations identified by a human as Tracheophyta or a descendant
  • all old Unknown Nevada observations identified by CV as Tracheophyta or descendant

Whether the CV ID is stored separately or not is immaterial for purposes of that outcome, though I agree with the other advantages of separate storage.
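The two-kinds stream described above could be sketched in Python roughly like this. Everything in it is hypothetical: the field names, the stored `cv_taxon`, and the 180-day `min_age` standing in for "old" (the thread never settles on an age). It is meant only to show the union of the two conditions, not how iNaturalist would actually query it.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical child -> parent map standing in for the real taxonomy.
PARENT = {
    "Magnoliopsida": "Tracheophyta",
    "Tracheophyta": "Plantae",
    "Insecta": "Arthropoda",
    "Arthropoda": "Animalia",
}


def is_within(taxon: Optional[str], ancestor: str) -> bool:
    """True if `taxon` is `ancestor` or one of its descendants."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)
    return False


@dataclass
class Observation:
    id: int
    place: str
    needs_id: bool
    observation_taxon: Optional[str]  # None means "Unknown" (no human ID)
    cv_taxon: Optional[str]           # hypothetical stored CV suggestion
    created: date


def identify_stream(observations: List[Observation], place: str, target: str,
                    today: date, min_age: timedelta = timedelta(days=180)
                    ) -> List[Observation]:
    stream = []
    for o in observations:
        if o.place != place or not o.needs_id:
            continue
        if o.observation_taxon is not None:
            # Kind 1: a human has already placed it in the target taxon.
            if is_within(o.observation_taxon, target):
                stream.append(o)
        elif today - o.created >= min_age and is_within(o.cv_taxon, target):
            # Kind 2: still Unknown, old enough, and CV puts it in the target taxon.
            stream.append(o)
    return stream


today = date(2024, 6, 1)
obs = [
    Observation(1, "Nevada", True, "Magnoliopsida", None, date(2024, 5, 1)),
    Observation(2, "Nevada", True, None, "Tracheophyta", date(2023, 1, 1)),
    Observation(3, "Nevada", True, None, "Magnoliopsida", date(2024, 5, 20)),  # too new
    Observation(4, "Nevada", True, None, "Insecta", date(2023, 1, 1)),         # CV says insect
    Observation(5, "Utah", True, "Magnoliopsida", None, date(2024, 5, 1)),     # wrong place
]
print([o.id for o in identify_stream(obs, "Nevada", "Tracheophyta", today)])  # [1, 2]
```

Setting `min_age` to zero gives the no-waiting variant, where every Unknown with a matching CV suggestion shows up in the plant stream regardless of age.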

Yes, I know that I can also select the Unknown iconic taxon button and get close to the same thing.

The point of a topic named “Automatic iNat suggestion for “unknown” observations that reach a certain age”, by my reading, is that most people don’t do this, resulting in long-languishing Unknown observations, and we need a better way of automatically surfacing such observations for identification.

This is becoming a dead horse at this point, and I don’t know how to be any clearer. So I will stop and let the original poster speak to their true intentions, which I may well have wrong.

ok. thanks. this seems to be at the heart of our differing perspectives. i think i finally see what you’re saying. my assumption is that even if what you’re saying here is true now (presumably because they don’t want to go through everything in the unknown pile), there would be a behavior shift as people realized they had a new way to go through the unknown pile in a way that gets them mostly only the things they’re interested in.

I’ve looked through a couple of the “Unknown” pools, including Pennsylvania (presumably curated by the original poster) and India (presumably curated by yourself). It looks like the Pennsylvania pool stretches back about 6 months before hitting things that are realistically not identifiable, even broadly; the India pool is maybe about 30% larger and goes back 9 months or so. Maybe the sizes have recently increased sharply, but it’s not clear that large numbers of observations are languishing for years. I’m also concerned that this is not necessarily the rate-limiting step for useful IDs; for instance, the pool of “plants” unidentified below class level in Pennsylvania is about twice the size of the “unknown” pool. (In other words, speeding up removing things from the unknown pool won’t increase our overall rate of identification, although it may change which observations get ID’d.)

I’m concerned that automating high-level CV identifications is solving the wrong problem: instead of having a large pool of unknowns which are easily identified as such and which can be given high-level IDs relatively quickly by both expert identifiers and new people, we’re going to wind up with a large pool of observations with high-level IDs which are still sitting idle…and some of those high-level IDs will be wrong, and they’ll now be much harder to find. I think I’d rather encourage recruiting more people to do high-level ID: even non-experts can usually distinguish between a vascular plant, an insect, and an amphibian, and it can be done without thinking very hard (unlike some of the more difficult lower-level IDs).

Hi, I’m with you on recruiting more identifiers. And I do like the idea above of distinguishing a high-level CV ID from a user ID, so that it doesn’t carry the weight of a human opinion but does come up in filters for plants.

For India, lots of observations HAD been languishing for years in Unknown - the reason that Unknowns in India only stretch back 9 months is that I and others have spent the hours classifying them (starting at the oldest) because we’re motivated by a particular project effort. These included some highly identifiable observations from a particular user that had just been ignored due to age. We are limited by the amount of available identifier effort - and there are many other ways to improve the data resource (in this case on wild flowering plants), like marking up pot-plants and street trees and finding / drawing attention to good observations with missing metadata (usually an upload problem with the app)…

My perception is that in most cases identifiers just don’t realize or remember that the Unknowns are there to be gone through, and that there are easy ways to include them.

A concerted education campaign would be one way to address that, but if the technology can help too, then why not? :wink:

I’m a non-expert at everything, and I spend a lot of time going through unknowns. I learn a lot from this! But I wonder if I wouldn’t learn more having cv automatically applied, and then going through local guides / examples and trying to one-up cv in more restricted domains.

I don’t know a good way to implement it, but I can imagine I’d be more useful going through a series of unknowns where the “computer vision result was marked very wrong”. We all know this happens, and that sometimes humans can quickly figure out where things went wrong and get the observation back on the rails.

"By “surfacing old unknown observations,” what I mean is, when someone filters their identification stream for all needs-ID vascular plants from Nevada (as an example), I want them to see two kinds of observations in that stream:

  • all needs-ID Nevada observations identified by a human as Tracheophyta or a descendant
  • all old Unknown Nevada observations identified by CV as Tracheophyta or descendant"

This seems like the right solution. The problem is not that observations do not have an ID; the problem is that experts in a particular taxon cannot find the observations that no one has yet identified to that level. For example, I have reviewed all the aphid observations, but I will never see the aphid observations that are currently stuck at an identification of insects, arthropods, or unknown.

The computer vision should be good enough to add at least a few hundred obvious aphids to the search. And even things which are not aphids, but look similar, are most likely to be correctly identified if added to that search.

Wow, thank you for tackling those Indian plant observations stuck in Unknown! I found them too exhausting and frustrating, and had to step away a while ago. There were indeed many fantastic observations there just waiting for some attention.

I have been following this discussion with interest. When I created this feature request, I was mostly spending my time IDing “unknowns.” I was concerned that the number of “unknowns” was increasing, but I was really interested in increasing the efficiency of putting the correct label on the observation.

I think both of those premises are somewhat faulty:

  1. It turns out that the number of unknown observations is not increasing. Bouteloua posted a nice graph showing that the number of unknowns spikes in April in the Northern Hemisphere, but the backlog then gets worked off during the winter months. It has been roughly stable for several years.

  2. Choess pointed out that the “unknown” observations are not the “rate-limiting step” in IDing observations, which I think is a good way of putting it. There are way more observations stuck at higher-level taxonomic classifications (kingdom, phylum, subphylum, etc.) in plants, for example, than there are unknowns.

So from an efficiency standpoint, the proposed feature request might not really help that much, especially if it is just used to get an unknown to a kingdom or phylum-level classification.

From the above, I’m less convinced now that this feature should be implemented.

If we could somehow just get the observations that are suitable for CV to be the ones that CV is applied to, and these observations could be IDed to genus, family or order with an acceptable error rate, I would be all for that. But there are a bunch of unknowns that have blurry images, or the focus of the observation is unclear, or have any number of other issues that can confuse the CV; if these kinds of observations can’t be screened out somehow, I imagine that the error rate would be unacceptable.
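One hypothetical way to do that screening, assuming the CV suggestion were stored along with a confidence score (something discussed later in the thread as a possible extension, not an existing field), is a simple cutoff: apply the suggestion only when the score clears a threshold and leave everything else in Unknown for humans. The `Observation` fields, `screen_unknowns`, and the 0.8 cutoff are all made up for illustration, and whether low scores actually track blurry or ambiguous photos is an assumption that a limited trial would have to check.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Observation:
    id: int
    cv_taxon: Optional[str]  # hypothetical stored CV suggestion
    cv_score: float          # hypothetical CV confidence in [0, 1]


def screen_unknowns(unknowns: List[Observation], min_score: float = 0.8
                    ) -> Tuple[List[Observation], List[Observation]]:
    """Split Unknowns into those confident enough to receive the CV suggestion
    and those (blurry, unclear subject, ...) left for human identifiers."""
    confident, for_humans = [], []
    for o in unknowns:
        if o.cv_taxon is not None and o.cv_score >= min_score:
            confident.append(o)
        else:
            for_humans.append(o)
    return confident, for_humans


unknowns = [
    Observation(1, "Asteraceae", 0.93),
    Observation(2, "Insecta", 0.41),
    Observation(3, None, 0.0),
    Observation(4, "Poaceae", 0.80),
]
confident, for_humans = screen_unknowns(unknowns)
print([o.id for o in confident], [o.id for o in for_humans])  # [1, 4] [2, 3]
```

Raising `min_score` trades coverage for a lower error rate; the right value would have to come from checking results in a trial region or taxon.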

If I were proposing a new feature request, I think I’d suggest that it be a sort of experiment, tested on a limited geographic region, or maybe even a single taxonomic group (or perhaps a “low-risk” group such as observations marked “captive/cultivated”), to see how well it works. Obviously it will only work well in areas that already have lots of observations, and I can imagine it working better on some taxa than others. I wouldn’t limit it to unknowns, either; maybe go down to phylum and see if it could help address some of the backlog at higher classifications.

That’s my 2 cents, thanks for all the thoughtful comments.

i suggested above that the cv taxon should be captured in a separate field. if that field could be expanded to also capture the confidence of the cv suggestion (see https://forum.inaturalist.org/t/computer-vision-should-tell-us-how-sure-it-is-of-its-suggestions/1230), then i think that gets you what you want, and it could also get what i was describing here:

We’ve discussed this idea as a team, and while it’s certainly interesting, it’s unlikely to be something we implement for at least a year, if we implement it at all.

I always appreciate it when you let us know. Thank you.

There is a discussion thread about this:
https://forum.inaturalist.org/t/easy-way-to-mark-multiple-species-observations/278

Although no definite solution has been settled on, I think a new DQA entry would not hurt and would be useful, whatever is done later with these observations:
https://forum.inaturalist.org/t/easy-way-to-mark-multiple-species-observations/278/49