Automatic iNat suggestion for "unknown" observations that reach a certain age

When I was running “identathons” or “IDblitzes” I always wanted a way to put a large group of unknowns into a project, but without the script to do it in batches I dismissed the idea as impractical.

3 Likes

My initial intention would be to let identified observations remain in the projects because:

  • It does not prevent us from reviewing only the observations still unidentified (URL filter: &identified=false ).
  • It would allow us to perform any kind of statistics later (measuring the progress, finding who participates in identifying the unknowns).

I don’t know how to remove a batch of observations from a project. With the API, this can be done only with 1 request per observation to remove. This is another reason not to remove the observations.
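
To make the cost concrete: removing N observations means N separate requests, one after the other. A rough sketch in Python, where `remove_one` is a hypothetical callable standing in for whatever single-observation removal call is used (it is not shown here), and the pause between calls is an assumed courtesy delay, not a documented limit:

```python
import time

def remove_in_sequence(observation_ids, remove_one, pause_seconds=1.0):
    """Call remove_one(obs_id) once per observation; return the IDs that failed.

    remove_one is a hypothetical callable supplied by the caller; it should
    raise an exception when the removal of a single observation fails.
    """
    failed = []
    for obs_id in observation_ids:
        try:
            remove_one(obs_id)       # exactly one request per observation
        except Exception:
            failed.append(obs_id)    # remember it so it can be retried later
        time.sleep(pause_seconds)    # assumed delay to go easy on the server
    return failed
```

With a one-second pause, a project holding a few thousand observations would take on the order of an hour to empty this way.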

I would like to inform specialized people about the projects, when they reach a significant number of observations.

Recently @wthompson23 (seen here) identified many unknown observations in the “Unknown / Lepidoptera” project. Maybe he could tell us if he is using the project or if he identified these observations just by chance. Someone else joined the project, without identifying many observations in it.

3 Likes

Could this tutorial help you add observations to a project?
https://forum.inaturalist.org/t/bulk-selection-of-observations-to-add-to-a-project/1747/58

The key point is: are you able to select with an iNat URL the observations to add?

I mean I wanted to add other people’s observations to a project, not mine. I don’t need to do it now, but it would have been nice last year. We settled for just adding a few of the most interesting: https://www.inaturalist.org/projects/id-blitz-december-2021

1 Like

I’ve identified a few of the Unknowns in this project, but I came across them just by chance.

3 Likes

Thanks for choosing the Crassulaceae project as an example. Please let the identified observations remain in the project. The statistics are really interesting, and reading that I’ve already found 112 different species in that project is really motivating.
I’d love to have more people joining in; so far it’s more like a one-man project.
The approach you took to propose project taxa is really interesting. I’m not sure, though, whether automating the project creation is the right way to go, as there probably aren’t specialists for each of them. I guess many specialists aren’t even on the forum. I’ll try to inform some more IDers about the projects.

3 Likes

My concern is that this could go against the observer’s intention. We have that funny project “ignore the elephant seal” for a reason; I worry that if this came to pass, it wouldn’t ignore the elephant seal.

On a more personal note, I have an observation that is still at State of Matter Life because it is an observation of leaf curl, which can have various causes. I don’t want the iNat algorithm to decide to identify it as the host plant.

EDIT: I went ahead and suggested an ID for that leaf curl observation. But the CV’s suggested IDs were things like Dieffenbachia, which isn’t even close. That is why I oppose this feature.

4 Likes

This is a very interesting initiative. Will you make a journal post with links to all the projects, or at least those that exist already?

2 Likes

I don’t think there’s actually a problem here. Your observation is identified as Life, so it is not in the Unknown pile and is out of scope for this request. If something is Unknown, then the observer has almost certainly failed to express their intention (they may have said something in the notes if they’re new and don’t understand how the site works - but before any automatic ID was applied they will either have left the site or figured out how it works).

In my view, once an auto-ID has been applied, it should be automatically withdrawn as soon as any user adds a disagreeing ID.

I think this idea is good, but my preference is to enable a ‘draft mode’ and force an ID on upload, giving the observer a choice of iconic taxa, other kingdoms, and of course ‘Life’ for those genuine ‘no idea’ jobs. We don’t need both Unknown and Life.

3 Likes

Yes, I used the project to identify those observations. I thought it was really helpful to be able to focus my mind on identifying Unknowns that are likely to be within a particular taxon. It also helped me to avoid seeing a plethora of observations that I am unable to identify beyond a very high level (if at all), which sometimes discourages me when I am looking at all Unknowns. I appreciate and support your efforts to develop and demonstrate this idea.

4 Likes

Yes, I plan to make a journal post referencing all projects, with:

  • links to projects pages,
  • links for identifying observations in one project (for example, observations matching Order Ericales but not matching Genus Impatiens, not matching Family Ericaceae, not matching Family Polemoniaceae, not matching Family Primulaceae, not matching Family Theaceae, to date 466 observations),
  • links for identifying observations in a project and all its subprojects (for example, observations matching Order Ericales, to date 3585 observations).
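
For anyone who wants to build such links themselves, here is a minimal sketch in Python. It assumes the Identify page accepts the `taxon_id`, `without_taxon_id` (comma-separated) and `identified` URL parameters; the function name and the numeric taxon IDs in the usage lines are placeholders for illustration, not the real IDs of Ericales or its families:

```python
from urllib.parse import urlencode

IDENTIFY_URL = "https://www.inaturalist.org/observations/identify"

def identify_link(taxon_id, excluded_taxon_ids=(), only_unidentified=True):
    """Build an Identify URL for one taxon, optionally excluding some subtaxa."""
    params = {"taxon_id": taxon_id}
    if excluded_taxon_ids:
        # Assumed format: several excluded taxa joined with commas.
        params["without_taxon_id"] = ",".join(str(t) for t in excluded_taxon_ids)
    if only_unidentified:
        params["identified"] = "false"
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return IDENTIFY_URL + "?" + urlencode(params, safe=",")

# Placeholder IDs only: 111 stands for an order, 222 and 333 for taxa inside it.
one_project = identify_link(111, excluded_taxon_ids=(222, 333))
with_subprojects = identify_link(111)
```

The first link corresponds to "one project" (the parent taxon minus the taxa that have their own subprojects), the second to "a project and all its subprojects".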

I also plan to create a project for unidentified observations of trees in South Africa, to populate by cross checking the computer vision suggestions with an exhaustive list of taxa that are trees in South Africa.

Projects existing so far:

Unknown / Aphididae
Project page / Identify observations

Unknown / Lepidoptera
Project page / Identify observations

Unknown / Crassulaceae
Project page / Identify observations

Unknown / Fabeae
Project page / Identify observations
Too late! Already identified!

Unknown / Trifolieae
Project page / Identify observations

4 Likes

Thanks for your support!

Note that if it saves you time when browsing the observations to identify, then it also saves server resources.

3 Likes

There is an existing project, which observers can use to encourage identifiers:
https://www.inaturalist.org/projects/trees-of-southern-africa

1 Like

For your purposes, would it be helpful to go through the Unknown Lepidoptera in your project, for example, and quickly ID them to “Lepidoptera” (and add annotations for life stage, if they are not adults)? Or would you rather see the observations identified to something finer? I ask because I am not an expert in Lepidoptera, but I’m pretty sure I can ID a moth or butterfly as “Lepidoptera” when I see one.

Also, it might be helpful to remove observations that are already classified as Casual because they are missing something, like the date of observation.

ETA: For example, filtering out the Casual Lepidopteran Unknowns brings the number of observations down from 161 pages to 28 pages.

Many are identifiable to species, and most at least to family, but there are also some observations of plants and some unidentifiable shots; clearing those out would help too.

I can certainly clear out the non-Lepidoptera, but I don’t know enough to bring many observations down below Lepidoptera, unfortunately.

1 Like

They’ll still be in the project if anyone wants to refine, and you greatly increase the chance that an identifier searching for Lepidoptera will see them, so I’d say go for it.

1 Like

This is great! I checked Crassulaceae and Lepidoptera and it seems to be working well.
One suggestion - is it possible to include the CV suggestion as an “observation field”?
e.g. on this page https://www.inaturalist.org/observations/138013069
it would show up as Computer Vision suggestion: Lophocampa

After doing a lot of IDs, I have a sort of sense for the kinds of photos where the CV can be trusted to have a good suggestion. If the CV suggestion was already there, I wouldn’t have to ask the CV to generate the suggestion again, and I assume that this would relieve some strain on the servers.

It’s a good question/suggestion. The answer belongs to iNat, but as far as I can remember from a previous discussion about this:

iNat’s choice is NOT to store the computer vision suggestions in the database. The reason is that the c.v. is updated from time to time (a few times a year), so generating the suggestions on demand ensures that we always benefit from the best suggestions, resulting from the latest c.v. training.

On the contrary, my software stores the c.v. suggestions in a cache and never asks again for the c.v. suggestions of the same observation. This means that my software makes decisions (to put this observation in that project) on the basis of some c.v. suggestions generated one year ago.
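
A caching layer of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual code of my software: the class name, the JSON file storage and the `fetch` callable (which stands for the real request to the c.v.) are all assumptions made for the example.

```python
import json
from pathlib import Path

class SuggestionCache:
    """Keep c.v. suggestions per observation; never fetch the same one twice."""

    def __init__(self, path, fetch):
        self.path = Path(path)
        self.fetch = fetch  # hypothetical callable: observation ID -> suggestions
        # Reload earlier results if the cache file already exists.
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, observation_id):
        key = str(observation_id)  # JSON object keys are always strings
        if key not in self.data:
            # First time this observation is seen: ask once, then persist.
            self.data[key] = self.fetch(observation_id)
            self.path.write_text(json.dumps(self.data))
        # Cached entries are returned as-is, however old they are.
        return self.data[key]
```

Note the trade-off: nothing here ever expires, so a decision can rest on suggestions produced by an older c.v. model.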

So, for these reasons, I think you should not be afraid of consulting the up-to-date c.v. suggestions of an observation, if you think this helps you. It is iNat’s choice to provide you only up-to-date suggestions and it is iNat’s choice NOT to store the up-to-date suggestions in a cache in the database.

(Should this become an issue, iNat could store the up-to-date suggestions in the database, still providing only up-to-date suggestions to everyone. iNat would just have to clear the whole cache after every new c.v. training. So we do NOT need to do this on iNat’s behalf just to spare you from requesting the c.v. suggestions again.)

(Moreover, writing the CV suggestions in a comment (or wherever) in an observation would also require 1 more request to the API for every observation, with no guarantee that anyone would ever read this comment.)


To go into detail, I have no access through the API to the top suggestion Genus Lophocampa for observation 138013069. What I receive from the API is a set of 10 suggestions (at the rank species), and each suggestion has a 0-100 confidence score.

My software (see 1, 2) computes a “Best ID” from these 10 suggestions and scores (using my own algorithm) after analyzing where these 10 taxa are located in the whole taxonomical tree. No guarantee that the result will match the website top suggestion (and it does NOT need to, because I made my own algorithm tuning, according to how cautious I wanted this “Best ID” to be). An observation will be put in the Lepidoptera project only if this “Best ID” is Lepidoptera or any taxon below Lepidoptera.
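
To give an idea of what such a computation can look like, here is a simplified sketch in Python. It is NOT my actual algorithm: the rule used here (take the deepest taxon that gathers at least a given share of the total score) and the `min_share` threshold are assumptions chosen for illustration, and each suggestion is assumed to come with its full lineage from the root down to the species.

```python
from collections import defaultdict

def best_id_lineage(suggestions, min_share=0.8):
    """Return the lineage of the deepest taxon holding >= min_share of the score.

    suggestions: list of (lineage, score) pairs, where lineage is a list of
    taxon names running from the root down to the suggested species.
    Keep min_share above 0.5 so that at most one taxon qualifies per depth.
    """
    total = sum(score for _, score in suggestions)
    if total <= 0:
        return []
    support = defaultdict(float)  # lineage prefix -> summed score beneath it
    for lineage, score in suggestions:
        for depth in range(1, len(lineage) + 1):
            support[tuple(lineage[:depth])] += score
    qualifying = [path for path, s in support.items() if s / total >= min_share]
    return list(max(qualifying, key=len)) if qualifying else []

def goes_into_project(suggestions, project_taxon, min_share=0.8):
    """True if the sketch's "Best ID" is project_taxon or any taxon below it."""
    return project_taxon in best_id_lineage(suggestions, min_share)
```

A higher `min_share` makes the result more cautious: the "Best ID" moves up the tree until the suggestions agree strongly enough.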

For observation 138013069, the 10 suggestions and their 10 associated scores are as follows (the scores are between brackets):

There are also 10 other scores telling whether each of the 10 taxa has been “seen nearby”.

As you see, the top suggestion Genus Lophocampa displayed by the website is none of the 10 suggestions provided by the c.v. through the API. This top suggestion has been computed somewhere but I have no access to it. It is simply not provided by the API (and I don’t know the algorithm generating it).

The website displays at most 8 of the 10 suggestions:

4 Likes