Automatic iNat suggestion for "unknown" observations that reach a certain age

I can certainly clear out the non-Lepidoptera, but I don’t know enough to bring many observations down below Lepidoptera, unfortunately.

1 Like

They’ll still be in the project if anyone wants to refine, and you greatly increase the chance that an identifier searching for Lepidoptera will see them, so I’d say go for it.

1 Like

This is great! I checked Crassulaceae and Lepidoptera and it seems to be working well.
One suggestion - is it possible to include the CV suggestion as an “observation field”?
e.g. on this page https://www.inaturalist.org/observations/138013069
it would show up as Computer Vision suggestion: Lophocampa

After doing a lot of IDs, I have a sense for the kinds of photos where the CV can be trusted to give a good suggestion. If the CV suggestion were already there, I wouldn’t have to ask the CV to generate it again, and I assume this would relieve some strain on the servers.

It’s a good question/suggestion. The answer belongs to iNat, but as far as I remember from a previous discussion about this:

iNat has chosen NOT to store the computer vision suggestions in the database. The reason is that the c.v. is updated from time to time (a few times a year), so generating the suggestions on demand ensures that we always benefit from the best suggestions, resulting from the latest c.v. training.

By contrast, my software stores the c.v. suggestions in a cache and never asks again for the c.v. suggestions of the same observation. This means that my software may make a decision (to put this observation in that project) on the basis of c.v. suggestions generated as long as one year ago.
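A minimal sketch of this kind of "fetch once, then always reuse" cache is shown below. It is not the actual software: the class name, the JSON file format, and the `fetch` callback (standing in for whatever call returns the c.v. suggestions) are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable


class SuggestionCache:
    """Keep c.v. suggestions per observation ID and never refetch a cached one."""

    def __init__(self, path: Path, fetch: Callable[[int], list]):
        self.path = path
        self.fetch = fetch  # hypothetical function that asks for the c.v. suggestions
        # Reload earlier results if the cache file already exists.
        self.data = json.loads(path.read_text()) if path.exists() else {}

    def get(self, observation_id: int) -> list:
        key = str(observation_id)  # JSON object keys are always strings
        if key not in self.data:
            # First time we see this observation: one request, then persist it.
            self.data[key] = self.fetch(observation_id)
            self.path.write_text(json.dumps(self.data))
        return self.data[key]
```

The trade-off is the one described above: a cached entry is never refreshed, so it reflects the c.v. model that was in use when the entry was first fetched.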

So, for these reasons, I think you should not be afraid of consulting the up-to-date c.v. suggestions of an observation, if you think this helps you. It is iNat’s choice to provide you only up-to-date suggestions and it is iNat’s choice NOT to store the up-to-date suggestions in a cache in the database.

(Should this become an issue, iNat could store the up-to-date suggestions in the database while still providing only up-to-date suggestions to everyone. iNat would just have to clear the whole cache after every new c.v. training. So we do NOT need to do that on iNat’s behalf just to spare you from requesting the c.v. suggestions again.)

(Moreover, writing the CV suggestions in a comment (or wherever) in an observation would also require 1 more request to the API for every observation, with no guarantee that anyone will ever read that comment.)


To go into details, I have no access through the API to the top suggestion Genus Lophocampa for observation 138013069. What I receive from the API is a set of 10 suggestions (at the rank species), and each suggestion has a 0-100 confidence score.

My software (see 1, 2) computes a “Best ID” from these 10 suggestions and scores (using my own algorithm) after analyzing where these 10 taxa are located in the whole taxonomic tree. There is no guarantee that the result will match the website’s top suggestion (and it does NOT need to, because I tuned my own algorithm according to how cautious I wanted this “Best ID” to be). An observation will be put in the Lepidoptera project only if this “Best ID” is Lepidoptera or any taxon below Lepidoptera.
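The actual “Best ID” algorithm is not described in this thread, so the following is only one plausible way to do it, not the author’s method. It sums the scores of all suggestions under each ancestor and returns the deepest taxon that still holds a chosen share of the total. The function name, the `threshold` parameter, the simplified lineages, and the scores in the example are all invented for illustration; they are not the API’s values for any real observation.

```python
from collections import defaultdict


def best_id(suggestions, threshold=0.75):
    """Return the deepest taxon whose share of the total score reaches threshold.

    Each suggestion is (lineage, score), with lineage listed from the root
    down to the suggested species. Use a threshold above 0.5 so that the
    qualifying taxa always lie on a single branch of the tree.
    """
    total = sum(score for _, score in suggestions)
    if total <= 0:
        return None
    support = defaultdict(float)  # taxon -> summed score of suggestions below it
    depth = {}                    # taxon -> position in its lineage (0 = root)
    for lineage, score in suggestions:
        for level, taxon in enumerate(lineage):
            support[taxon] += score
            depth[taxon] = level
    candidates = [t for t in support if support[t] / total >= threshold]
    return max(candidates, key=lambda t: depth[t]) if candidates else None


# Toy input: simplified lineages and made-up scores.
toy = [
    (["Lepidoptera", "Erebidae", "Lophocampa", "Lophocampa maculata"], 50),
    (["Lepidoptera", "Erebidae", "Lophocampa", "Lophocampa caryae"], 30),
    (["Lepidoptera", "Erebidae", "Halysidota", "Halysidota tessellaris"], 20),
]
print(best_id(toy))                 # Lophocampa
print(best_id(toy, threshold=0.9))  # Erebidae
```

Raising the threshold makes the result more cautious: it climbs to a higher rank whenever the suggestions are scattered, which would fit the remark that “difficult” observations tend to land in higher rank projects.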

For observation 138013069, the 10 suggestions and their 10 associated scores are as follows (the scores are between brackets):

There are also 10 other scores telling whether the 10 taxa have been “seen nearby”.

As you see, the top suggestion Genus Lophocampa displayed by the website is none of the 10 suggestions provided by the c.v. through the API. This top suggestion has been computed somewhere but I have no access to it. It is simply not provided by the API (and I don’t know the algorithm generating it).

The website displays at most 8 of the 10 suggestions:

4 Likes

That is what I do for my Unknowns, and the finer IDs roll in promptly from people who have filters set to Lepidoptera.

2 Likes

Beware that this bug might have affected the selection of observations that I pushed recently to the “unknown” observations projects mentioned above:
https://forum.inaturalist.org/t/random-id-suggestions/37978
https://forum.inaturalist.org/t/more-recently-the-ai-with-inaturalist-seems-to-be-failing/38010

Besides losing data, providing wrong data is the worst thing a server can do.

I will not check whether I really got wrong data in the cache. I am simply discarding everything I received from 12 December onward, by restoring a backup that I happened to make on 11 December 2022 (by chance, since in principle I trust the server). I will not do anything to check the observations already pushed to the projects. Just be cautious before concluding anything if you find some irrelevant observations in these projects.

2 Likes

Journal post:
https://www.inaturalist.org/journal/jeanphilippeb/73398-draft-for-creating-projects-for-unknown-observations


Entry point for the new Fabaceae project and sub-projects:
https://www.inaturalist.org/projects/unknown-fabaceae

Sub-projects already populated with some observations:
https://www.inaturalist.org/projects/unknown-fabeae
https://www.inaturalist.org/projects/unknown-trifolieae

2 Likes

New subprojects for Lepidoptera.

The project Lepidoptera has been deleted and recreated, in order to be populated again, but only with the observations that will not go to the new subprojects.

In other words, this action is equivalent to a reset of the statistics and to the removal from the project of all observations that have been identified. This will result in a better sorting of all observations still to identify. It happens only because the first Lepidoptera project was created as an experiment, without regard to the future plan. This should not happen again.

Links to project and subprojects:

Order Lepidoptera
     Superfamily Bombycoidea
     Superfamily Noctuoidea
     Superfamily Papilionoidea
               Subfamily Papilioninae
          Family Nymphalidae
               Subfamily Nymphalinae

In the description of the project Lepidoptera, there is an “umbrella” link for identifying all observations at a time (in project and all sub-projects recursively). No need to browse the subprojects one by one.

2 Likes

Main project has more observations of plants than butterflies (in verifiable).

It is not surprising that observations that are “difficult” for the computer vision go to a higher rank project (as a consequence of more scattered c.v. suggestions, I guess). But I don’t see that many. What is the URL that shows more observations of plants than Lepidoptera?

The projects were still being populated, maybe the ratio was different a few hours ago.

It was when there were about 10 pages fewer; the most current one is filled with moths.
They should stay in the project, right? I IDed about two pages of them before.

Yes, if you have just identified them (when they were in the new project/subprojects).

Observations identified more than one day ago (when they were in the previous project) will not be pushed again to the new project or its subprojects.

New project and subprojects for observations of Arthropoda without identification.

Links to all projects at the top of this journal post.

1 Like

I think it is worth pointing out that jeanphilippeb’s routine for flagging unidentified observations is superb and is about 98% correct for Proteaceae (I cannot assess false negatives). It has rescued hundreds of observations from obscurity (https://www.inaturalist.org/observations?project_id=152984&place_id=any&verifiable=any&captive=any).

It would be great if the routine would review all observations above Family in Plants.

3 Likes

Apologies for a batch of - Not an Unknown protea - notifications. About 50, didn’t count.

I tidied up the very few, where I could.

I agree that jeanphilippeb’s projects are very helpful, although I’m not seeing quite as high a rate of correct IDs. But many of the incorrect ones are observations where the observer submitted several organisms as one observation, or there’s an insect or spider lurking amongst the plants, or various other similar situations. Also, it made it easy to pull out the observations of humans and whisk them off into Casual-dom!

1 Like

Thank you all for your feedback!

In 2 days I will have finished treating all “unknowns” that are “Needs ID” and older than one month (to leave the observer time to add an ID).

If you are curious about how many observations have been treated so far, check these umbrella projects: Taxa A-I, Taxa J-S, Taxa T-Z.

This is not out of reach. It represents 2,500,000 observations to treat:
https://www.inaturalist.org/observations?quality_grade=needs_id&lrank=epifamily&taxon_id=47126

I can start with Southern Africa, representing 50,000 observations to treat:
https://www.inaturalist.org/observations?quality_grade=needs_id&lrank=epifamily&taxon_id=47126&place_id=7105,8489,7140,6986

As many high rank observations have already been treated in Southern Africa (in the search for candidate observations for the trees projects 1 and 2), only 9,500 observations remain to be treated for S.A. (to be completed in 2 days):
https://www.inaturalist.org/observations?quality_grade=needs_id&lrank=epifamily&taxon_id=47126&place_id=7105,8489,7140,6986&not_in_project=156653,156655,156657,154158,154159,153322,154354,153984,154232,154233,154271,155743
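The three search URLs above differ only in their query parameters, so they can be built from one small helper. This is just a sketch: the function name and its arguments are made up for illustration, while the parameter names (`quality_grade`, `lrank`, `taxon_id`, `place_id`, `not_in_project`) and the IDs are copied from the URLs above.

```python
from urllib.parse import urlencode

BASE = "https://www.inaturalist.org/observations"


def search_url(taxon_id, place_ids=(), excluded_projects=()):
    """Build a search URL for "Needs ID" observations stuck above family."""
    params = {
        "quality_grade": "needs_id",
        "lrank": "epifamily",  # lowest rank to include, as in the URLs above
        "taxon_id": taxon_id,
    }
    if place_ids:
        params["place_id"] = ",".join(map(str, place_ids))
    if excluded_projects:
        # Observations already in these projects are filtered out by the server.
        params["not_in_project"] = ",".join(map(str, excluded_projects))
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return BASE + "?" + urlencode(params, safe=",")


print(search_url(47126))
print(search_url(47126, place_ids=[7105, 8489, 7140, 6986]))
```

Passing the twelve project IDs from the last URL as `excluded_projects` reproduces the third search, where the server leaves out everything already pushed to those projects.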


My next technical objective is to make the software available for several people to run, in order to push the observations to the projects collaboratively. I have been working actively on this for several weeks. The question is: who is going to run the software several hours per day, and copy/paste a new API token every morning?

At the present time, the software no longer has a local observations cache. This was an important technical step. Everything is now managed via the iNat server, using the phylogenetic projects and several exclusion list projects, so that all observations already treated in one way or another can be filtered out by the iNat server itself.

I have to ensure that the software becomes robust enough before asking for collaborative work, which is an important prospect for the 2,500,000 plant observations to treat, as well as for a bioblitz if you expect a quick treatment. The latest issue I had to manage was related to the fact that an API search for observations also returns observations from accounts that have been suspended.


The only manual action I take is adding users to the “opt-out” and the “does not allow other people to add their observations to projects” collection projects. (The software writes the users to add to these projects to a log file.) Even in case of future collaborative work, it is enough if only one person takes care of that.

4 Likes

Most of these 9,500 observations are being pushed to high rank phylogenetic projects.

An example of a high rank IDed observation pushed to a low(er) rank phylogenetic project:
https://www.inaturalist.org/observations/144139027

Presently identified as Order Lamiales (Community Taxon), it has been pushed to the Family Pedaliaceae project. This looks good, as this observation has 2 IDs Genus Sesamum, and this genus is actually in the Family Pedaliaceae.


BTW, @dianastuder, this observation is also an example of a pre-maverick. I hypothesize that this suggestion from @tonyrebelo will help find pre-maverick observations.

Toward a Pre-Maverick project? With a URL for identifying the pre-maverick observations in the project, filtering out observations that have become “Research Grade”.

It would be simpler (again) if iNat could provide a search for pre-maverick observations (observations that can, with one more ID, become Maverick). New feature request (impacting the database)? Who will use it?

3 Likes

People who are checking and resolving their mavericks already.
I certainly found my first 142 pre-mavericks interesting to work thru.

Glad to be able to pre-empt unnecessary work for identifiers, in cases where iNat is presenting my (old) ID as “Diana disagrees” with all the (newer) IDs, when I haven’t even SEEN them yet.

1 Like

I like this project a lot and I am curious how you got the API to find those observations.

1 Like