Automatic iNat suggestion for "unknown" observations that reach a certain age

jeanphilippeb · December 4, 2022, 2:43pm

In a sense, this is all about “metadata”. Even putting an observation into a project amounts to attaching metadata to the observation, meaning “present in this project”.

The first question about “trees” is: how to create the metadata “tree”?

An observation could be flagged as “tree” by us, just as we flag observations in the “Data Quality Assessment” panel or just as we flag an observation of ants as “Gyne(s) present?”/“Yes/No” in the “Observation Fields”.
An observation could be flagged as “tree” by iNat itself, running a separate computer vision algorithm designed to answer yes/no to the question.
There would be no more need to put all “trees” in a dedicated project (that is only what we would do because the feature we need is missing; it is not the best answer to the need of “surfacing unknown trees”).

The second question about “trees” is: what to do with the metadata “tree”, how to display it?

The simplest option seems to be a “tree” filter available in URLs and in the “Identify” filters panel. Then we can do whatever we want with it.

Let’s open a feature request and vote for it.

pisum · December 4, 2022, 3:09pm

i sort of doubt that if you provide the data this way that that many more people will utilize it. part of the issue, i think, is that there’s not a great way to publicize the new “functionality”. part of the issue, i think, is just that not a lot of people in the grand scheme of things are active identifiers.

i think it’s fine to show how CV classification of unknowns functionality could work, just as a proof of concept, but personally, i don’t think it’s a great idea to keep going down this path by creating a lot more projects and adding a lot more observations to these projects. (the way you have to go about adding observations to these projects is just relatively inefficient. it would be better if iNat actually did this as a system change.)

arboretum_amy · December 4, 2022, 4:33pm

I liked the idea, but I didn’t try it because it seemed like a possible way to get myself banned; staff has shown distrust of anything automated.

jeanphilippeb · December 4, 2022, 4:34pm

Yes, but there are people dedicated to actively reducing the number of observations that are not identified, or identified only as “State of matter Life”. The need exists.

It reminds me of this discussion, among others:
Amount of “Unknown” records is decreasing

BTW, one should also include in the scope all observations identified as “Plant” or “Animal”, unless they are flagged “Based on the evidence, can the Community Taxon still be confirmed or improved?” / “No”. But there is no need to detail all use cases, because once we have a filter based on computer vision, it will be easy to combine it with other filters (“unknown”, “plant”, “animal”, “can be improved”, etc.) and do whatever we wish.

jeanphilippeb · December 4, 2022, 5:12pm

I agree, this way is relatively inefficient: 1 API request for adding 1 observation into a project. But if it means that the observation will be identified by someone within hours (as it happened with the aphids project) or within days, then 1 API request for getting 1 observation identified, it’s worth it!

jf920 · December 4, 2022, 6:03pm

I’m one of the few people who occasionally actually use that tool. For example, it shows me observations that it thinks could be Crassulaceae, which I then manually ID to species. So I don’t think that counts as automated. It’s always been time consuming and probably resource intensive to feed that tool, so I like the idea of only one person running it, building a database, and creating projects for specialists.

stekkelpak · December 4, 2022, 6:42pm

My understanding of this thread is that the problem is less “observations getting stuck at ‘unknown’” and more “observations getting stuck at higher taxonomy levels”. I also sympathise with the concern about how an AI feature’s ID would be accounted for in the Community ID, as well as how it would be perceived by human users.

To address these concerns, I’m proposing the following:

When sufficiently confident, the feature should apply a “proxy ID” to observations that are stuck at high taxonomic levels (“stuck” being defined as x amount of time without activity).
This “proxy ID” should not be shown anywhere on the UI and should not be considered as part of the Community ID.
This “proxy ID” should only trigger the observation to show up in https://www.inaturalist.org/observations/identify when the relevant filters are applied. For example, given an observation that’s stuck at the ‘plants’ level, if the feature is confident that the observation is a ‘dicot’, the observation should show up if the user is filtering for dicots.
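
The three rules above could be sketched roughly as follows. This is only an illustration of the proposal, not anything iNaturalist implements: the `Observation` class, its fields, the function name `shows_in_identify`, and both thresholds are hypothetical, and the proposal itself leaves the inactivity period ("x") and the confidence level unspecified, so they are passed in as parameters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    """Hypothetical, simplified observation record (not the iNaturalist data model)."""
    community_lineage: List[str]                 # taxa implied by the Community ID
    days_inactive: int                           # days without any new activity
    proxy_lineage: Optional[List[str]] = None    # hidden lineage suggested by computer vision
    proxy_confidence: float = 0.0                # confidence of that suggestion, 0.0 to 1.0


def shows_in_identify(obs: Observation, filter_taxon: str, *,
                      stuck_after_days: int, min_confidence: float,
                      use_proxy: bool = True) -> bool:
    """Return True if the observation should be listed under the taxon filter."""
    # Normal behaviour: the Community ID already falls under the filtered taxon.
    if filter_taxon in obs.community_lineage:
        return True
    # Users can opt out of the proxy ID, and it may not exist at all.
    if not use_proxy or obs.proxy_lineage is None:
        return False
    # The proxy ID only counts for stuck observations with a confident suggestion.
    is_stuck = obs.days_inactive >= stuck_after_days
    is_confident = obs.proxy_confidence >= min_confidence
    return is_stuck and is_confident and filter_taxon in obs.proxy_lineage


# Example from the proposal: stuck at "plants", confidently suggested as a "dicot".
stuck_plant = Observation(["plants"], days_inactive=400,
                          proxy_lineage=["plants", "dicots"], proxy_confidence=0.95)
print(shows_in_identify(stuck_plant, "dicots", stuck_after_days=365, min_confidence=0.9))
```

Note that the proxy lineage is never written into `community_lineage`, which mirrors the requirement that the proxy ID stays out of the Community ID and only affects filtering.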

The expected impact of this feature could be that more “stuck” observations would be confidently pushed to lower taxonomic levels where (I assume) more identifier users are available.

One drawback: in some false-positive edge cases, users could be confused about why a certain observation ended up in their filtered list. To mitigate this:

there could be an indication on the ID interface that the feature is in action
users could opt out of this feature, using the filter UI.

wthompson23 · December 4, 2022, 7:00pm

I understood this to be a demonstrative “proof of concept.” While it’s clunky to add unknown observations to a project, it does speed up identification of unknowns to have likely members of a taxon grouped together. So, on the whole, I think this shows that it could be helpful to incorporate something like the requested functionality into the system.

DianaStuder · December 4, 2022, 8:19pm

Optimistic to presume that moving an honest Unknown to Plantae and on to Dicot - will improve its ID in future.
I have worked thru 2.5K Cape Peninsula Plantae down to Epifamily - if I can at least get them to where taxon specialists can filter for plant family - then those obs can hope for a better ID. We need more identifiers, and we need observers to be chivied to identify their own and a few more.

Basically 2.5 MILLION waiting across the world
https://www.inaturalist.org/observations/identify?per_page=10&iconic_taxa=Plantae&order_by=observed_on&taxon_id=47126&lrank=epifamily&place_id=any

jeanphilippeb · December 4, 2022, 8:27pm

Your suggestion is interesting. I understand it as a smooth invitation to use the feature requested.

I think there should also be direct access to the feature, for people wishing to deal directly with the “observations getting stuck at ‘unknown’” (or ‘State of matter Life’), which has also been a concern for a long time.

Beware that mixing a new hidden proxy ID filter with the present community ID filter may not help promote old observations, simply because the results are always displayed in reverse chronological order.

The suggestions are compatible, no need to exclude a particular usage.

BTW, I would like to stress that we should not focus “too much” on the particular usage that each of us may have in mind. The result could be that we never get the requested feature. Better to focus on one “universal” feature request and all vote for it, in order to get it. Once it is available, there will still be plenty of time for feedback and improvements.

jeanphilippeb · December 4, 2022, 8:42pm

I use this link whenever I have time for identifications:
Caesalpinioideae observations identified as Caesalpinioideae (at the subfamily level)
https://www.inaturalist.org/observations/identify?quality_grade=needs_id%2Ccasual%2Cresearch&taxon_id=324170&lrank=subfamily

I try to ID these observations at the species (or subsp.) level, or at the genus level when it seems impossible to do better.

I also use this link when I wish to make many identifications in a short time:
Senna observations identified as Senna (at the genus level)
https://www.inaturalist.org/observations/identify?quality_grade=needs_id%2Ccasual%2Cresearch&taxon_id=52348&lrank=genus
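
For anyone who wants to generate such links for other taxa, here is a minimal Python sketch. It only reuses the query parameters visible in the two URLs above (`quality_grade`, `taxon_id`, `lrank`); the function name `stuck_at_rank_url` is made up for illustration.

```python
from urllib.parse import urlencode

IDENTIFY_URL = "https://www.inaturalist.org/observations/identify"


def stuck_at_rank_url(taxon_id: int, rank: str) -> str:
    """Build an Identify link like the two above: a taxon plus the rank it is stuck at."""
    params = {
        "quality_grade": "needs_id,casual,research",  # same value as in the links above
        "taxon_id": taxon_id,
        "lrank": rank,
    }
    # urlencode percent-encodes the commas as %2C, as in the original links.
    return f"{IDENTIFY_URL}?{urlencode(params)}"


print(stuck_at_rank_url(324170, "subfamily"))  # Caesalpinioideae
print(stuck_at_rank_url(52348, "genus"))       # Senna
```

The taxon IDs are the ones from the links above; any other taxon ID and rank pair can be substituted.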

I also check Senna observations already identified at the species level, in search of wrong identifications (maybe 5% or 10% have wrong species IDs). BTW, the feature requested could also help find such observations with wrong IDs (not too difficult for computer vision to “count” the leaflets of a leaf?).

On the contrary, I spend much less time with “Plantae” or “Fabaceae” observations, because viewing observations that I can’t identify soon becomes boring. When I see tendrils, I identify as Tribe Fabeae or at the genus level (Vicia, Lathyrus, …). There is one person in particular who later refines these identifications.

Fortunately, some other people have a completely different and complementary approach!

jeanphilippeb · December 4, 2022, 9:07pm

I do it too.
I don’t think it’s a bad idea.

Try to distinguish these species when there is only one photo of the leaves, taken at a long distance:
https://www.inaturalist.org/taxa/62839-Delonix-regia/browse_photos
https://www.inaturalist.org/taxa/602044-Cenostigma/browse_photos
https://www.inaturalist.org/taxa/133611-Peltophorum/browse_photos

In that case, we are stuck at the Subfamily Caesalpinioideae, or at the family level if we are more cautious (possible confusion with Subfamily Mimosoideae), or even at the class level (possible confusion with Family Bignoniaceae).

fffffffff · December 5, 2022, 12:28am

They’re not IDed mostly because of a lack of hands or will rather than a lack of accessibility; there are tens of thousands of them waiting to be IDed in Insecta. I support a machine adding broad IDs to unknowns, but if you want to sort unknowns, making cultivated plants not show up with all the other observations would be very useful, more so than separating aphids or butterflies.

jeanphilippeb · December 5, 2022, 5:26am

It’s a matter of choice, not to identify cultivated plants. (Personally, I don’t make this distinction and I am interested in all observations, see here for instance).

You can filter them out, provided they are marked as cultivated, whatever the other parameters of the search query are.

I guess you know that already, so the other answer is:
I have no way to find unmarked cultivated plants, because this is not offered by the computer vision provided by the iNat server.

fffffffff · December 5, 2022, 6:36am

Just filtering plants out would be enough.

arboretum_amy · December 5, 2022, 8:39pm

There is an automatic algorithm that marks plants captive if they are IDd to a genus or species which is mostly captive in the local county. So somewhere in iNat’s programming they must have lists of what taxa are typically captive in each place, and if you found the list you could pull those out of the unknowns into their own project. I don’t know if it is possible to get the list.

jeanphilippeb · December 6, 2022, 7:30pm

I am convinced it’s a good idea to get it, if you would like to use it.

It is compatible with my previous suggestion of several projects, because:

The cost (API requests) of spreading 100,000 observations over 50 projects is exactly the same as that of putting 100,000 observations into 1 project.
With 50 projects containing “unknown” observations of plants, you could merge their contents simply by using the appropriate URL (example below), if you prefer.
On the contrary, if there is only 1 project with 100,000 “unknown” plant observations, it is impossible to use it for reviewing only the Fabaceae in it. Populating a Fabaceae project later with 5,000 of these observations would require additional resources and delay other activities of the same iNat account.

Just to illustrate: assuming you would like to identify Lepidoptera and Crassulaceae together, as if they were in a single project, you would use this URL:
https://www.inaturalist.org/observations/identify?quality_grade=casual%2Cneeds_id&verifiable=any&identified=false&project_id=152106,152123
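
Both points can be sketched in a few lines of Python. The sketch assumes, as stated earlier in the thread, 1 API request per observation added to a project. The function names and the per-project counts in the example are hypothetical; the query parameters are the ones from the URL above.

```python
from typing import Dict, Iterable
from urllib.parse import urlencode

IDENTIFY_URL = "https://www.inaturalist.org/observations/identify"


def requests_needed(observations_per_project: Dict[str, int]) -> int:
    """One request per observation, however many projects they are spread over."""
    return sum(observations_per_project.values())


def merged_projects_url(project_ids: Iterable[int]) -> str:
    """Build an Identify link that shows several projects as if they were one."""
    params = {
        "quality_grade": "casual,needs_id",
        "verifiable": "any",
        "identified": "false",
        "project_id": ",".join(str(pid) for pid in project_ids),
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{IDENTIFY_URL}?{urlencode(params, safe=',')}"


# Made-up split of 100,000 observations: same cost with 1 project or with several.
print(requests_needed({"all plants": 100_000}))
print(requests_needed({"Fabaceae": 5_000, "other plants": 95_000}))

# The two project IDs from the URL above.
print(merged_projects_url([152106, 152123]))
```

The generated link leaves the comma in `quality_grade` unencoded, whereas the URL above uses `%2C`; the two forms should be equivalent.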

A good choice of the projects is important, from the start. More projects is better, as long as they don’t contain too few observations.

jeanphilippeb · December 6, 2022, 7:56pm

I had a look at the API description of the taxon Ginkgo biloba in search of an indication of it being mostly cultivated, but I didn’t see anything (yet, at the bottom of the response, there are many indications of locations where it has been introduced). I also looked at the API description of this observation, and didn’t see anything either.

If someone else would like to see, copy/paste one of these API URLs
https://api.inaturalist.org/v1/taxa/64350
https://api.inaturalist.org/v1/observations/46357579

into this JSON Formatter tool:
https://jsonformatter.curiousconcept.com/
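
As an alternative to pasting into a web tool, the same responses can be formatted and searched locally. This is a minimal sketch using only the Python standard library; the helper names are made up, and the search term is a guess at what a relevant key might contain, since I do not know which field (if any) carries this information.

```python
import json
from typing import Any, List
from urllib.request import urlopen


def pretty_json(raw: str) -> str:
    """Re-indent a JSON string so that it is readable."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True, ensure_ascii=False)


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """Download a URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def find_key_paths(data: Any, needle: str, prefix: str = "") -> List[str]:
    """List the paths of all keys whose name contains `needle` (case-insensitive)."""
    paths: List[str] = []
    if isinstance(data, dict):
        for key, value in data.items():
            path = f"{prefix}.{key}" if prefix else key
            if needle.lower() in key.lower():
                paths.append(path)
            paths.extend(find_key_paths(value, needle, path))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            paths.extend(find_key_paths(item, needle, f"{prefix}[{index}]"))
    return paths


# Usage (needs network access, so it is left commented out):
# taxon = fetch_json("https://api.inaturalist.org/v1/taxa/64350")
# print(pretty_json(json.dumps(taxon)))
# print(find_key_paths(taxon, "captive"))   # "captive" is only a guessed search term
```

An empty list from `find_key_paths` would only mean that no key name contains the search term, not that the information is absent from the response.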

jeanphilippeb · December 13, 2022, 10:56pm

Continuing from the suggestion above, I would like to share my intention to create more projects for grouping the “unknown” observations. I spent my whole weekend developing the program that automates the choice of the projects to be created. The result is presented in this journal post:
https://www.inaturalist.org/journal/jeanphilippeb/73398-draft-for-creating-projects-for-unknown-observations

arboretum_amy · December 14, 2022, 3:56am

Do you have to/will you want to remove observations from the projects after they are identified? Do you have a way to do that as a batch edit?

I wonder if these projects will largely go unnoticed, or if many people will find them and participate. I wonder if people who specialize in a particular group would like to be informed of the corresponding projects (maybe tag them in a journal post?) or if they’d rather not be bothered.
