Somehow, this is all about “metadata”. Even putting an observation into a project is somehow attaching a piece of “metadata” to the observation, meaning “present in this project”.
The first question about “trees” is: how to create the metadata “tree”?
An observation could be flagged as “tree” by us, just as we flag observations in the “Data Quality Assessment” panel or just as we flag an observation of ants as “Gyne(s) present?”/“Yes/No” in the “Observation Fields”.
An observation could be flagged as “tree” by iNat itself, running a separate computer vision algorithm designed to answer yes/no to the question.
No more need to put all “trees” in a dedicated project (this is just what we would do because the feature we need is missing, but it is not the best answer to the need “surfacing unknown trees”).
The second question about “trees” is: what to do with the metadata “tree”, how to display it?
The simplest option seems to be having a “tree” filter available in URLs and in the “Identify” filters panel. Then we can do whatever we want with it.
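As a rough sketch of what such a URL filter could look like: the `cv_label` parameter below is purely hypothetical (it stands in for the requested feature and does not exist on iNaturalist), and `iconic_taxa=unknown` is, as far as I know, the existing way to ask for observations without an ID, so treat it as an assumption too.

```python
from urllib.parse import urlencode

IDENTIFY_URL = "https://www.inaturalist.org/observations/identify"

def identify_url(cv_label, **other_filters):
    """Build an Identify URL combining a computer-vision label with other filters."""
    # "cv_label" is a HYPOTHETICAL parameter: it illustrates the requested
    # filter and is not something iNaturalist currently understands.
    params = {"cv_label": cv_label, **other_filters}
    return f"{IDENTIFY_URL}?{urlencode(params)}"

# Unknown observations that the (hypothetical) classifier flagged as trees.
print(identify_url("tree", iconic_taxa="unknown"))
```

The point is only that a single extra parameter would combine freely with the filters we already have.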
i sort of doubt that if you provide the data this way that that many more people will utilize it. part of the issue, i think, is that there’s not a great way to publicize the new “functionality”. part of the issue, i think, is just that not a lot of people in the grand scheme of things are active identifiers.
i think it’s fine to show how CV classification of unknowns functionality could work, just as a proof of concept, but personally, i don’t think it’s a great idea to keep going down this path by creating a lot more projects and adding a lot more observations to these projects. (the way you have to go about adding observations to these projects is just relatively inefficient. it would be better if iNat actually did this as a system change.)
Yes, but there are people dedicated to actively reducing the number of observations that are not identified, or identified only as “State of matter Life”. The need exists.
BTW, one should include also in the scope all observations identified as “Plant” or “Animal”, if not flagged as “Based on the evidence, can the Community Taxon still be confirmed or improved?” / “No”. But no need to detail all use cases, because once we have a filter based on computer vision, it will be easy to combine it with other filters (“unknown”, “plant”, “animal”, “can be improved”, etc) and do whatever we wish.
I agree, this way is relatively inefficient: 1 API request for adding 1 observation into a project. But if it means that the observation will be identified by someone within hours (as it happened with the aphids project) or within days, then 1 API request for getting 1 observation identified, it’s worth it!
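To put numbers on that cost, here is a back-of-the-envelope sketch. The batch size and the request rate are assumptions chosen for illustration, not figures from this thread; plug in whatever rate your account is actually allowed.

```python
def requests_needed(n_observations, requests_per_observation=1):
    """One API request per observation added to a project, as described above."""
    return n_observations * requests_per_observation

def hours_needed(n_requests, requests_per_minute):
    """Wall-clock time to send n_requests at a steady rate."""
    return n_requests / requests_per_minute / 60

# Assumed example: 5,000 observations at an assumed 60 requests per minute.
n = requests_needed(5_000)
print(n, "requests, about", round(hours_needed(n, requests_per_minute=60), 1), "hours")
```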
I’m one of the few people who occasionally actually use that tool. For example, it shows me observations that it thinks could be Crassulaceae, which I then manually ID to species. So I don’t think that counts as automated. It’s always been time-consuming and probably resource-intensive to feed that tool, so I like the idea of only one person running it, building a database, and creating projects for specialists.
My understanding from this thread is that the problem is less one of “observations getting stuck at ‘unknown’” and more one of “observations getting stuck at higher taxonomy levels”. I also sympathise with the concern about how an AI feature’s ID would be accounted for in the Community ID, as well as how it would be perceived by human users.
To address these concerns, I’m proposing the following:
When sufficiently confident, the feature should apply a “proxy ID” to observations that are stuck at high taxonomic levels (“stuck” being defined as x amount of time without activity).
This “proxy ID” should not be shown anywhere on the UI and should not be considered as part of the Community ID.
This “proxy ID” should only trigger the observation to show up in https://www.inaturalist.org/observations/identify when the relevant filters are applied. For example, given an observation that’s stuck at the ‘plants’ level, if the feature is confident that the observation is a ‘dicot’, the observation should show up if the user is filtering for dicots.
The expected impact of this feature could be that more “stuck” observations would be confidently pushed to lower taxonomic levels where (I assume) more identifier users are available.
One drawback: in some false-positive edge cases, users could be confused about why a certain observation ended up in their filtered list. To mitigate this:
there could be an indication of the feature at action on the ID interface
users could opt out of this feature, using the filter UI.
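A minimal sketch of the filtering rule proposed above, to make the behaviour concrete. Everything here is hypothetical: the `Observation` fields, the `STUCK_AFTER_DAYS` threshold (the “x amount of time”), and the taxon labels are made up for illustration and are not iNaturalist's data model.

```python
from dataclasses import dataclass
from typing import List, Optional

STUCK_AFTER_DAYS = 30  # hypothetical value for "x amount of time"

@dataclass
class Observation:
    community_lineage: List[str]                 # e.g. ["Life", "Plantae"]
    proxy_lineage: Optional[List[str]] = None    # hidden CV suggestion, if confident
    days_inactive: int = 0

def community_taxon(obs):
    """The Community ID is read from the community lineage only."""
    return obs.community_lineage[-1]

def matches_filter(obs, taxon, use_proxy=True):
    """Should this observation appear when an identifier filters for `taxon`?"""
    if taxon in obs.community_lineage:
        return True  # ordinary match, proxy not needed
    stuck = obs.days_inactive >= STUCK_AFTER_DAYS
    # The proxy only widens the filter for stuck observations, and only
    # when the user has not opted out (use_proxy=False).
    return bool(use_proxy and stuck and obs.proxy_lineage
                and taxon in obs.proxy_lineage)

obs = Observation(["Life", "Plantae"], ["Life", "Plantae", "Dicots"], days_inactive=90)
print(matches_filter(obs, "Dicots"))                   # proxy surfaces it
print(matches_filter(obs, "Dicots", use_proxy=False))  # user opted out
print(community_taxon(obs))                            # still "Plantae"
```

The proxy lineage is never consulted by `community_taxon`, which is the property the proposal relies on: the hidden suggestion changes where an observation shows up, not what it is identified as.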
I understood this to be a demonstrative “proof of concept.” While it’s clunky to add unknown observations to a project, it does speed up identification of unknowns to have likely members of a taxon grouped together. So, on the whole, I think this shows that it could be helpful to incorporate something like the requested functionality into the system.
It is optimistic to presume that moving an honest Unknown to Plantae, and on to Dicot, will improve its ID in future.
I have worked thru 2.5K Cape Peninsula Plantae down to Epifamily - if I can at least get them to where taxon specialists can filter for plant family - then those obs can hope for a better ID. We need more identifiers, and we need observers to be chivied to identify their own and a few more.
Your suggestion is interesting. I understand it as a smooth invitation to use the feature requested.
I think there should also be direct access to the feature, for people wishing to directly treat the “observations getting stuck at ‘unknown’” (or ‘State of matter Life’), which has also been a concern for a long time.
Beware that mixing a new hidden proxy ID filter with the present community ID filter may not help promote old observations, simply because the results are always displayed in reverse chronological order.
The suggestions are compatible, no need to exclude a particular usage.
BTW, I would like to say that we should not focus “too much” on the particular usage that each of us may have in mind. The result could be that we never get the requested feature. Better to focus on one “universal” feature request and all vote for it, in order to get it. Once it is available, there will still be plenty of time left for feedback and improvements.
I also check Senna observations already identified at the species level, in search of wrong identifications (maybe 5% or 10% have wrong species IDs). BTW, the requested feature could also help find such observations with wrong IDs (not too difficult for computer vision to “count” the leaflets of a leaf?).
On the contrary, I spend much less time with “Plantae” or “Fabaceae” observations, because viewing observations that I can’t identify soon gets boring. When I see tendrils, I identify as Tribe Fabeae or at the genus level (Vicia, Lathyrus, …). There is one person in particular who later refines these identifications.
Fortunately, some other people have a completely different and complementary approach!
In that case, we are stuck at the Subfamily Caesalpinioideae, or at the family level if we are more cautious (possible confusion with Subfamily Mimosoideae), or even at the class level (possible confusion with Family Bignoniaceae).
They’re mostly not IDed because of a lack of hands or will rather than a lack of accessibility; there are tens of thousands of them waiting to be IDed in Insecta. I support a machine adding broad IDs to unknowns, but if you want to sort unknowns, keeping cultivated plants from showing up with all the other observations would be very useful, more so than separating aphids or butterflies.
It’s a matter of choice not to identify cultivated plants. (Personally, I don’t make this distinction and I am interested in all observations, see here for instance).
You can filter them out, provided they are marked as cultivated, whatever the other parameters of the search query are.
I guess you know that already, so the other answer is:
I have no way to find unmarked cultivated plants, because this is not offered by the computer vision provided by the iNat server.
There is an automatic algorithm that marks plants captive if they are IDd to a genus or species which is mostly captive in the local county. So somewhere in iNat’s programming they must have lists of what taxa are typically captive in each place, and if you found the list you could pull those out of the unknowns into their own project. I don’t know if it is possible to get the list.
I am convinced it’s a good idea to get it, if you would like to use it.
It is compatible with my previous suggestion of several projects, because:
The cost (API requests) to spread 100,000 observations over 50 projects is exactly the same as for putting 100,000 observations into 1 project.
With 50 projects containing “unknown” observations of plants, you could merge their contents simply by using the appropriate URL (example below), if you prefer.
On the contrary, if there is only 1 project with 100,000 “unknown” plant observations, it is impossible to use it for reviewing only the Fabaceae in it. Populating a Fabaceae project later with 5,000 of these observations would require additional resources and delay other activities of the same iNat account.
I had a look at the API description of the taxon Ginkgo biloba in search of any indication that it is mostly cultivated, but I didn’t see anything (although, at the bottom of the response, there are many indications of locations where it has been introduced). I also looked at the API description of this observation, and didn’t see anything either.
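The two claims above can be sketched in a few lines. The project slugs are invented for illustration, and passing several projects as a comma-separated `project_id` value is how I understand iNaturalist search URLs to work, so take it as an assumption rather than a guarantee.

```python
from urllib.parse import urlencode

def requests_to_populate(project_sizes):
    """One API request per observation added, whichever project it goes to."""
    return sum(project_sizes)

def merged_projects_url(project_slugs, **filters):
    """One search URL covering the contents of several projects at once."""
    # Assumption: a comma-separated project_id selects all listed projects.
    params = {"project_id": ",".join(project_slugs), **filters}
    return "https://www.inaturalist.org/observations?" + urlencode(params, safe=",")

# 50 projects of 2,000 observations cost the same as 1 project of 100,000.
print(requests_to_populate([2_000] * 50) == requests_to_populate([100_000]))

# Hypothetical project slugs, merged back into a single view.
print(merged_projects_url(["unknown-fabaceae", "unknown-rosaceae"]))
```

Splitting up front costs nothing extra and keeps both views available; starting from one big project, the per-family view is the one that needs a second round of requests.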
Do you have to/will you want to remove observations from the projects after they are identified? Do you have a way to do that as a batch edit?
I wonder if these projects will largely go unnoticed, or if many people will find them and participate. I wonder if people who specialize in a particular group would like to be informed of the corresponding projects (maybe tag them in a journal post?) or if they’d rather not be bothered.