“Unknown (Family)” projects

Today, while I review “unknowns” for broad ID, I see some of these Unknowns have been added to Projects for Unknowns at very high levels, Life, Carnivore, Arthropod:


How does that happen?

I assume an automatic process is sorting observations and adding them to projects? I would think that if a person added one to a project and knew it was an Arthropod, they would add Arthropod as the ID rather than adding it to a project.

How are these unknowns getting sorted into high-level groups?

Maybe curators for that project are adding tons of unknowns.

I would guess it relates to this:
https://www.inaturalist.org/journal/jeanphilippeb/73398-draft-for-creating-projects-for-unknown-observations

It’s @jeanphilippeb

Yes, that’s it!

Observations are put in a project at a high taxonomic rank when the computer vision does not provide enough confidence to put them at a lower rank.
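As an illustration only (this is not the actual software), the rank selection could be sketched as follows. The rank list, the `confidence_by_rank` input and the 0.8 threshold are all assumptions made for the example.

```python
# Hypothetical rank order, from most specific to most general.
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def pick_project_rank(confidence_by_rank: dict[str, float], threshold: float = 0.8) -> str:
    """Return the lowest rank whose confidence reaches the threshold."""
    for rank in RANKS:
        if confidence_by_rank.get(rank, 0.0) >= threshold:
            return rank
    # Nothing is confident enough: fall back to the top-level "Life" project.
    return "life"


# Made-up confidences: only the class-level suggestion passes the threshold.
example = {"genus": 0.35, "family": 0.55, "order": 0.70, "class": 0.92}
print(pick_project_rank(example))
```

With a rule like this, a blurry photo with no confident suggestion at any rank ends up in the most general project, which would explain why the high-rank projects fill up quickly.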


The next question could be: why does my software put so many observations (possibly of lower interest, if nothing easily identifiable is shown) in a few high rank projects?

Putting observations in a project during a run of the software makes it easy to filter them out on the server side during the next run, so that they are never downloaded again. In the near future, I would also like the software to stop keeping a local cache on disk (a cache of all downloaded observations). The idea is to put every “unknown” observation (or at least those that “Needs ID”) into one project and then ignore it, to save resources. The goal is to be able to give the software to other people who are willing to help populate the projects collaboratively. Only the iNat server would know which observations have already been processed and need not be downloaded and analysed again, regardless of who processed them and who is currently running the software.
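A rough sketch of what such a server-side filter could look like as a query URL. The parameter names (`iconic_taxa`, `quality_grade`, `not_in_project`, `per_page`) are taken from the public iNaturalist API v1 documentation as I recall it and should be checked before use; the project slug `processed-unknowns` is hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://api.inaturalist.org/v1/observations"


def build_unprocessed_query(processed_project: str, per_page: int = 200) -> str:
    """Build a URL asking for unknown "Needs ID" observations not yet in the given project."""
    params = {
        "iconic_taxa": "unknown",            # observations without any ID
        "quality_grade": "needs_id",
        "not_in_project": processed_project,  # skip what an earlier run already handled
        "per_page": per_page,
    }
    return f"{API_URL}?{urlencode(params)}"


# "processed-unknowns" is a made-up slug, used only for illustration.
print(build_unprocessed_query("processed-unknowns"))
```

Since the exclusion happens in the query itself, anyone running the software would get the same "not yet processed" set without sharing a local cache.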


For exactly the same reason (to avoid processing the same observation twice, even when the software is used collaboratively), I also use “exclusion projects” for observations that are not put in a “phylogenetic project”:
https://www.inaturalist.org/projects/exclusion-list-for-phylogenetic-projects


Once I have finished putting all the observations from my local cache (containing the observation ID + 10 computer vision suggestions per observation) into these projects, I will just delete this cache.

Initially this software was designed for downloading, storing and identifying observations without ID (as presented here and here). Currently it also puts observations in projects. Soon it will only download observations, analyze them and put them in projects, then forget about them.

According to these statistics, there are presently about 350,000 observations without ID that “Needs ID”:
https://forum.inaturalist.org/t/does-anyone-else-get-bothered-by-how-many-observations-are-marked-as-unknown-species/31896/236

I can see your Yellow Flag projects trickling thru as I ID.
A spectacular achievement from you!!

@teellbee you can also go to his journal post … and pick a taxon level of your choice to play with.

Some data were downloaded 1 or 2 years ago, so the results may be uneven, as the computer vision has improved since then.

It also seems to work better for some taxa than others.

So, the Computer vision is used to sort them into your projects, I take it?

What happens to the project data when a person like me* goes through the Unknowns and IDs something?

Observations just remain in the project.

You may then use the statistics provided by the project and see the species and the identifiers.

BTW, the same software has populated these non-phylogenetic projects (see their descriptions for details):
https://www.inaturalist.org/projects/unknown-trees-of-southern-africa-1
https://www.inaturalist.org/projects/unknown-trees-of-southern-africa-2

This required comparing the computer vision data with a list of taxa known to be all trees.
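One possible way to express that comparison, as a sketch only: the rule used here (every suggestion must be in the tree list) and the placeholder taxon names are assumptions, not a description of the actual software.

```python
def looks_like_tree(suggested_taxa: list[str], tree_taxa: set[str]) -> bool:
    """True when there is at least one suggestion and every suggestion is a known tree taxon."""
    return bool(suggested_taxa) and all(taxon in tree_taxa for taxon in suggested_taxa)


# Placeholder names standing in for a curated list of all-tree taxa.
tree_taxa = {"tree_genus_a", "tree_genus_b", "tree_genus_c"}

print(looks_like_tree(["tree_genus_a", "tree_genus_c"], tree_taxa))  # all suggestions are trees
print(looks_like_tree(["tree_genus_a", "herb_genus_x"], tree_taxa))  # one suggestion is not
```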

So, if I am IDing unknowns and add an ID of, say, Squirrels (Sciuridae), it would remain in the UNKNOWN Animalia project? Even though it is no longer UNKNOWN?

Yes.

But if you use some of the links provided in a project description, you can filter out the observations already identified. (I also included reviewed=false in the links, to help you skip some observations).

For instance, see this project, its statistics (353 observations, 200 already identified):

https://www.inaturalist.org/projects/unknown-caesalpinioideae

and the links in its description:

Subfamily Caesalpinioideae observations without id. (or high rank id.).

Identify observations without id.

Identify observations with high rank id.

Parent projects: Fabaceae, Fabales, Magnoliopsida, Tracheophyta, Plantae, Life.
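Links of that kind can be put together roughly as below. This is a sketch under assumptions: `reviewed=false` comes from the post above, while `project_id` and `iconic_taxa=unknown` are Identify-page parameters as I understand them, and the real links in the project descriptions may use different filters.

```python
from urllib.parse import urlencode

IDENTIFY_URL = "https://www.inaturalist.org/observations/identify"


def identify_link(project_slug: str, without_id: bool = True) -> str:
    """Build an Identify link for a project, skipping observations you have already reviewed."""
    params = {"project_id": project_slug, "reviewed": "false"}
    if without_id:
        # Assumed filter for observations that still have no ID at all.
        params["iconic_taxa"] = "unknown"
    return f"{IDENTIFY_URL}?{urlencode(params)}"


print(identify_link("unknown-caesalpinioideae"))
```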

I think using the prefix "Unknown / " and a common yellow color makes it quick to see that all these projects are related. Of course, "unknown" refers only to the condition for adding observations to the projects, not to their future identifications.

Then, over time, these become projects of mixed IDed and Unknown observations?

And, interested users will add an additional screening mechanism?

Yes, and a few projects already have more identified observations than unidentified ones, for instance:
https://www.inaturalist.org/projects/unknown-proteaceae
https://www.inaturalist.org/projects/unknown-aphididae (the 1st project created, and not at all the easiest for computer vision)

Evidence that these projects help with identification (for example, if we can show that a few identifiers are very active on one project or another) would also argue in favor of a new iNat feature: searching for observations based on the c.v. suggestions. These projects are a workaround for the absence of this feature.

Adding observations to projects has a cost (for the server), but browsing unknown observations that do not match your taxa of interest and that you won’t identify also has a cost.

Personally, as an identifier, I find doing my best to identify an observation without ID more interesting than reviewing observations that already have IDs and trying to improve (or correct) them. Fortunately, different iNatters have different and complementary preferences and behaviors when it comes to identifying observations.

Okay, thank you for that nice explanation. I could see how such a feature could really accelerate identifications. I’ve been IDing Unknowns for a couple months now and it is rather tedious. I often include auto text notes to advise new users on more efficient ways; e.g., I type my trigger text and the following suggestions get auto added to the comment.

Trigger text, Unk (Unknown organism) produces:
As this Observation was entered as Unknown, it may not get reviewed by experts. It helps to add even a very high level ID when you enter your observation. I am identifying this very generally in the hope that it will be noticed and identified by someone with more expertise.

Or,
Trigger text, Cv (for using the Computer Vision) produces:
I’m not an expert, but this was suggested by iNaturalist Computer Vision. Did you know that if you click in the Species Name box (it’s under the Suggest an Identification tab) when adding your Observation, the iNaturalist software will suggest likely species? It’s not always right, but it is improving all the time. The Compare button may offer similar organisms to consider.

Trigger text, Multi (Multiple organisms) produces:
Your observation includes photos of multiple species. Could you add them as separate observations? If you do that, they may all get IDed. A quick way to fix this observation is to use the duplicate feature. In the upper right corner of the observation page, click the downward arrow next to “Edit” and choose “Duplicate.” Then identify the duplicate observation as the organism in your second picture and uncheck the checkboxes next to the other pictures. You can repeat this process to create new duplicate observations for picture #3, #4, etc. Lastly, come back to the first observation, click “Edit,” and delete the extra pictures.

Trigger text, Picturethis (Captive/Cultivated plants) produces:
You may also enjoy using an app called Picture This, which is designed to ID landscaping plants.

Down the road, I could see such automatic educational feedback being a useful part of the development.

Question: are these projects easier than using the exact_taxon_id filter, or is there some other reason why they were needed?

That won’t work for Unknowns, as they don’t have any taxon to filter on.

I like the Unknown projects. They are a nice change from looking at blurry plant photos, which is what a lot of unknowns are. When I am sick of general unknowns, I can look at Unknown Arthropods and sort some of them into spider, myriapod, butterfly, etc. And some are still plants, but then it is interesting to think about why the CV thought they were arthropods.
