Does anyone else get bothered by how many observations are marked as "unknown species"?

A couple of things to keep in mind:

  1. It takes more work to move to a broader ID than to a narrower ID, because of the “do you disagree?” dialogue box.

Personally, I’m usually going through observations identified as a particular genus or species. When I see misIDs I mostly just push them back up to the highest level that I believe is correct, even if they are taxa that I could identify. This is just more mentally efficient for me—if I’m paying attention to genus Alpha, I don’t want to be switching my focus to identifying members of genus Beta or genus Gamma every time those come up in the pile. One mental task at a time is plenty.

So, if your observation is seen by an identifier like me: The more precise ID does increase the likelihood that I will see it and put an ID on it, but I’m probably just going to kick it up to a higher ID, which doesn’t really help you. And if I’m encountering a large number of misIDs I’ll start getting irritated by the “do you disagree?” box.

  2. Misidentifications can propagate, both by way of the computer vision IDs and by way of other users IDing their observations by looking for other observations that look like the same taxon.

I think this is a larger problem at the species level. In some cases it can result in a feedback loop—as the number of observations of Alpha beta that are misidentified as Alpha delta increases, so does the probability that any new observation of Alpha beta will be misidentified as Alpha delta. It’s rare for this kind of feedback loop to really take off, but when it does it can be a real mess, creating hundreds or thousands of observations with the same misidentification.


That said, what level of certainty is appropriate before you make an identification is a judgment call that does not have any good, general-purpose answer. There’s a complicated relationship between expertise and uncertainty. A large part of developing expertise in a particular genus is learning which species are likely to be confused with each other, and in which contexts. Without that knowledge, estimating the certainty of an identification is difficult!

Regardless of his other attributes, Rumsfeld’s famous taxonomy of knowledge is helpful here. If you’re learning the genus Astragalus, a particular species like Astragalus emoryanus will first be an unknown unknown (you don’t even know it’s one of the possible identifications for your observation), then a known unknown (you know it’s a possibility but aren’t sure how to identify it), then a known known (you know it’s a possibility and you can identify it reliably). When I visit areas where I don’t know the flora well, a lot of my work in identification is converting the unknown unknowns into known unknowns—figuring out what the possibilities are. My error rates are highest when I jump to an ID based on apparent familiarity without first checking if there are some unknown unknowns that I ought to be worried about.

(Wandering further down this rabbit hole, suppose you’re in an area where there are two species of genus Alpha, and Alpha beta is twenty times more frequently observed than Alpha delta. For observers who haven’t heard of Alpha delta (it is an unknown unknown), the most common misID will be identifying Alpha delta as Alpha beta. For observers who have heard of Alpha delta (it is a known unknown or a known known), the most common misID will be identifying Alpha beta as Alpha delta—if you know there are two species but you’re only seeing one of them, the natural tendency for most people is to try to force the variation within that species to match up with the two options.)
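To put rough numbers on that 20:1 example, here is a minimal Python sketch. The observation counts, the function name `misid_counts`, and the 5% error rate are all hypothetical, chosen only for illustration. It compares an observer who labels everything Alpha beta with one who knows both species and confuses them equally often in each direction.

```python
def misid_counts(n_beta, n_delta, p_beta_as_delta, p_delta_as_beta):
    """Expected number of misIDs in each direction.

    p_beta_as_delta: chance a true Alpha beta is labelled Alpha delta
    p_delta_as_beta: chance a true Alpha delta is labelled Alpha beta
    """
    return {
        "beta_as_delta": n_beta * p_beta_as_delta,
        "delta_as_beta": n_delta * p_delta_as_beta,
    }


# Hypothetical counts matching the 20:1 ratio in the example.
N_BETA, N_DELTA = 2000, 100

# Observer who has never heard of Alpha delta: everything becomes Alpha beta.
naive = misid_counts(N_BETA, N_DELTA, p_beta_as_delta=0.0, p_delta_as_beta=1.0)

# Observer who knows both species, with an assumed 5% error rate each way.
informed = misid_counts(N_BETA, N_DELTA, p_beta_as_delta=0.05, p_delta_as_beta=0.05)

print("naive:", naive)
print("informed:", informed)
```

Even with the same per-observation error rate in both directions, the informed observer's mistakes are mostly Alpha beta labelled as Alpha delta, simply because there are twenty times as many Alpha beta to get wrong.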


The real killer is the less-often mentioned 4th possibility: unknown knowns – things you don’t know, but thought you did, and that are known by someone who specializes in that taxon. For example, there may be several possible species where you thought, based on out-of-date information, that there was only one.


I have a different definition of unknown knowns, keeping the subject constant: Things that I don’t know that I know. (Or that you don’t know you know, &c.)

For instance, think about reading a novel written by someone who lives in a very different social or technological world than you. There’s a massive amount of information that the author assumes the reader will have. The full extent of that information only becomes apparent when it is absent. As the confused reader in this example, many of the author’s unknown knowns are experienced by you as known unknowns.


Yeah, I often considered the unknown knowns to be those things buried in my brain (maybe in my reptile brain) that I’m unaware of at a conscious level but are known to me at some deeper level. But that’s probably way off topic.

Then, it can be interesting down the line to see which of those formerly Unknowns move right along from a very-high-level ID to genus or even species. Sometimes, I am so tickled when I get notifications for something ID’d at very high level that has already been reviewed by others with real expertise. After languishing in Unknowns for months or more, suddenly an observation has a relatable ID - amazing!


I think we may have seen an example of this in the recently closed thread. The last post before it closed said:

I was puzzled by this. I went ahead and bolded the “unknown known” – that is, instead of simply withdrawing the species level ID, I am confused as to why they didn’t revise it to a broader ID of Genus Amaranthus or Family Amaranthaceae (depending on which taxon they meant by “amaranth”). Then the observation would have only gone back to that level instead of to Unknown. But this may be a case of being unaware at a conscious level of something that is known to them at a deeper level.

And no, I don’t think that it is way off topic. It’s a likely explanation for things being marked as Unknown.

Withdrawing the ID is one click. :-) Entering the broader ID is more robust to a variety of future possibilities, but this may not be obvious at the time. Absent a reason to choose one option over the other, minimizing effort is reasonable enough.

There’s something similar going on, I think, with some previous discussions in which people were annoyed by others entering coarse IDs—something like, “Anyone can see it’s a rabbit, but why would you identify it as ‘rabbit’?” Something known but, I guess, considered so trivially obvious that putting it in the ID field was objectionable. People are strange.


This is not a case of an observation being entered as unknown and left as such by an observer who never returns to it. You are taking things out of context again.

Once I discovered that my observation had become “unknown” on account of the other user deleting their account, I did in fact reenter an ID. (Once again, please note that it was only unknown because I did not receive any notification that the other ID was now gone. I had no reason to expect that the ID would suddenly disappear. Anyone who had looked at the observation during this time would have seen my withdrawn ID and likely speculated that something odd was going on.)

Why didn’t I initially enter a new, broader ID in response to the disagreeing ID instead of just withdrawing mine? Frankly, I saw no need. The observation had an ID provided by the other user, which I suspected was correct but was not confident confirming. (Before you criticize me again for not following up and learning how to distinguish the two species: I decided that at present I was simply not practised enough to see the relevant distinctions for this genus and that my time would be better spent trying to master some other, slightly less intimidating taxon. So I left the observation with the other user’s ID, to be confirmed by others or to return to myself at some later date.)

That is what matters to me – that the observation is labelled, ideally as accurately as possible. This was the case. I felt no need to subsequently prove that I at least know the genus by putting this as an ID. Again, anyone who looks at an observation and sees the history of IDs and withdrawn IDs can reconstruct the process that went on (provided that part of this history doesn’t simply disappear without warning).

I have a number of older observations where I originally entered a relatively broad ID which I could now correctly identify to species level. In the meantime, they have reached “research grade”, sometimes with multiple confirming IDs. I suppose I could go back and add my own species ID now, but again, I don’t see any reason. I was where I was in the past; now I know more but I prefer to apply this acquired knowledge to future observations rather than adding another, unneeded agree to an existing one.


All I know is I’m pretty sure I know less now about all things that are knowable than I thought I knew decades ago. Knowing one’s own ignorance is a learning process.


Just to ensure that complaints about off-topicness have some justification, I ran across a nice real-world example of unknown knowns. I bought a laser rangefinder. The battery compartment has a little lid that unscrews, which the designers provided with an unlabelled directional arrow.

Does the arrow point in the direction to open the battery compartment, or the direction to close the battery compartment? The designers surely knew, but it did not occur to them that this was a thing they knew that other people might not know.

(It points in the direction to close the compartment, as it happens. I guess they’re not worried about how you’ll open it, but think you might forget which way it goes when you close it.)


I agree it is important. I do remove (disable) my wrong IDs.

This remark sounds very important and we might ask for a new feature: the top suggestion (displayed as “We’re pretty sure this is…”) should be a taxon covering almost all the species suggestions displayed. It would often be a high rank taxon, Order or Family, a name that more people would know, like butterflies.

I am not actually asking to remove the present top suggestion, but maybe to add one more above it. The suggested taxon would also (partially?) satisfy the people asking for automatic identifications, yet the ID would not be added, just strongly suggested.

And the icing on the cake would be to make this top ID suggestion searchable… see other discussions about a similar feature, for instance here.

In this example, species of different Superfamilies are displayed, yet a Genus is proposed as the top suggestion (which at first sight seems contradictory, doesn’t it?). The reason is that the suggested species have extremely different confidence scores, but these scores are not displayed.

BTW, the computer vision has been much improved and is trained on a larger and larger set of species, so does it still make sense to display as suggestions several taxa that have a confidence score near zero?


We have a global issue with “complex feature requests”.

We have been discussing interesting suggestions about the “unknowns” for years, and these suggestions remain either disconnected from each other (and supported by too few people) or in conflict with each other (alternative responses to the same question), so no actual feature request emerges that we could vote on in order to get it realized.

The system is not designed to promote suggestions within a discussion thread (except this thread). It is designed to work with votes for a feature request. So, we need a “feature request” containing a complete and consistent response to several needs about the unknowns and about identifying. We should collect the use cases and think about them as a whole, and elaborate a solution. We lack a shared vision. Either we build it, or we won’t get anything realized.


I battle each time I ID a Psoralea.
The default suggestions leap confidently straight to (wrong) species, including some new species which haven’t even been formally described yet.
Try getting iNat to offer a single click for
Genus Psoralea

iNat won, Diana still fighting back. (And the taxonomists have joined the battle now …)

Does anyone know what percentage of observations are uploaded as Unknowns? I’m assuming it’s probably only 1 or 2 percent, but I’m curious and I don’t know how to find out.

Can’t answer - but I think it has come up before in the forum.
@tiwane or @pisum can probably tell you.

Recent observations:
2742773 observations (100 %)
92190 observations not identified (3.36 %)
2650557 observations identified (96.64 %)

Over a shorter period, same result:
1770788 observations (100 %)
61028 observations not identified (3.45 %)
1709770 observations identified (96.55 %)

This comparison suggests that "unknown" observations get identified very slowly.
To date, the total is:
138831280 observations (100 %)
3474064 observations not identified (2.50 %)
135357199 observations identified (97.50 %)
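
For anyone who wants to re-check these percentages, here is a minimal Python sketch. The counts are copied from the figures quoted above; the function name `share` and the labels are just illustrative.

```python
def share(part, total):
    """Return part as a percentage of total, rounded to two decimals."""
    return round(100 * part / total, 2)


# (total observations, observations not identified), as quoted above.
counts = {
    "recent": (2742773, 92190),
    "shorter period": (1770788, 61028),
    "to date": (138831280, 3474064),
}

for label, (total, unidentified) in counts.items():
    print(f"{label}: {share(unidentified, total)} % not identified")
```

The "identified" and "not identified" counts do not add up exactly to the totals, presumably because the queries were run at slightly different moments, so the sketch works from the total and the unidentified count only.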


Does the 2.5% that are identified=false include Casual observations?

How many of the ‘identified’ have a very broad ID like Plantae?

identified=false and quality_grade=casual are independent filters, I think (technically, it wouldn’t make sense to have two differently named filters, identified and quality_grade, that overlap).
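
If it helps, here is a small Python sketch of how the two filters could be combined in a search URL. The base URL and the helper name `search_url` are my assumptions; only the parameter names and values (`identified=false`, `quality_grade=casual`) come from this thread, and I have not confirmed how the site treats the combination. The sketch only builds the URLs and does not query anything.

```python
from urllib.parse import urlencode

# Assumed base URL for the observation search page (not confirmed in this thread).
BASE_URL = "https://www.inaturalist.org/observations"


def search_url(**filters):
    """Build a search URL from filter name/value pairs (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode(filters)}"


# All observations without an ID.
unidentified = search_url(identified="false")

# Observations without an ID that are also Casual.
unidentified_casual = search_url(identified="false", quality_grade="casual")

print(unidentified)
print(unidentified_casual)
```

Comparing the result counts of the two searches would show how many of the unidentified observations are also Casual.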