I periodically go through the Unknown taxon to put obvious things where knowing eyes can see them. Some of them just can’t be ID’d because of poor photos, or because you just can’t see what the observer thinks is the organism. Others are observations that include photos of several organisms and need to be split, so they can’t be identified either. Many of them are from people who started years ago and then became inactive.
@jbecky I think there is, sorta: There’s a checkbox at the bottom of the page where you can specify “this observation cannot be further improved”, or some such. I’m … not actually sure what this does, but I assume it takes it out of rotation from needs-ID pools.
I know that if two people have marked it as life, it does change it to casual, so it’s out of the ID pool. I’m not sure about other cases. I try to name it down to the lowest common denominator that I know, note in a comment that the photos are of different subjects, and, if I’m the second or later to do so, flag it as good as can be.
Wow, that actually seems kind of undesirable, since certain types of difficult identifications (among algae, between fungi and slimes, or of microbes) are easiest to find because they are marked “life”.
They are marked as “unknown”, not “life”. They become “life” after, for example, someone marks one as algae and then someone marks it as red algae. And while it is true that someone could mark it as life, no one seems to do so. It is amazing how many obvious insects, plants, arthropods, etc. are lingering at “unknown”, let alone the things that can only be determined to be life.
There are a lot of observations with no ID out there. It takes a lot of human work to ID all of those to just kingdom or phylum so that the experts can find the observations and ID them further. I suggest a bot that IDs these unreviewed observations. It could be the same account that automatically marks some species (like Picea pungens) as not-wild. This bot would find observations that have gone one month or more without being IDed and would use computer vision to give them a kingdom-level ID. I think this would really help with getting unreviewed observations to identifiers. If kingdom isn’t enough, it could go to phylum or class, but that gets risky. Also, so as not to cause problems with the computer vision training, an observation IDed only by this bot would not be used to train the computer vision, because that would be too recursive. Would this put too much load on the servers, or would it be too risky to use the computer vision that much?
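To make the two rules in that proposal concrete, here is a rough sketch of how a bot might pick its candidates and how bot-only observations could be kept out of training. It covers only the selection logic, not the vision model itself. Everything in it is hypothetical: the `Observation`/`Identification` classes, the `BOT_LOGIN` account name, and treating “one month” as 30 days are assumptions for illustration, not iNaturalist’s actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

BOT_LOGIN = "hypothetical-cv-bot"  # assumed bot account name, not a real iNaturalist user
MIN_WAIT = timedelta(days=30)      # "one month or more" approximated as 30 days


@dataclass
class Identification:
    user: str
    taxon: str


@dataclass
class Observation:
    id: int
    created_at: datetime
    identifications: List[Identification] = field(default_factory=list)


def needs_bot_id(obs: Observation, now: datetime) -> bool:
    """True if the observation has no IDs at all and has waited at least MIN_WAIT."""
    return not obs.identifications and now - obs.created_at >= MIN_WAIT


def usable_for_training(obs: Observation) -> bool:
    """Only observations with at least one non-bot ID may be used for training,
    so the model is never trained on its own output."""
    return any(i.user != BOT_LOGIN for i in obs.identifications)
```

For example, with `now = datetime(2024, 3, 1)`, an observation created on 2024-01-01 with no IDs qualifies, while one created on 2024-02-20 does not yet; an observation whose only ID is the bot’s “Plantae” would be skipped for training.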
@mws My apologies, just after approving your feature request, I realized there is already an existing feature request for essentially the same thing. So I have merged it here.
I think this would result in way more State of Matter Life observations than there already are, unless the bot could also read the observer’s description and/or any placeholder to know whether it is the plant or the butterfly or the cat in the photo that is the subject.
It could be made so that the bot’s ID is automatically retracted when a real person IDs the observation, so that computer vision errors on bad photos don’t leave observations stuck at State of Matter Life.
That could help keep it out of State of Matter Life. What would happen to any placeholder that had been on the observation when it was Unknown, though? Usually human observers will try to preserve them in a comment.
Given the number of things in the “unknown” pile that are that way either because there are multiple photos attached of different organisms, or because it’s utterly unclear what the photographer was centered on, I think a fully automated approach is not a good way to go.
It also seems a bit perilous to add ML identification access to the Identify tool–it has the potential to massively exacerbate our recently discussed problems with careless ID. What if Identify had a button that let you add the ML identification, but only to, say, Kingdom or Phylum level? That would still go faster than manually typing it out, but would keep a human in the loop to skip over fundamentally ambiguous observations.
My problem with this is that it would probably still take thousands of human hours just to do that. It would be a quicker, but still fairly infeasible, way to clear out unknown observations.
Observations with multiple photos could have the AI go through each photo individually. Without consensus, it could then just not ID the observation, or ID it as State of Matter Life to show that there’s an issue. It could also do something similar for a subject-less observation, where it doesn’t leave an ID if it can’t come to a strong conclusion on any kingdom.
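The per-photo consensus idea could look roughly like this. It is a sketch under stated assumptions: each photo is assumed to have already been run through some vision model that returns a top kingdom and a confidence score (the `(kingdom, score)` input format, the `min_score` threshold of 0.9, and the function name are all hypothetical). A kingdom is returned only when every photo agrees and every score clears the threshold; otherwise it returns `fallback`, which is `None` (leave no ID) by default or could be set to `"Life"` to flag the problem.

```python
from typing import List, Optional, Tuple


def kingdom_consensus(
    photo_predictions: List[Tuple[str, float]],
    min_score: float = 0.9,
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Return a kingdom only if every photo's top prediction names the same
    kingdom with at least min_score confidence; otherwise return fallback."""
    if not photo_predictions:
        return fallback  # nothing to judge, so leave it alone
    kingdoms = {kingdom for kingdom, _ in photo_predictions}
    confident = all(score >= min_score for _, score in photo_predictions)
    if len(kingdoms) == 1 and confident:
        return kingdoms.pop()
    # Photos disagree (probably different organisms) or the model is unsure
    # (no clear subject): don't commit to a kingdom.
    return fallback
```

So two photos both scored as Plantae at 0.97 and 0.93 give `"Plantae"`; a plant photo plus an insect photo gives `None`, or `"Life"` if `fallback="Life"` is passed; a single weak guess such as `("Fungi", 0.6)` also gives `None`.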
Full disclosure: I didn’t read every post here; there seems to be a lot of this type of discussion lately.
However, it looks like many people are discussing the details of how to implement such a feature, or whether it’s possible.
I question the why. The mere existence of unidentified observations isn’t really a problem that needs to be solved by automation. It is much more a low-barrier-of-entry opportunity for many would-be identifiers to become engaged: very gentle first steps on an identifier’s learning curve.
With the introduction of ML/AI/CV, identification by humans who are not experts is becoming increasingly obsolete. The human’s role is more and more reduced to making judgement calls in ambiguous cases, but even that will fade over time. Automatically identifying left-over unidentified observations would only increase the oft-lamented barrier for beginners, “how could I possibly help, I’m not an expert.”
Of course subject experts remain valuable for identification. I have this nagging thought, though, that today’s experts did not acquire their expertise with the help of AI/etc.
We are focused here on old observations that have never been identified and have sunk to the bottom of the pile. I have spent a lot of time adding coarse IDs to these, especially in India. There aren’t hordes of people sorting them out, just a few whom I’ve noticed with gratitude. And the reason for wanting them brought out of Unknown is both for the sake of the neglected observers and for the data itself. There were some really excellent Indian plant observations stuck in Unknown for years.