Automatic iNat suggestion for "unknown" observations that reach a certain age

matthias55 · June 17, 2019, 12:52pm

My request is to have the iNat algorithm generate an ID for observations marked “unknown” after a certain amount of time passes (could be a year, six months, or three months). This could be marked somehow as “iNat-generated ID” and would only apply to observations that the iNat algorithm is “pretty sure” about. It would allow these unknowns to be searchable and findable for the whole community.

There is a huge backlog of “unknown” observations, often created by new users, and these observations are not that accessible for iNat taxonomic specialists. I get the sense that this backlog is increasing, though I don’t have data to back that up.

I spend a lot of time IDing these unknowns. For taxa I’m not familiar with, I find myself using the ID suggestions (especially if the photo is good and I think well-suited for computer vision). It would save me a lot of time if iNat would just do it automatically.

I see that one concern is that this would take a lot of computational time. Maybe it could be tested with really old observations (2+ years) to see if it is effective.
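As a rough illustration of the rule requested above, here is a minimal sketch. The `Observation` record, its field names, the `cv_score` confidence value, and both thresholds are assumptions made for the example, not part of iNaturalist's actual code or data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Observation:
    # Hypothetical record; field names are illustrative, not iNaturalist's schema.
    created_at: datetime
    taxon: Optional[str]      # None means the observation is still "unknown"
    cv_taxon: Optional[str]   # top computer-vision suggestion, if any
    cv_score: float           # assumed confidence in [0, 1]


def eligible_for_auto_id(obs: Observation, now: datetime,
                         min_age: timedelta = timedelta(days=365),
                         min_score: float = 0.9) -> bool:
    """True if obs is still unknown, old enough, and the CV guess is confident."""
    if obs.taxon is not None or obs.cv_taxon is None:
        return False
    return (now - obs.created_at) >= min_age and obs.cv_score >= min_score
```

Raising `min_age` to two years or more would correspond to the trial run on really old observations suggested above.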

charlie · June 17, 2019, 12:53pm

I approved this because we are really backlogged on mod approvals since all of us aren’t really around right now. However, I think there is another topic where this idea is discussed. I just haven’t had time to find it. When we do we can merge it in.

I think this is an interesting idea and it has been discussed before. I don’t remember the results.

danaleeling · June 17, 2019, 1:17pm

Perhaps:
https://forum.inaturalist.org/t/automatic-computer-vision-ids/2074

charlie · June 17, 2019, 2:45pm

and that in turn links to https://forum.inaturalist.org/t/species-suggestions-for-the-wrong-continent/789

bouteloua · June 17, 2019, 3:14pm

Thanks, I commented on that General discussion directing them here, so interested parties can vote on this feature request if they so choose! :)

psyllidhipster · June 17, 2019, 3:20pm

I wouldn’t be against this as long as it provided only very coarse IDs to get obs out of unknown. Is it a plant? Insect? Bird? Fungus? Great! More specificity would probably just cause problems though.

DianaStuder · June 17, 2019, 6:42pm

Rough categories would help.
Also someone has industriously been thru all the bougainvilleas from Cape Town’s City Challenge with a kind copypasta - you cannot ID to species with this photo. So the wrong species ID was useful to him.

But iNat is pretty sure our arums are Arum, and it can take 5 votes for Zantedeschia to convince iNat - you are wrong despite being pretty sure.

clay_s · June 18, 2019, 4:44am

I think IDing down further would be more beneficial to getting things IDed to the best level. Wading through observations set to plant by a program, just to require grunt work of categorizing them further into Mosses, Liverworts, and Vascular Plants wouldn’t save much time, nor get very many more people involved at that stage of IDing.

When I’m doing old unknowns, I open it, check if any notes indicate what the OP intended. Put it as far that direction as I can knowingly do so, copying and pasting the placeholder into a comment if it existed. If nothing, I decide if I have a clue, check reviewed if no clue, throw as far up the tree as I can otherwise. Vascular is further split into Conifers, Ferns, Monocots, and Dicots.

Having everything set to plant wouldn’t help me. I would still have to “review” all seaweeds(algae and kelps) and certain liverwort, fungi, and mosses. The higher up the tree it lands, the more likely it will pique someone’s interest into picking it up. Going from “Unknown” to “Plant” doesn’t pique anyone’s interest, it will still be the people that like contributing that also realize their limited knowledge is best utilized by doing gross sorting, rather than by studying something in depth at species level to be the 5th person to agree. :) I’ll spend that time trying to determine my observations and reviewing all the improvements and comments done to my gross sortings, to see what I can learn from them.

Note: This is from my perspective. Different people process these lingering observations both by vastly different methods and reasons. But people don’t work fungi, or plant, or whatever that is that general. Aves might be the exception, but it had 10 pages in my area. Other results:
Unknown 159, Plant 75, Fungi 85, Animal 80, Spiders 39, Insects 43, Protozoans 1/2 (wow), Bacteria 5 observations (stunned silence), Arachnids 7, Mollusks 1/2, Ray-finned Fishes 3, Mammals 4, Reptiles 1/2, Amphibians 1/2.

From this I think it should be done to various levels depending on the type. Obviously (I think), it appears that top levels of the major subdivisions don’t get worked any more than unknown, so it needs to be broken down further than that. To what level? What level do most people decide to work at? Or does a secondary aging time-stamp need to be used so that old stagnant things can be at the front of the queue every x years/months? People would need to use the “good as it gets” more with that though.

kiwifergus · June 19, 2019, 1:11am

To go from unknown to something (as in plantae, fungi, etc) would still only be one ID, so it won’t be RG and would appear in needs id pool… And because of the age of the obs, it would likely be quite far back, so only “power IDers” are going to see it. I say go as specific as the CV/AI allows!

charlie · June 19, 2019, 3:41am

I’m more generalist than some I guess, but I do filter by “plants” all the time (and by a location). Sometimes I will specify vascular plants but usually I just mark the mosses I don’t know as reviewed and maybe that helps me slowly get a feel for what is around. I rarely include “something”s and almost always exclude animals and fungi. I am way more likely to see something marked as “plants” than “something”

tiwane · June 20, 2019, 7:48pm

Talking with our devs, something like this is technically possible, maybe running slow in the background and for much older observations. Whether or not it’s worth it, or would be a good idea is up for debate. A few issues I would have with it:

Philosophically, would I want my observation to automatically be categorized by an AI? I understand the potential utility, but is that what iNat is about? How would the ID be counted in the Community ID section?
Since Identify weights newly-added observations by default (although that can be changed in Filters), this may not help surface older observations very well.
I think preventing these from being posted is a better long-term solution, which means better onboarding. Perhaps a notice or two for new users if they try to post an observation without an ID. Or (maybe it’s just nostalgia) but I wouldn’t mind seeing the return of iconic taxa buttons when making an ID.

joe_fish · June 20, 2019, 8:09pm

I would strongly be in favor of this change. I sometimes click on the observation catalog of specific users, and I often see these “unknowns” mixed in. I can’t imagine how many useful observations are stuck in limbo like this. I’m also one of the “power IDers” who regularly goes through observations stuck at higher-level taxa, but I can’t get to an observation if it doesn’t show up there for me to look at. At the very least, these “unknowns” should be lumped together to make them easily searchable.

saturnring · June 20, 2019, 8:12pm

My personal opinion is having my observation ID’d by the AI is no different from an identifier using the AI to bring up suggested IDs and then putting one of them on my observation (which can happen to any observation of anyone now). The only difference is that the AI probably won’t respond to comments, but, again, that’s the case with many identifiers now.

pisum · June 23, 2019, 12:29am

if you’re thinking of going down this road, i think it would be better to add a separate field altogether for computer vision ID, maybe displayed in the UI as a box below the existing community ID box in the observation detail page. (the CV ID would be added at the creation of the observation. no need to wait for a given date. and if someone didn’t want to see it, they could just collapse that section.) a separate field makes it possible for different people to decide how old is too old for unknowns, since user A could pick up CID unknowns + CV ID spiders at one year, for example, while user B could pick up CID unknowns + CV ID spiders at one month. it could also offer an interesting way to compare CID vs CV ID en masse.
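To make the separate-field idea concrete, here is a minimal sketch in which each identifier picks their own age cutoff. The `Obs` record, the `cv_taxon` field, and the `pick_up` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Obs:
    # Hypothetical record: community ID and CV ID live in separate fields.
    created_at: datetime
    community_taxon: Optional[str]  # None means the community ID is unknown
    cv_taxon: Optional[str]         # CV ID, assumed to be stored at creation


def pick_up(observations: List[Obs], cv_taxon: str,
            min_age: timedelta, now: datetime) -> List[Obs]:
    """Unknowns whose stored CV ID matches cv_taxon and that are at least min_age old."""
    return [o for o in observations
            if o.community_taxon is None
            and o.cv_taxon == cv_taxon
            and now - o.created_at >= min_age]
```

User A would call `pick_up(obs, "Spiders", timedelta(days=365), now)` and user B the same with `timedelta(days=30)`, without either choice affecting what the other sees.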

schizoform · August 14, 2019, 6:37pm

The CV-based observations are already marked – I agree that an automatic cv annotation would help get observations seen by the appropriate experts. I try to go through and do this manually – tweaking cv suggestions based on location, my own knowledge, etc – but it seems like an inefficient use of time.

Alternately – perhaps “unknown” observations could remain marked “unknown”, but show up in searches based on the cv best-guess when no other information is present?

cmcheatle · August 14, 2019, 6:42pm

I think you would need to actually run and save the CV guess rather than this. On the fly running of the CV against all unknown observations every time someone ran a search is likely extremely server intensive and unlikely to run in any kind of acceptable time.

schizoform · August 14, 2019, 8:01pm

Completely agree. It’s mostly a bookkeeping thing; keeping the automatic CV guess … discreet … unless someone has manually approved it.

I just now notice that there’s sometimes a “placeholder” field that might be similar to what I’m describing. This is a subtlety I hadn’t picked up that looks like it might be important…

jdmore · August 14, 2019, 10:42pm

Would probably also need to be re-created if/when observation photos change, and whenever there are significant changes to the CV system (as recently happened).
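One way to picture the bookkeeping for a saved guess is to store, alongside it, the photos and the model version it was computed from, and recompute when either differs. This is only a sketch; `StoredGuess`, `needs_refresh`, and the version strings are hypothetical and do not describe how iNaturalist stores anything.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StoredGuess:
    # Hypothetical saved CV guess plus the inputs it was computed from.
    taxon: str
    photo_ids: Tuple[int, ...]  # photos used for the guess, in order
    model_version: str          # CV model that produced the guess


def needs_refresh(stored: Optional[StoredGuess],
                  current_photo_ids: Tuple[int, ...],
                  current_model_version: str) -> bool:
    """True if there is no saved guess, or the photos or the model have changed."""
    if stored is None:
        return True
    # Order is compared too, so reordering photos also counts as a change.
    return (stored.photo_ids != tuple(current_photo_ids)
            or stored.model_version != current_model_version)
```

Saving the guess this way avoids running CV at search time, at the cost of a background job that revisits stale entries.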

kiwifergus · August 15, 2019, 1:38am

I do like the idea of an obs getting auto-ID’d with a CV choice if it hasn’t been given any ID at all after a reasonable time, say 12 months. Usually by then someone has got around to putting it to Order or Family, but if the auto-ID was put at the Order or Family of the leading CV suggestion, then I think that would be at least comparable to a volunteer identifier doing so with taxa that they are not familiar with. Still marked with the CV symbol of course. It would cut down a lot of grunt work. They would still be in the Needs ID pool, but would become included in the filtered ID pools that many specialists limit themselves to.
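Placing the auto-ID at the Order or Family of the leading CV suggestion amounts to walking up that suggestion's lineage to a chosen rank. A minimal sketch, assuming the lineage is available as a rank-to-name mapping; the `coarsen` helper and the `RANKS` list are made up for illustration.

```python
from typing import Dict, Optional

# Major ranks from coarsest to finest (simplified; real taxonomies have more).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def coarsen(lineage: Dict[str, str], max_rank: str = "family") -> Optional[str]:
    """Return the name at the finest rank present in lineage that is no finer than max_rank."""
    limit = RANKS.index(max_rank)
    # Walk from max_rank up toward kingdom and take the first rank that exists.
    for rank in reversed(RANKS[:limit + 1]):
        if rank in lineage:
            return lineage[rank]
    return None
```

For a leading suggestion of Zantedeschia aethiopica, `coarsen(lineage)` would give Araceae and `coarsen(lineage, "order")` would give Alismatales, provided the lineage mapping contains those ranks.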
