Hi,
I am a UXR in the tech industry and an active iNat user. My focus is on fungi in North America.
In my work, I have been exploring user feedback for model improvement. This has made me more interested in the experience of “Identify” workflows on iNat and how they feed into the CV model.
Fungi are particularly problematic to ID (based on my experience). There are estimated to be:
10-20K bird species in the world
900K insect species
6M+ fungal species
The ID process is known to be problematic in fungi, where expertise is narrow, there is convergent evolution, and many species are undescribed. However, even well-known species are badly misidentified at Research Grade. I would like to explore whether there are ways to improve the quality of the active and passive signals we gather, so as to improve the CV model.
If you are interested in this topic, please let me know. I am a qualitative user experience expert and would likely want to talk with users of varying expertise who “identify” species in a few select Families, to understand the relative workflows, incentives, accuracy and pain points.
UX stands for user-experience. It means the study of how products are designed and used, and in the context of tech - how what you see on a screen (words, actions you can take, workflows enabled, features offered) shape the way people interact with computers.
For example, in this context we would look at the IDENTIFY workflow:
How it is accessed
Who uses it
What it incentivizes and does not
How users perceive the work they do when identifying
How people understand the information being collected, what it’s for, and how it’s used.
Typically, it’s done in combination with logging. To quantify a problem for a product we would track metrics like the % of observations that reach research-grade in different Kingdoms, or engagement - how long it takes to get an identification from another person, or how many observations remain without any review.
Then, we’d agree which are a problem and try and make changes to move those metrics in the right direction.
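To make the logging side concrete, here is a rough sketch of how those three metrics could be computed. It is only an illustration: the `Observation` record, its field names, the `"research"` / `"needs_id"` labels and the sample rows are all made up for the example and are not iNat's actual schema or data.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Observation:
    # Hypothetical record; field names are assumptions, not iNat's schema.
    kingdom: str
    quality_grade: str                      # assumed labels: "research", "needs_id"
    created_at: datetime
    first_other_id_at: Optional[datetime]   # None if no one else has reviewed it


def metrics_by_kingdom(observations):
    """Return per-kingdom share at research grade, median wait, and unreviewed count."""
    groups = {}
    for obs in observations:
        groups.setdefault(obs.kingdom, []).append(obs)

    result = {}
    for kingdom, group in groups.items():
        # Hours between posting and the first identification by another person.
        waits = [
            (o.first_other_id_at - o.created_at).total_seconds() / 3600
            for o in group
            if o.first_other_id_at is not None
        ]
        result[kingdom] = {
            "pct_research_grade": 100 * sum(o.quality_grade == "research" for o in group) / len(group),
            "median_hours_to_first_id": median(waits) if waits else None,
            "unreviewed": sum(o.first_other_id_at is None for o in group),
        }
    return result


# Made-up sample rows, purely to show the shape of the output.
sample = [
    Observation("Fungi", "research", datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 20)),
    Observation("Fungi", "needs_id", datetime(2024, 1, 2, 8), None),
    Observation("Fungi", "needs_id", datetime(2024, 1, 3, 8), datetime(2024, 1, 5, 8)),
    Observation("Fungi", "needs_id", datetime(2024, 1, 4, 8), None),
    Observation("Animalia", "research", datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 10)),
    Observation("Animalia", "research", datetime(2024, 1, 2, 8), datetime(2024, 1, 2, 12)),
]

metrics = metrics_by_kingdom(sample)
for kingdom, values in metrics.items():
    print(kingdom, values)
```

Comparing the same numbers across Kingdoms is what would let us say whether fungi are really an outlier, and by how much, before deciding which metric to try to move.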
I could adapt the end result to whatever the community agrees is worthwhile. My current end goal would be to identify and surface opportunities for developers to improve the quality of signal we receive from identifiers.
This would be publicly available and not for my work. I’m just interested in applying my skills to this problem, since it’s something I care about.
Just as an aside, you’re low on the insect species estimates. This is not to take away from your main point at all, it’s to reinforce the complexity and difficulty of identifications within certain taxa.
There are estimated to be about 1.5 million species of beetles alone, of which only 300,000-400,000 have been described.
For fungi - perhaps a popup prompting - the 3 preferred views for a potential ID.
Please rethink the prompt to just dump it in plants. I am still retrieving obs that were effectively disqualified from CNC, as they are neither Unknown, nor filtered out by taxon specialists. An honest Unknown would still have had a fighting chance. Now it is about retrieving any useful obs of unusual biodiversity.
This rather important general point somewhat undermines your focus on the identification UX. In reality, the most pressing pragmatic issues are all at the other end of the process. Many taxa are just plain hard or impossible to identify. But even where identification is feasible, there often aren’t reliable guides available that are accessible to laypeople. This often makes it very difficult both for observers to provide the right input and for identifiers to consistently evaluate it. So the damage has usually been done long before people engage with a user-interface. In other words: this is yet another variation on the age-old problem of Garbage in, Garbage out.
In the UK, there are many independent recording schemes that are dedicated to particular taxonomic groups. These schemes take data from many sources, including iNat (via GBIF). In recent years, several schemes have put a lot of effort into providing free guides that present the key identification features in a format that is readily accessible to laypeople. This can dramatically improve both the quantity and quality of records supplied by the public, as well as the productivity of identifiers.
Should iNat involve itself in similar pre-emptive educational efforts? At the moment, the primary focus seems to be on quantity rather than quality, since the site policy is explicitly skewed towards a relatively high tolerance of garbage input (compared to many other recording sites). There needn’t be anything wrong with this per se, if that’s part of what it takes to promote greater engagement with nature. But a short-term “anything goes” policy will inevitably put an ever-increasing burden on the identifiers if there’s no counterbalancing longer-term effort to improve the quality of the input.
i’m struggling to imagine some sort of result along these lines that iNat staff would realistically work on any time soon.
i think where you’re trying to go with this is to work out a way for the CV to suggest to the user some sort of image templates that would increase the probability that someone would be able to identify an organism. just for example, in Seek, if you encounter an organism that its CV can’t get to species, Seek’s stock suggestion, i think, is to get closer to the subject. but what if it could somehow suggest to the user to try specific viewpoints by showing them sample images that have a high probability of being identified successfully?
something like this might make sense to try if you had unlimited resources, but could iNat staff realistically do anything with such an idea?
This is an interesting observation and suggests that maybe I should ask a different question, about the role of different efforts and investments in our ultimate goal. If it is more widespread good quality ID by humans, that’s a different problem.
I am also involved in efforts to improve ID. I work on the fungal family Cortinariaceae and give many talks, have collaborated on (and led) the description of 4 new species, and am working on regional keys. I really like your connection to the outcome “productivity of identifiers” and will think about this more.
That said, the huge opportunity in fungal ID is coming through massively increased sequencing of species. Currently, the ID flow does not surface to me that an observation has been sequenced. The determination based on sequence is a different workflow that we don’t enable. If we are talking about improved quality, surely this datapoint should have more weight?
Actually, I am really agnostic to the solution. As a researcher, I routinely explore an area to see what the issues are and what problems might emerge. This conversation has already surfaced a lot of ideas and is valuable to me. However, the question of “could we ship something you identify” would be totally up to the dev team and I would not consider this a waste of time if we decide not. Having done this for a long time, one always learns something if you define the problem right (which you are all helping me to do).
Note: I will try the Seek app to see what you are talking about :)
Successful solutions could enable:
good IDers to identify more by improving incentives or priority or purpose so the CV has more signal
general IDers to id with greater confidence
more or finer training data on species concepts so the CV gets better faster
Btw, does anyone know what the CV model is trained against? What is a successful outcome for the model / how does the model know it is correct?
I don’t know if it’s useful to you, but I am a mycology (fungi) focused identifier, with “medium” level expertise. Might represent an average fungi-focused person on the site. One thing I wish I had were annotations for all rust fungi (order Pucciniales) prompting the labeling of the host plant. I was thinking the other day about what kind of always-present annotations (like some of the ones for plants and animals) would help us: freshly emerged vs old specimen (to help with seasonality data), or “Part observed” with options like “Spores”, “Fungal Fruiting Body”, “Wood Staining” (as in Chlorociboria sp.), etc., where you can click more than one (for example, for observations containing pictures of the fungal fruiting body and microscopy of the spores).
The BIGGEST prompt that people posting a picture of a fungus for the first time (as judged by the computer ID thinking it’s a fungus) could get is to add more than one picture from more than one angle!! Especially of the “fertile surface” (tubes, pores, gills, ridges, etc.) of a specimen.
Having a terms glossary with illustrations could be nice too, if it could be linked from all fungi taxa pages. Having the word for something not only helps with communication, it helps with seeing it and choosing to photograph with it in mind in the first place. Which helps good IDs get done.
I have also had a lot of success telling people about the Data Quality Assessment section at the bottom of each observation, and how we (as mycology identifiers) can use the “no it’s as good as it can be” feature to clean up identifications. There are plenty of examples where really, you aren’t getting better than genus without sequencing or microscopy. So, put it to the genus level and mark “no it’s as good as it can be”, and it still counts for that. That species complexes are getting their own labels on iNaturalist as options for us helps too.
As to the CV model, I don’t know, but I’m guessing it’s trained against all the iNaturalist data itself, including all the times we have identified something as Species A but then flipped it to Species B. That’s how it makes the “Similar Species” tab. I have seen it happen that when someone goes through correcting something on a lot of observations, all of a sudden the similar species tab for those two taxa reflects that event, and they have each other as “most misidentified as”.
i see what look like a lot of business buzzwords to me, and i still have trouble seeing where you’re trying to go with all of this.
if you’re just looking for general information on how people use the system in relation to fungi and computer vision, and how computer vision works, there’s already plenty of that kind of discussion in the forum. some of it goes relatively deep. you could read for hours, maybe days.
and people could spend hours or days rehashing a lot of that or bringing up new things. but for what purpose? how will whatever you’re proposing to do clarify things and, more importantly, spur action any better than efforts that have come before?
I think it is kind of hard for most of the people on this forum to understand because your posts are really buzzword/jargon heavy. Are you offering to do actual development work of some kind, perhaps creating a third party tool, or offering to tell the existing limited staff of developers that they should do even more things? If it is the second there is already a backlog of really obvious “pain points” that need fixed *cough* notifications *cough* (is this the correct sense of the term?).
When you talk about ‘sequencing workflow’, is that some kind of tech jargon term I am unfamiliar with, or are you literally talking about DNA sequencing, like the handful of observations with a DNA barcode copy-pasted into an observation field? If it is that second one just a bit over 0.01% of observations have that, and all but ~800 are already ID’d to at least genus, so I would expect that flipping to the ‘annotations’ tab in Identify is not an unreasonable barrier to surfacing such relatively niche content (is there literally any identifier who could just look at those by eye and glean anything useful from them anyway?).
@sulcatus : I might be mistaken, but I believe that iNat’s CV model is content-free with regard to knowledge of morphological structures and taxa. In other words, I believe it is rolling its own set of identification criteria strictly from the photos and associated metadata. So I don’t think it would be possible to have a separate UI for just fungi.
That said, I’ve seen enough posts on the parlous state of iNat fungi IDs to believe that some grand reset is probably in order. It sounds (to this non-fun-guy) like many people assume more knowledge than they have when it comes to fungi IDs.
SO, perhaps there could be some kind of link that dynamically appears whenever the community ID is in fungi, and that link would take the user to a Fungi Guide that would supply general ID principles to follow and pitfalls to avoid?
In other words, if we’re pitching this at human IDers, as opposed to changing the CV model itself, perhaps we could provide well-intentioned humans with enough knowledge to know when they don’t know.
The CV model is trained against the dataset of observations ID’d by iNaturalist users. It uses a single model for all of the nodes it is trained on. Its ‘successful outcome’ is predicting the species (actually the node, which could be up to Genus or Family in some taxa where there aren’t many species-level IDs) that the observation was ID’d as on iNaturalist. There are lots of blog and forum posts about it; here are a couple: https://www.inaturalist.org/posts/59122-new-vision-model-training-started https://forum.inaturalist.org/t/computer-vision-update-july-2021/24728
Related to what you were saying, I have said before that a really cool feature that I hope someone builds someday is a version of the CV model that can somehow give us some indication of what features it is using; I think it is pretty clear that in some cases the CV has figured out some set of reliable ID features that field guides do not know about/include (i.e., it can sometimes correctly and with high confidence ID photos that contain none of the features the field guides describe). The iNat team all but certainly does not have the bandwidth to build this feature right now, so it would have to be a third-party app of some kind.
If you want a workflow for identifying specimens via sequences, the Barcode of Life Database is one.
Very few observations on iNat are going to have DNA sequences associated with them, because:
While the technology is at the point that someone with time and money to burn could conceivably be DNA barcoding samples from home, it’s still a lot of work and is most labor-intensive at small scales.
There are ethical and legal issues with collecting samples/specimens for DNA barcoding that are not present for taking photos of organisms, even if you are temporarily capturing them to take the photos.
If people frequently can’t get close enough to an organism to take a good photo, they’re usually not going to be able to collect a tissue sample. This may be less of an issue with plants and fungi, but they also have cell walls and a lot of secondary chemicals that make DNA extractions more difficult. Plant DNA extractions often use chloroform and other chemicals that need special ventilation systems to use safely.
I’m a biology professor who does DNA barcoding and has a fair amount of freedom in which organisms I barcode, but the overlap between the individual organisms that I have observed on iNat and that I have DNA barcoded is 0%. Because I study caterpillars and some people on iNat rear them, I’ve just started offering to DNA barcode a few interesting ones that died before the adults emerged or could not be IDed from the adults. But that’s a very different level of effort than trying to routinely DNA barcode everything interesting that I see on a hike.