The “Problem”
This graph from the iNaturalist blog shows that about half of all identifications were made by only about 100 of the roughly 100k identifiers! A related issue was raised in this post: “I suspect that a lot of potential identifiers out there don’t know where to even begin”.
The other problem, illustrated by the graph in this blog post, is the ‘long tail’: 99% of species have only a few images.
In summary: how do we get non-experts to help label ‘unpopular’ taxa?
What we’re doing/What we need
My colleagues and I at the University of Sheffield are developing an approach that helps non-experts label “difficult” taxa.
We are looking for researchers in ecology and related fields (and perhaps iNaturalist’s staff/organisers?) who could collaborate on this project, in particular anyone with an Order/Family/Genus that is ‘under-labelled’ and that we could use to demonstrate the approach. We are planning to apply for funding, so this should not create extra work on iNaturalist’s side (and hopefully it will lead to various taxa being well labelled, and to some approaches for non-expert labelling).
I imagine we would probably need to run this on a separate platform, because of the way our approach works: it is intended to allow much more uncertainty in individual labels, and it gives users images to label or to learn from (e.g. with additional text to help them learn). This is closer to the Zooniverse approach than to the iNaturalist one.
The current project
We propose an approach that lets non-experts help generate labels even in challenging domains. Our method has three key components:
- The first is a Bayesian way of describing (‘modelling’) each individual’s abilities. We can also model ability as a process over time; that is, we can model participants’ potential to learn.
- The second uses reinforcement learning (RL) to select which image to show a given participant. Most obviously, we want to show a participant an unlabelled image so they can identify the species, but we might also show them an image whose species we already know, to explore their abilities. We might also show them a known image (potentially with supporting text) to teach them to label new classes. RL can optimise the trade-off between giving participants examples we already know they can label and examples that improve our model of their abilities.
- Finally, for the above approach to be effective, we need some sense of which subset of species a particular animal might belong to. We will train standard computer vision classifiers on existing labelled images, assess their accuracy for different species, and then use them to provide an initial ‘guess’.
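To make the first and third components a little more concrete, here is a minimal sketch under simplifying assumptions; it is not the full model described above. It assumes each participant’s ability is a single accuracy with a Beta prior, updated whenever they label an image whose species is already known. A hypothetical `decay` parameter below 1 discounts older answers so the estimate can follow a participant who is learning. Labels are then combined with a classifier’s prior over species using a simple ‘one-coin’ model, in which a wrong answer is assumed equally likely to be any other species, and each participant’s posterior-mean accuracy is plugged in rather than integrated over. The names `Participant` and `posterior_over_species` are illustrative only, and the RL image-selection step is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Participant:
    # Beta(alpha, beta) belief over the probability that this participant
    # labels an image correctly (assumption: ability is one number).
    alpha: float = 1.0
    beta: float = 1.0

    def update_on_gold(self, correct: bool, decay: float = 1.0) -> None:
        """Update after the participant labels an image whose species is known.

        decay < 1 (hypothetical parameter) shrinks earlier evidence back
        towards a Beta(1, 1) prior, so recent answers count for more and the
        estimate can track a participant whose ability changes over time.
        """
        self.alpha = 1.0 + decay * (self.alpha - 1.0)
        self.beta = 1.0 + decay * (self.beta - 1.0)
        if correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def accuracy(self) -> float:
        # Posterior mean of the participant's accuracy.
        return self.alpha / (self.alpha + self.beta)


def posterior_over_species(
    prior: Dict[str, float], labels: List[Tuple[Participant, str]]
) -> Dict[str, float]:
    """Combine a classifier's prior with noisy labels from participants.

    prior: species -> probability from a vision classifier (all values > 0).
    labels: (participant, species they chose) pairs for one image.
    One-coin model: a participant with accuracy a picks the true species with
    probability a, otherwise one of the other K - 1 species uniformly.
    """
    k = len(prior)
    log_post = {s: math.log(p) for s, p in prior.items()}
    for participant, label in labels:
        a = participant.accuracy
        for s in log_post:
            likelihood = a if s == label else (1.0 - a) / (k - 1)
            log_post[s] += math.log(likelihood)
    # Normalise in log space to avoid underflow when there are many labels.
    m = max(log_post.values())
    unnorm = {s: math.exp(v - m) for s, v in log_post.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}


if __name__ == "__main__":
    expert = Participant(alpha=9.0, beta=1.0)  # estimated accuracy 0.9
    novice = Participant()                      # estimated accuracy 0.5
    classifier_prior = {"A": 0.5, "B": 0.3, "C": 0.2}
    post = posterior_over_species(classifier_prior, [(expert, "B"), (novice, "A")])
    print({s: round(p, 3) for s, p in post.items()})
```

In this toy example the confident participant’s vote for ‘B’ outweighs both the classifier’s preference for ‘A’ and the novice’s vote, which is the kind of behaviour that lets many uncertain individual labels still add up to a useful one.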
Summary
A collaboration with the team running iNaturalist, or another similar platform, is crucial: we want to understand whether this approach might work and how it could support current approaches. More generally, we’re also looking for people working on taxa that they think this approach would be a good fit for.
Thanks for the help,
Mike Smith and Robert Loftin (Lecturers in Machine Learning, University of Sheffield)