Thanks everyone for all these really helpful suggestions and ideas! I'm very aware of the hubris of machine learning: hopefully the post came across as wanting to learn more rather than "I can solve all your problems :D". Anyway, to reply to some of the issues raised:
how would you recruit non-experts in sufficient numbers to make a difference? - lynnharper
Fair point - I'm not expecting to "solve everything", though. The question of numbers is presumably a problem that Zooniverse projects also need to address.
Lumbricidae…often requiring a lot of pictures of hard to get areas like the underside of the clitellum - zee_z
Thanks for the suggestion! On earthworms, this project seems quite relevant to your suggestion? You might want to ask them to contribute the collected & ID'd images to iNaturalist?
US and Canada fly group - erikamitchell
This sounds really interesting - can a complete novice still join? It might be good to join anyway, to get more insight into the process. Thanks for the suggestion!
Responding to hanly
Thanks for your list, and taking the time to help with this. Replying to a few points:
start with…taking class Insecta down to order.
This is a great suggestion & sounds like a really good example: (1) it doesn't require additional information that isn't visible in the photo (e.g. microscopy), (2) it's probably quite a well-described problem, and (3) there is a need.
A lot of the biggest labeling needs are in the regions that lack guides, experts, and machine vision suggestions.
Good point. The approach I'm thinking of should slot into the space between "super simple/already done" and "no guides or experts".
Starting general is a good way to get someone to feel confident enough to start trying to ID at finer levels like genus and species.
One aspect I’ve not really thought enough about is supporting the new participant’s future interest/learning.
I do think that creating learning modules for more challenging taxa could be very useful…but I wonder how that would scale up without significant expert input for each module
It might be that, for example, zee_z needs to collate a dataset of well-labelled images of earthworms, needs 300 of each species, and might be willing to advise on how to move the text/advice from the key for that family into the supporting material on the system. So there might be some expert help, but hopefully the return for them is worth their time!
It could be an intellectually interesting exercise on learning, but even for a well-defined 100 species group with no other issues, how long would it take to train a naïve identifier?
The idea is that individuals only learn (at least to start) a tiny task as part of the whole. For example, combining other individuals' labelling with the output of the computer vision system indicates that a photo is of one of 4 species; the model of each individual tells the system that Emma is good at distinguishing between these (she has done well at this in the past and/or has been 'trained' on this). A valid criticism might be that this sounds boring for individuals, but (a) people do far more boring labelling on Zooniverse! :) and (b) one thing that can put people off doing ID is the feeling that it's far, far too hard. If they just have to decide whether the wings of a bumblebee are dark or not (to e.g. separate a Red-tailed Bumblebee from a Red-tailed Cuckoo Bee), then they might quite like that this is within their capabilities.
The idea is that the probabilistic model and the reinforcement learning stage figure out who it is best to show each image to… so the “logic” above is somewhat anthropomorphising how the algorithm decides on the image.
Whether any of this actually works needs a bit more discussion and advice, and eventually testing! But that’s the idea.
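To make the routing idea above a bit more concrete, here's a minimal sketch. Everything in it is hypothetical (the names, the accuracy numbers, the `pick_participant` function) - it just illustrates the selection logic: the CV system narrows an image to a small candidate set, and the system shows it to whichever participant has the best track record on that particular discrimination.

```python
# Hypothetical sketch: route an image to the participant best suited to it,
# given a candidate set produced by the computer vision system.
# All names and accuracy numbers are made up for illustration.

def pick_participant(candidates, skills):
    """Return the participant with the highest estimated accuracy
    on this particular discrimination task (the candidate set)."""
    task = frozenset(candidates)
    best, best_acc = None, 0.0
    for person, task_skills in skills.items():
        acc = task_skills.get(task, 0.5)  # unknown ability -> chance-level prior
        if acc > best_acc:
            best, best_acc = person, acc
    return best

# Emma has done well before at separating these two red-tailed bees.
skills = {
    "Emma": {frozenset({"B. lapidarius", "B. rupestris"}): 0.9},
    "Sam": {frozenset({"B. lapidarius", "B. rupestris"}): 0.55},
}

chosen = pick_participant({"B. lapidarius", "B. rupestris"}, skills)
print(chosen)  # -> Emma
```

In the real system the "who to show this to" decision would come from the probabilistic model and the reinforcement learning stage, not a greedy lookup like this; the sketch is only the anthropomorphised version of that logic.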
I’ll send you a direct message if that’s ok hanly!
Responding to cthawley
How do you define “non-experts”
I mean people like myself - who basically have zero capabilities! By "expert" I was thinking of someone who can give roughly the most accurate ID possible for a given observation. For "non-expert" I'm imagining the typical visitor to Zooniverse, for example!
Can you confirm if labeling = IDing
Yes - although a single attempt by a participant doesn't necessarily lead to a definite label.
you’d be looking for taxa that have lots of observations, and for which photo IDs are possible, but these observations go unIDed because of a lack of qualified IDers (which may be due to a lack of general interest though also likely to a lack of availability of quality ID materials/guides).
Really well said - the key question is whether this is actually a real problem.
For the data underlying that graph, I don’t think the key issue is necessarily the lack of IDers, but lack of observers - many species are rare or located in areas with few observers, so these are not necessarily “unpopular” in a common usage of the word.
I imagined that collecting photos of things wasn’t the bottleneck - so this is a really crucial insight! Sorry for my naivety.
they are nearly impossible to ID to species with photo evidence. This might be the case for many fungi for instance - there are a good amount of fungi observations on iNat, but because taxonomy for fungi is so undetermined, and photo ID is difficult, these aren’t really “unpopular”.
This is actually why I assumed photo-ID difficulty would be the main issue - not the first two.
I wonder how much each of the three reasons above restricts adding lots more research-grade IDs? I.e. if we imagined that there were 1M photos of every species on Earth (removing the "observations" issue) and that there were unlimited experts, how many taxa/species would still lack research-grade IDs on iNaturalist? I guess I need to learn about these three problems to get some insight into whether my approach is actually of any use!
From bugbaer
Really useful - thanks. I'm feeling a substantial dose of imposter syndrome, given how little knowledge I've got. My only efforts have been learning bumblebees so I can take part in the Bumblebee Conservation Trust's BeeWalk transects, and learning bird song (for my own interest).
Thanks! That’s not my profile – I’ve only IDed about 3 in my current profile though…
I’ll send you a direct message, bugbaer, if that’s ok [this reply is too long].
Responding to sedgequeen
Then, they need follow-up! They need to know when they’re right. Are you or those working with you ready to check each of the first few dozen identifications for each of your participants?
The optimistic idea here is that participants are given some info (e.g. from the key) and are asked to ID something we already know the ID of, and then they get feedback immediately and automatically. As they improve (assuming they do), the 'model' of their ability keeps track of this, and can start to judge when they might begin providing useful guesses. The idea is also that the tool should know where there are 'gaps' in its pool of users, so it knows where we need to do more teaching, etc.
In limited experiments this seems to work - but we really don’t know if it’s a good idea in reality!
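In case it helps, here's a minimal sketch of what I mean by a 'model' of ability. This is a standard Beta-Bernoulli tracker, not our actual implementation, and the threshold numbers are invented for illustration: each right/wrong answer on a known-ID image updates the estimate, and a simple rule decides when someone's guesses might start being trusted.

```python
# Illustrative Beta-Bernoulli skill tracker for one participant on one task.
# Each answer on an image with a KNOWN id updates the estimate; the
# threshold/min_answers values below are arbitrary, just for the sketch.

class SkillTracker:
    def __init__(self):
        self.correct = 1   # Beta prior: alpha = 1
        self.wrong = 1     # Beta prior: beta = 1

    def record(self, was_correct):
        if was_correct:
            self.correct += 1
        else:
            self.wrong += 1

    def estimate(self):
        # Posterior mean of the participant's accuracy on this task
        return self.correct / (self.correct + self.wrong)

    def trusted(self, threshold=0.8, min_answers=10):
        seen = self.correct + self.wrong - 2  # subtract the prior counts
        return seen >= min_answers and self.estimate() >= threshold

t = SkillTracker()
for outcome in [True] * 9 + [False]:   # 9 right, 1 wrong on training images
    t.record(outcome)
print(round(t.estimate(), 2))  # posterior mean = 10/12, i.e. 0.83
print(t.trusted())             # enough answers and accurate enough -> True
```

The 'gaps in the pool' idea would then just be: for which tasks does no participant yet pass `trusted()`? Those are the tasks where more teaching is needed.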
Thanks again for the help.
Re lothlin’s point
Your example is a good one - something that can probably be taught to complete lay-people quite easily. Maybe the computer vision (CV) system can get it down to those two species, and we just need to train up a few participants to separate them. Once there's a handful of training examples labelled, it might be that the re-trained CV system becomes really good at it, and we don't need any more human help with that distinction... but I don't know if that would work! It's just the idea!
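The loop I have in mind could be sketched roughly like this. The stub classes and function names here are entirely hypothetical stand-ins (a real CV model and real participants would replace them); the sketch only shows the shape of the human-in-the-loop idea: the CV model narrows each image to a small candidate set, trained participants vote to resolve it, and the accumulated human labels are periodically folded back into retraining.

```python
# Hypothetical human-in-the-loop labelling sketch. StubCV and
# StubParticipant are toy stand-ins for a real vision model and real people.

class StubCV:
    """Stand-in for a vision model; a real one would be a neural net."""
    def __init__(self):
        self.trainings = 0
    def top_candidates(self, image):
        return ["species A", "species B"]  # pretend it narrowed to two
    def retrain(self, labelled):
        self.trainings += 1               # real version would re-fit here

class StubParticipant:
    def choose(self, image, candidates):
        return candidates[0]              # pretend everyone answers "species A"

def label_with_crowd(images, cv_model, participants, retrain_every=50):
    confirmed = []  # (image, label) pairs agreed by the crowd
    for image in images:
        candidates = cv_model.top_candidates(image)
        if len(candidates) == 1:
            confirmed.append((image, candidates[0]))  # CV already confident
            continue
        votes = [p.choose(image, candidates) for p in participants]
        label = max(set(votes), key=votes.count)      # simple majority vote
        confirmed.append((image, label))
        if len(confirmed) % retrain_every == 0:
            cv_model.retrain(confirmed)               # fold human labels back in
    return confirmed

cv = StubCV()
crowd = [StubParticipant() for _ in range(3)]
result = label_with_crowd(["img1", "img2", "img3", "img4"], cv, crowd,
                          retrain_every=2)
print(len(result), cv.trainings)  # 4 images labelled, model retrained twice
```

If the retrained model eventually returns a single candidate for these images, the human step drops out for that distinction, which is the hoped-for endpoint.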
Re sbrobeson
there’s practically no incentive to identify a common and readily identifiable organism like American Tuliptree
Yeah - I guess the idea is to pick some area that has images, but not enough experts to help, and see if we can, by structuring things well enough, get non-experts to help label things.
I don’t see much non-experts can do there when the experts don’t even really know yet.
Agreed - I'm assuming that at a minimum the image needs to be, in principle, /possible/ to ID!
Responding to spiphany
You say you will be using a custom platform for the training, but if the purpose is to train participants to be able to help ID observations on iNat specifically…
Ah, no - my purpose is to see if there's a way of combining some novel approaches to get good-quality ID labels from a 'crowd' of novices, for problems that individually they probably couldn't solve.
what is the background of the people currently involved in the project? Machine learning? Or do you have people with training in biology/taxonomy as well?
The main purpose of the post was to reach out to those involved in crowd-source labelling for biology/taxonomy. I’ve had a few meetings with colleagues in the biosciences about this; I’m also speaking to those in the School of Education (as that feels like a very relevant angle!).
it is essential to also learn the relevant morphology and scientific terminology, at least to a certain degree. This requires a very different, structured training approach than mere image recognition.
That’s an important insight, and maybe indicates that the idea is a terrible one! Would this be true if someone is just distinguishing between e.g. three species though?
Thanks to everyone! This reply is already too long. Thanks also schoenitz, and t_e_d and tallastro, etc.
I’ll have a go at IDing more things on iNaturalist as a thank you to all of you for your help!
It's been really useful to get these insights. We'll discuss this with our collaborators in bioscience. Thanks again!!