I have to admit I know very little about sound recognition. Here are some very disorganized thoughts about a few audio species classification projects that I know of.
BirdVox (https://wp.nyu.edu/birdvox/) is kind of like camera trapping, but with microphones, for bird migrations. They're trying to complement radar data (which can estimate migrating biomass but says nothing about species) to better understand bird migrations.
Rainforest Connection (https://www.rfcx.org/our_work#monitoring) is mostly looking at significant audio events, such as detecting the difference between standard rainforest background noise and logging activity like chainsaws, or detecting the presence of a specific endangered species.
Forschungsfall Nachtigall (https://www.museumfuernaturkunde.berlin/en/science/nightingale-research-case-citizen-science-project-natural-and-cultural-history-nightingales) is a project from the Museum für Naturkunde in Berlin, identifying nightingale songs in citizen science phone recordings.
Some differences between these systems and iNat:
Attention: Almost every iNat photo was created with the attention of a human. A person has identified a species of interest and taken a picture in which it is typically centrally located, in focus, and free of occlusion; other potential species of interest are usually cropped out of the frame or off to the side. Phone microphones offer far less quality and control than phone cameras, so this kind of framing is much harder to achieve in audio recording. Neither Rainforest Connection nor BirdVox has any sense of attention - they are listening all the time and must distinguish the target sound(s) from background noise. Forschungsfall Nachtigall does incorporate attention - humans record and upload what they believe is nightingale song.
Scope: Part of what makes iNat so awesome is that all species in the tree of life are candidates for observation and identification, and all identifications hang off the tree of life. All the hard work of sorting and grinding out the taxonomy pays off when an observation gets an identification that's attached to a real species label instead of a generic tag like “tree” or “bug.” The vision model we're training now knows about roughly 30,000 leaf taxa (mostly species), and because of how it is deployed it can also make predictions about parent or inner nodes, which represent another 25,000 higher-ranking taxa. I believe the BirdVox “fine” model can classify a few dozen different species, and the other two projects can only identify one or two.
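To make the inner-node idea concrete, here is a toy sketch of how leaf-level classifier scores can be rolled up a taxonomy so that higher-rank nodes get predictions too. This is not iNaturalist's actual implementation - the tiny taxonomy and the scores are made up for illustration.

```python
# Each leaf taxon maps to its chain of ancestor taxa.
# (Made-up, heavily pruned taxonomy for illustration only.)
ANCESTORS = {
    "Turdus migratorius": ["Turdus", "Turdidae", "Aves"],
    "Turdus merula":      ["Turdus", "Turdidae", "Aves"],
    "Corvus corax":       ["Corvus", "Corvidae", "Aves"],
}

def rollup(leaf_scores):
    """Sum each leaf's probability into every one of its ancestors."""
    scores = dict(leaf_scores)
    for leaf, p in leaf_scores.items():
        for ancestor in ANCESTORS[leaf]:
            scores[ancestor] = scores.get(ancestor, 0.0) + p
    return scores

leaf_scores = {
    "Turdus migratorius": 0.40,
    "Turdus merula": 0.35,
    "Corvus corax": 0.25,
}
scores = rollup(leaf_scores)
# The model is only 40% sure of its best species, but 75% sure of the
# genus Turdus (0.40 + 0.35), so a genus-level suggestion can be
# offered when no single species clears a confidence threshold.
```

The point is that a single leaf-level model, plus the taxonomy, gives you predictions at every rank for free.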
BirdNET (which @okbirdman posted) is an amazing project from the Cornell Lab of Ornithology, the folks behind eBird. It's probably the closest analog to iNaturalist - it seems to be able to classify almost a thousand species of birds and works in attention-based scenarios like its Android app. It's powered by the Macaulay Library dataset, which contains hundreds of thousands of labelled, high-quality bird recordings, and they have some of the best audio ML researchers in the world working on it.