Suggest ID for sounds?

As I uploaded a cicada call a moment ago, I thought to myself: “How awesome would it be if iNat analysed and suggested an ID for sounds as well as photographs. Would it be possible for the iNat AI to ‘learn’ sounds?”

Unsure if this has been discussed before, how difficult it would be to implement, whether others think it would be worthwhile etc.

Interested to see what other people think. This isn’t a feature request, just a discussion.

Just a thought anyway! :)


Would definitely be a cool feature (it might finally mean I stop posting 100 sound recordings of crickets and IDing them as frogs…), but I suspect it wouldn’t be a super high priority; of the 54 million verifiable observations at the moment, just 0.2% include sounds.

The low number of sound uploads may also be an issue for how the computer vision can learn, given it requires (I think) a minimum number of data points per taxon. I suspect many species would not meet this requirement for sounds.

@alex thoughts?


I would love that. I keep hearing strange sounds and have no idea where to start identifying them. It is surprisingly hard to search for sounds online. One can find frog sounds and bird sounds, but the rest are very difficult.


I think it has been but I couldn’t find the previous thread.


Hi Nick, here’s the previous discussion and staff’s response:

This is what we use feature requests for: discussion, difficulty, whether others think it is worthwhile. But if you have suggestions for improvements, feel free to add them at #forum-feedback.


Many, many years ago, I wrote the developer of iBird (Mitch Waite) and asked about IDing bird calls, much like Shazam identified music recordings. He explained how much more difficult it would be to do that with nature recordings compared to matching a database of digital studio recordings.

Also, it would require users to record the bird with a specialized (parabolic?) microphone to focus on the sound and reduce the background noise.

I know using my iPhone Voice Memo doesn’t produce a very clean, clear recording. I’m often surprised to hear stuff on the recording I had no awareness of when I made the recording: a dog barking, a car going by, wind, clearing my throat, etc.

Of course, the technology for this may have improved a lot nowadays.


But there is already an app that recognises bat calls, so I think it’ll come to iNat too after some years.


The technology is obviously imperfect, but the Cornell Lab is currently working on developing audio recognition software for birds at least. A live stream of the software processing audio from outside the Lab can be found here:

As mentioned by others, the wide range in quality and relatively small volume of audio uploads on iNat will probably make audio suggestions a low priority and logistical improbability for the foreseeable future.


I use the Cornell app. It’s called BirdNET (only available for Android right now, IIRC). It is pretty cool when it works, but distance and background noise make it hard for it to work well on a regular basis. Urban environments are HARD. You almost have to be deep in the woods to get anything useful. The phone mic certainly isn’t as sensitive as my own ears.

It does export observations to iNat, which is cool. All it does in that regard, though, is to send the audio clip over. It doesn’t even populate the date, time, and location (which are recorded in the app) on the iNat observation, so filling those details in manually is a challenge. I’ve used it to record things other than birds, because the recording process, selecting the audio segment, then sending the file to iNat DOES work particularly well.

I think part of the reason more folks don’t submit audio observations is because the process to do so kinda sucks in a lot of cases. For me, the other reason has to do with distance and background sounds making it hard to record something of adequate quality.

This was a recent observation I made using the app. The pics are just for giggles. The birds are BARELY visible at all, but the app nailed the ID from the audio recording. Pretty ideal recording conditions here. It was quiet and the birds let me get pretty close.


I would say phone mics aren’t perfect, sure, but for people with poor hearing they’re great, and especially at high frequencies they can be better than most human ears. BirdNET’s findings can be very surprising in good forest conditions. It’s a great opportunity for everyone without strong photography skills and costly camera equipment. That’s why I’m confident in it, but the interface is not working perfectly.


I would say it will be very useful, of course. But especially, I’d love to have sound recognition working permanently when using the Seek app. I see a lot of smartphones now have small but fairly powerful telephoto lenses… I’d love to be able to get accurate IDs on birds and bats I can’t approach, using both sound and image. All this on the go with a phone would be futuristic.

I have to admit I know very little about sound recognition. Here are some very disorganized thoughts about a few audio species classification projects that I know of.

BirdVox is kind of like microphone trapping for bird migrations. They’re trying to fill in radar data (which can give information about migrating biomass but nothing about species) to understand bird migrations.

Rainforest Connection is mostly looking at significant audio events, such as detecting the difference between standard rainforest background noise and logging activity like chainsaws, or detecting the presence of a specific endangered species.

Forschungsfall Nachtigall is a project from the Museum für Naturkunde in Berlin, identifying Nightingale songs from citizen science phone recordings.

Some differences between these systems and iNat:

Attention: Almost every iNat photo has been created with the attention of a human. A human has identified the species of interest and taken a picture where the species of interest is typically centrally located, free of occlusion, and in focus. Other potential species of interest are usually cropped out of the frame or not centrally located. The relative quality and control of cameras vs microphones on phones makes this particularly difficult to replicate in audio recording. Neither Rainforest Connection nor BirdVox has any sense of attention: they are listening all the time and must distinguish between background noise and the target sound(s). Forschungsfall Nachtigall does incorporate attention, since humans record and upload what they believe is Nightingale song.

Scope: Part of what makes iNat so awesome is that all species in the tree of life are candidates for observation and identification, and all identifications hang off the tree of life. All the hard work of sorting and grinding out the taxonomy pays off when an observation gets an identification that’s attached to a real species label instead of a generic tag like “tree” or “bug.” The vision model we’re training now knows about roughly 30,000 leaf taxa (mostly species), and because of how it is deployed it can make predictions about parent or inner nodes as well, which represent another 25,000 higher-ranking taxa. I believe the BirdVox “fine” model can classify a few dozen different species, and the other two projects can only identify one or two.
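The “predictions about parent or inner nodes” idea can be sketched simply: a model that scores leaf taxa can also score any ancestor by summing the probabilities of the leaves beneath it. This is a minimal illustration with a made-up toy taxonomy and made-up scores, not iNat’s actual model or data structures:

```python
# Hypothetical toy taxonomy: child -> parent (None = root).
taxonomy = {
    "Aves": None,
    "Corvidae": "Aves",
    "Corvus corax": "Corvidae",
    "Corvus corone": "Corvidae",
    "Pica pica": "Corvidae",
}

def rollup(leaf_probs: dict) -> dict:
    """Sum each leaf's probability into every one of its ancestors,
    so inner nodes get scores too."""
    scores = dict(leaf_probs)
    for leaf, p in leaf_probs.items():
        node = taxonomy.get(leaf)
        while node is not None:
            scores[node] = scores.get(node, 0.0) + p
            node = taxonomy.get(node)
    return scores

# Illustrative leaf scores from a hypothetical classifier:
scores = rollup({"Corvus corax": 0.4, "Corvus corone": 0.3, "Pica pica": 0.1})
print(round(scores["Corvidae"], 3))  # → 0.8
```

A model uncertain among several crows can still confidently suggest the family, which is why inner-node predictions are useful even when no single leaf score is high.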

BirdNET (which @okbirdman posted) is an amazing project from eBird. It’s probably the closest analog to iNaturalist: it seems to be able to classify almost a thousand species of birds and works in attention-based scenarios like their Android app. It’s powered by the Macaulay Library dataset, which contains hundreds of thousands of labelled high-quality bird recordings, and eBird has some of the best audio ML researchers in the world working on it.



I would like Computer Vision to include sounds so there is no need to compare. (Open Source introduction)
Tadarida Open Software Toolbox

Building an acoustic recognition database: Tadarida-L (Toolbox Animal Detection on Acoustic Recordings)

Sonochiro (Biotope) was developed by
BatExplorer is the program that comes with the BatLogger and recognizes 53%.

Tadarida-L (Toolbox Animal Detection on Acoustic Recordings) is helper software: an open-source toolbox, altruistically written by Yves Bas, with which you can write your own classifier software. It works by recognizing signals. Used for birds, bush crickets and bats.

I moved your post to this thread about automatic sound suggestions since the other topic was for the non-automatic Compare tool.


Students at a UK university have developed the auto-recording hardware AudioMoth (Open Acoustic Devices), a small device that is rapidly gaining popularity.

Stewart Newson’s Norfolk classifier is used in Belgium and the UK.
Stewart has built a classifier for the UK to identify UK species. The plan is to extend the UK classifier with Benelux species; the classifier learns from its mistakes. Tadarida is certainly no worse than Kaleidoscope as long as the recordings are good.

The third lecture (from 1:41:17) is again by Claire Hermans and gives a preview of her project ‘Light on landscape’, in which she explains how microphone arrays are used to reconstruct the flight paths of bats.

In France there is also a monitoring project based on Tadarida, and they also do not want to make their French recognition database available.
The second lecture (from 52:18) is by Marc van der Sijpe and Claire Hermans: ‘Introduction to auto-recording and auto-identification, and an explanation of how Tadarida and the BTO classifier based on Tadarida work’.
This presentation is given in Dutch and English. First, Marc Van De Sijpe gives an introduction to auto-recording and auto-identification, including Tadarida, an open-source classification tool written in the programming and statistics language R. Then Claire explains in English how Tadarida works, after which the BTO classifier is also demonstrated, along with the effect of light intensity on habitat loss.
Tadarida Open Software Toolbox (English)

DIY self-build TeensyBat-VLEN bat detector/recorder, open source

How to start a new bat classification database (classifier) together.
Are there people interested in building a database?
Marc gave some identified recordings to use with Tadarida-L.

There is an R program. Any subscribed user can upload a file through a simple window to the cloud in England and get back a .csv with the result. The volunteers in the UK used to throw away their recordings, so now the uploaded files are stored in the cloud, and the identifications are stored in the UK. People can flag when a detection is wrong, and the R program can be adapted. It is unknown whether it will be open source; without funding it may not be. It is free for volunteers.
A bat audio WAV file is submitted with the species name and the location of the bat call within the file.
NEM VT has recordings.

Bestimmung von Fledermausrufaufnahmen und Kriterien für die Wertung von akustischen Artnachweisen - Teil 1 (Identification of bat call recordings and criteria for evaluating acoustic species records, Part 1)
The original FFT is processed every 0.67 msec. If you use a zoom factor of 4, you only take every fourth sample, so it takes four times longer until you have acquired enough samples for the FFT: 2.67 msec. The trick is that you do not wait until all the samples are collected; instead you allow for “overlap” in the samples and perform the FFT at the same 0.67 msec rhythm. [Please note that this also takes four times the processing power compared to non-overlapping FFTs, and more memory.]
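The overlap trick above can be sketched in a few lines. This is a simplified illustration only: the sample rate, FFT length, window choice, and the naive decimation (no anti-aliasing filter) are my assumptions, not the actual detector firmware. With a 256-point FFT at 384 kHz, one frame spans ~0.67 ms of raw audio; decimating by 4 and hopping by 64 decimated samples (75% overlap) keeps spectra arriving at that same 0.67 ms rhythm:

```python
import numpy as np

FS = 384_000   # assumed sample rate (Hz)
N_FFT = 256    # FFT length; 256 / 384 kHz ≈ 0.67 ms per frame
ZOOM = 4       # zoom factor: keep every 4th sample

def zoom_fft_frames(signal: np.ndarray) -> np.ndarray:
    """Decimate by ZOOM, then slide an N_FFT window with 75% overlap so
    spectra are still produced every N_FFT raw samples (~0.67 ms)."""
    decimated = signal[::ZOOM]      # naive decimation (real code would filter first)
    hop = N_FFT // ZOOM             # 64 decimated samples -> 75% overlap
    frames = []
    for start in range(0, len(decimated) - N_FFT + 1, hop):
        window = decimated[start:start + N_FFT] * np.hanning(N_FFT)
        frames.append(np.abs(np.fft.rfft(window)))
    return np.array(frames)

spectra = zoom_fft_frames(np.random.randn(FS // 10))  # 100 ms of noise
print(spectra.shape)  # → (147, 129): frames every 0.67 ms, 129 frequency bins
```

Because each output frame reuses three quarters of the previous frame’s samples, the frequency resolution of the slow 2.67 ms acquisition is kept while the update rate stays at 0.67 ms, at the cost of the extra compute and memory noted above.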

I have collected more information on the ZoomFFT in this Wiki:

I am not experienced in programming audio library blocks, but my gut feeling is that it could be easier to just use a queue object from the library to get the samples and perform all the calculations in the main loop rather than inside an audio library object (because there are different sample rates involved). However, for people also interested in using the ZoomFFT: if you design a specific audio library object, you will get much more credit ;-).

Best wishes,


BTW: this brand-new publication will rapidly become the professional standard for the identification of bat calls from spectrograms in Germany. Maybe it also helps others with ID of bats in Central Europe.

Bestimmung von Fledermausrufaufnahmen und Kriterien für die Wertung von akustischen Artnachweisen - Teil 1

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.