Recognize sounds automatically

Firstly, thank you for the amazing apps - I am a teacher and tried over 15 apps for plant identification for my students, yours won hands down. Nearly all my students are now using Seek on a daily basis and show me the results - it is changing their lives.

We would love a feature that uses a phone’s microphone to identify birdsong (and other animal calls, of course). It can be difficult to get close enough to birds to photograph them. A database of birdsong, together with permission to use the phone’s microphone, would get around that problem. It would be a very good feature for identifying bird populations.

What do we think?

why stop at birds? lots of other things make noises. i think curated sounds, like curated photos, might be nice on taxon pages, too.

That’s awesome! Would love to hear more about that if you want to share. Maybe in this discussion?

Just to make sure I understand you, you’d like to be able to have iNat/Seek suggest IDs from the call of an animal like it does for photos?

I’m not a developer, but I know from discussions we’ve had that sound is more difficult than still images because time is involved - it’s not a static thing. One workaround might be to make spectrograms of a species’ call and then train the computer vision on that image, but it’s still more complicated than image recognition and we’re still in the nascent stages of image recognition. So, definitely on the table, but this probably wouldn’t be implemented anytime soon.
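For anyone curious what the spectrogram workaround looks like in practice, here is a minimal sketch in plain Python (standard library only). It is not anything iNat or Seek uses: the function name `spectrogram`, the frame size, the hop, and the synthetic two-tone "call" are all assumptions made up for illustration. The idea is just to slice the recording into short overlapping frames and measure how much energy each frequency has in each frame, which turns a sound into a 2-D grid that could be treated like an image.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping frame."""
    # A Hann window tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Naive DFT for clarity; a real tool would use an FFT library instead.
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def tone(freq_hz, n_samples, rate):
    """Generate a pure sine tone (a stand-in for one note of a call)."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n_samples)]


# Hypothetical "call": a 1 kHz note followed by a 2 kHz note, sampled at 8 kHz.
rate = 8000
signal = tone(1000, 512, rate) + tone(2000, 512, rate)
spec = spectrogram(signal)

# Loudest frequency bin in each frame; each bin is rate / frame_size = 125 Hz wide.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(peak_bins[0] * 125, "Hz ->", peak_bins[-1] * 125, "Hz")
```

The rows of `spec` are time steps and the columns are frequencies, so the jump from one note to the next shows up as a change in which column is brightest.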

Note that you can record a sound and upload it via our web uploader for other users to ID, and we are working on adding sound features to the Android app.


I would have thought that image recognition is far more difficult in that you have a 2 dimensional image of a 3 dimensional scene, and separating the subject from the background is the first problem. Then you have to be able to recognise the subject from a huge range of angles and lighting conditions. And diagnostic features may not even be visible in a particular photo, even if it is high quality.

With sound, recordings can easily be quite directional, and there are many features of bird songs and calls that separate them from the background (repetition, bursts of ‘pure’ tones). And continuous background noise can often be removed by analysing the ‘quiet’ patches between notes. Certainly frequency analysis would be part of identifying calls, but feeding spectrograms into a computer vision system actually seems rather backward as it would only be looking at one element - a bit like identifying photos just from colour analysis. [just been reading more about spectrograms, and maybe this approach has possibilities after all]
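To make the point about ‘quiet’ patches concrete, here is a rough sketch of one common way to do it, usually called spectral subtraction. It assumes you already have a spectrogram as a list of frames, each a list of per-frequency magnitudes. The function name `subtract_noise`, the `quiet_fraction` parameter, and the toy data are my own assumptions for illustration, not taken from any particular app.

```python
def subtract_noise(spec, quiet_fraction=0.25):
    """Estimate steady background noise from the quietest frames and remove it.

    spec: list of frames, each a list of magnitudes per frequency bin.
    Returns (cleaned_spec, noise_profile).
    """
    # Total energy per frame tells us which frames fall between notes.
    energies = [sum(m * m for m in frame) for frame in spec]
    order = sorted(range(len(spec)), key=energies.__getitem__)
    n_quiet = max(1, int(len(spec) * quiet_fraction))
    quiet_frames = [spec[i] for i in order[:n_quiet]]

    # Average the quiet frames bin by bin to get the noise profile.
    n_bins = len(spec[0])
    noise = [sum(frame[k] for frame in quiet_frames) / n_quiet
             for k in range(n_bins)]

    # Subtract the profile everywhere, clamping at zero.
    cleaned = [[max(0.0, frame[k] - noise[k]) for k in range(n_bins)]
               for frame in spec]
    return cleaned, noise


# Toy spectrogram: 8 frames x 4 bins with a constant hum of 1.0 in every bin,
# plus a "note" adding 5.0 to bin 2 during frames 2 to 5.
toy = [[1.0, 1.0, 1.0 + (5.0 if 2 <= t <= 5 else 0.0), 1.0] for t in range(8)]
cleaned, noise = subtract_noise(toy)
print(noise)       # the estimated hum
print(cleaned[3])  # a frame inside the note, with the hum removed
```

This only handles noise that stays roughly constant through the recording; wind gusts or another bird calling over the top would need something smarter.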

How do those apps which identify a song or even a movie from just a small sample of the sound work? Perhaps those technologies would work for bird songs?

here’s the paper that describes the first major implementation:

here’s someone who describes how to code this:

i would guess that this algorithm might be too sensitive to variations of a call between individuals and even different instances of an individual’s own calls and doesn’t account for some of the subtleties that make one sound different from another. (it seems to focus on the pattern of changes in tone. it might be able to differentiate between the studio version of a song and a live performance, but it might not be able to tell you that they were performed by the same band.)
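here is a rough sketch of the general idea as i understand it, to show why i think it could be brittle for animal calls. this is not the code from the links above: the names `fingerprint` and `match_score`, the `fan_out` value, and the toy peak lists are all made up for illustration. the input is a list of spectrogram peaks as (time frame, frequency bin) pairs, and each hash is built from a pair of nearby peaks.

```python
from collections import Counter


def fingerprint(peaks, fan_out=3):
    """Hash pairs of nearby peaks as (freq1, freq2, time_gap) -> anchor times.

    peaks: list of (frame_index, freq_bin) tuples sorted by frame_index.
    """
    hashes = {}
    for i, (t1, f1) in enumerate(peaks):
        # Pair each anchor peak with the next few peaks after it.
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.setdefault((f1, f2, t2 - t1), []).append(t1)
    return hashes


def match_score(reference, query):
    """Count hashes shared with the reference at one consistent time offset."""
    offsets = Counter()
    for key, query_times in query.items():
        for ref_time in reference.get(key, []):
            for query_time in query_times:
                offsets[ref_time - query_time] += 1
    return max(offsets.values(), default=0)


# Toy data: peaks of a reference recording.
song = [(0, 10), (2, 14), (5, 9), (7, 20), (9, 12), (12, 16)]
# An excerpt of the very same recording, heard later in time.
same_clip = [(t + 40, f) for t, f in song[1:5]]
# The same pattern sung one frequency bin higher, as another individual might.
higher_pitch = [(t, f + 1) for t, f in song]

ref = fingerprint(song)
print(match_score(ref, fingerprint(same_clip)))     # 6
print(match_score(ref, fingerprint(higher_pitch)))  # 0
```

the exact excerpt matches on every hash, but shifting the whole call up by a single frequency bin drops the score to zero, which is the kind of sensitivity i mean.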


I am an amateur music composer and producer/editor, and I have always been impressed by how much longer everything takes with sound than with image, because time has to pass. So tiwane’s suggestion of spectrograms made immediate sense, much as I love sound, and especially Nature sounds like bird calls and streams. (I once made a short movie in the bush and edited the soundtrack separately, as one does. I had a heap of recordings of different parts of the stream, and it was fascinating to discover that I recognised the stream sound at different places.)


Yes, we aren’t trying to match against one recorded instance of a song; each rendition is more like a jazz improvisation of a community standard. And of course there are some birds that go out of their way to mimic other birds, or environmental sounds (car alarms even), and most species have local variations of the theme anyway. But I’m sure that I’ve heard of applications automatically IDing bird songs … I’d better go have a look … (must be easier than speech recognition :-)


yes, i’m not saying it couldn’t be done. i’m just saying the way that music recognition apps do it might not work. do you know of any apps that are particularly good and how they work?

i think speech recognition is probably easier because you’re dealing with a finite set of phonemes and how they’re strung together to form words and phrases. with a generic sound matching app, there are too many subtleties to try to match on, i would think. if you limited your recognition app to, say, just bird calls, that might be easier though.

It is almost certainly technically possible; whether it’s possible to make something that is accurate enough to be useful in a reasonable amount of time is another matter.

There are bat detectors that will automatically identify bats, so generating IDs from sounds is clearly possible. “The Echo Meter Touch 2 currently covers bats found in North America, the Neotropics, U.K., Europe and South Africa.” I haven’t used it, and obviously it only identifies a limited number of species (bats), and I would guess background noise will be less of a problem in the ultrasonic spectrum.


A search for “automatically identify bird songs” gets lots of apps, and adding “open source” to that search produces a few really interesting projects. Here’s a couple that piqued my interest:

And there’s even mention of software from my local university called Weka.


I use BirdNET, which is useful at times for identifying birds by their sounds. It would be a nice addition and could make sound IDs more common. I already use sound to ID animals, but those observations get fewer verifications and take longer to be verified. It might get sound IDs verified more often, as people could compare the sounds more easily.

I’m going to close this request. It’s on our radar and might eventually be added if feasible, but I think we’ll focus on improving the image recognition model before moving to sound.