Audio recordings of bird song/calls

I think many people are discouraged from uploading audio, because audio-only observations are identified at a much slower rate than observations including photos.

From the other point of view, many people are discouraged from identifying audio, because many observers don’t vet their audio as carefully as they do photos.

  1. Many people know how to crop photos on their phone/computer, but people are less familiar with audio editing software. So many people don’t edit their audio clips.
  2. Unedited audio clips are harder to ID than unedited photos. I can zoom in on an uncropped photo, but I am much more limited in my ability to normalize, amplify, etc. others’ audio observations.
  3. Audio very often includes multiple, sometimes many, species, and observers are lax about including notes regarding the audio of interest (i.e., describing the call of interest, but preferably listing the seconds where the call can be heard in the audio clip).
  4. Often, audio clips are cut too short. People are impatient, and instead of including a 30-second clip where the vocalization of interest can be heard multiple times, they upload a 5-second clip that ends before identifiers can fully wrap their heads around the sound.

Edit your audio! I use Audacity software. The cropping and normalize tools are worth becoming familiar with.


Best Practices for Audio Observers:

  1. When possible, try to include multiple sets of a vocalization in a single audio clip. Try to avoid 5-second clips with only a single set.
  2. Unless it is readily apparent (i.e., only one species can be heard in the audio file), include notes that indicate which vocalization is the one of interest. The best way to do this is to list the seconds in which the vocalization can be heard.
  3. Crop your audio clips. At the start and end of audio clips, there is often a lot of noise as you fumble with your phone to hit the right buttons. Please crop these loud noises out to save the identifiers’ ears. This will also be important for when you normalize the audio.
  4. Boost the volume (normalize)! Adjust the level of each recording so that the loudest sound reaches -3 dB. I do this in Audacity audio editing software (free). If you can’t clearly hear the vocalization of interest in your audio clip, an identifier can’t either.
  5. Add an ID. Even a general ID of “birds” or “frogs and toads” will get an observation identified faster than leaving it at unknown.
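
For anyone who would rather see the crop-then-normalize logic spelled out than click through Audacity, here is a minimal Python sketch. It is not how Audacity implements its tools; the function names (`crop`, `peak_normalize`, `peak_db`) and the synthetic clip are made up for illustration, and samples are assumed to be floats between -1.0 and 1.0. It shows why cropping should come first: if the phone fumbling is still in the clip, it sets the peak and the call barely gets boosted.

```python
import math

def crop(samples, sample_rate, start_s, end_s):
    """Keep only the samples between start_s and end_s (in seconds)."""
    start = max(0, int(start_s * sample_rate))
    end = min(len(samples), int(end_s * sample_rate))
    return samples[start:end]

def peak_normalize(samples, target_db=-3.0):
    """Scale samples so the loudest one sits at target_db relative to full scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** (target_db / 20.0) / peak
    return [s * gain for s in samples]

def peak_db(samples):
    """Loudest sample expressed in dB relative to full scale (1.0)."""
    return 20.0 * math.log10(max(abs(s) for s in samples))

# Hypothetical clip: 1 s of loud phone fumbling, then 4 s of a quiet call.
RATE = 8000
fumble = [0.9 if n % 2 else -0.9 for n in range(RATE)]
call = [0.05 * math.sin(2 * math.pi * 2000 * n / RATE) for n in range(4 * RATE)]
clip = fumble + call

# Normalizing first: the fumble sets the peak, so the call stays quiet.
call_without_crop = crop(peak_normalize(clip), RATE, 1.0, 5.0)

# Cropping first: the call itself sets the peak and gets the full boost.
call_with_crop = peak_normalize(crop(clip, RATE, 1.0, 5.0))

print(round(peak_db(call_without_crop), 1))  # -28.1
print(round(peak_db(call_with_crop), 1))     # -3.0
```

In this made-up example the call ends up about 25 dB louder when the noisy first second is cropped out before normalizing.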

Thanks, these are all interesting points.

The way I’ve been doing it is to use a birdsong recognition app on my phone (currently Birdnet), and then post the audio file and a screenshot of the spectrogram on iNaturalist. This has the disadvantage that Birdnet only allows an audio clip of up to 15 seconds, but it has the advantage that all of my observations have a suggested ID. Possibly as a consequence, of my more than 200 observations with audio, only 1 is not research grade.

Just looking at the stats more generally though, you both seem to be right about the number of unverified birdsong recordings. In the US and UK, about a quarter of all bird observations with sounds seem to be at ‘Needs ID’, compared to less than 5% for bird observations in general.

I wonder if more people could be encouraged to use an audio recognition app for the initial recording. Machine learning seems to have made this so much easier in recent years.

For anyone interested, here’s the link to Birdnet

I sometimes record a video of a bird with my camera and then convert it to an audio file. It does a better job than my phone. I do use the Merlin app from time to time to identify birds when I am out walking. It also “hears” better than me. It will often display a bird I am not hearing. I then take the time to listen more closely to the surroundings. And, the app tells me what to listen for.

Since you are interested in songs/calls, you might be interested in this website.

The Merlin app is pretty good. It has a decent amount of variation of songs/calls for species. The Audubon app is also very good. I am using these apps for North American (Ohio) species. But, if you really want to delve into variation, the xeno-canto website has a lot more.

I know one birder who had a night recording setup on the roof of his house. He posts a lot during migration. I think it is really cool that so many birds fly over us. Here is his profile if you want to see what he’s up to.


I hate it when I have the volume all the way up with headphones in, straining to hear the bird they want IDed, and a loud Carolina wren blasts into my ears, usually causing head pain. It kind of puts me off IDing audio for a while. I’ve been told we should not put up the spectrogram as an image, but if people did this it would make it so much easier to skip the parts I know will be uncomfortably loud.

In general, I would not suggest using the output of another AI or CV model as an identification for iNaturalist observations if it isn’t based on your own expertise. iNat asks for observations to be based on the user’s own knowledge/experience. For the iNat CV, this input is denoted in the observations, and identifiers can take this into account, but this isn’t the case for other machine learning approaches. There are other discussions about this approach on the forum in relation to Merlin as well.

If you do use another machine learning approach and use its suggestion as your own ID, please be explicit about the process in your notes/descriptions. Other machine learning approaches can definitely generate erroneous suggestions in many cases.

Additionally, as @lappelbaum noted, you should not post spectrograms as images (even though they can be helpful for IDs). Each image on iNat needs to contain the organism observed. One important reason for this rule is that images are used for training the CV model, so including spectrograms could lead to errors in the CV model. You could post external links to an image hosting service like imgur or others if you want identifiers to have access to a spectrogram.

Birdsongs get identified pretty well; fewer than 1 in 10 of mine still need an ID. People definitely shouldn’t use other apps as the basis for their own ID. You can compare your sonogram with those on xeno-canto, but your ID should be yours and not an app’s.

Just to be clear, I don’t suggest an ID just because it’s put forward by Birdnet, but rather I suggest it because it’s what I believe the bird to be, based on my knowledge and experience. I have on several occasions contradicted the suggestion put forward by Birdnet, as have other people offering IDs.

In the circumstances, perhaps I need to reconsider whether I can continue to contribute to iNaturalist. Thanks in any case for all your comments.

I don’t think it would lead to errors in the CV model…
Rather the opposite - with enough spectrogram images, it would just start to be able to ID via spectrogram?


Not without standardizing the scale, color, and other display aspects of the spectrograms. An uncropped photo of a rabbit still looks like a rabbit (as long as it isn’t so far away that it’s just a dot in the photo). A spectrogram looks radically different depending on the scale you use. Additionally, people may use different color scales (grayscale will look different from inverse grayscale which will look different from various color scales).

Edit to add: there is a feature request (Automatically add a spectrogram view to observations with sounds) that if executed, would produce standardized spectrograms for audio when uploaded.


If a spectrogram looks radically different depending on the scale you use, surely the same is true of a photographic image of a bird?

The proportions of a bird in a photo remain the same regardless of zoom because your x-axis must change with your y-axis. This is not true for a spectrogram.
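
To put rough numbers on this, here is a small Python sketch. It assumes a hypothetical call that rises 2 kHz over half a second, and made-up pixel scales for the two axes; the function name `sweep_angle_degrees` and all the values are illustrative and not taken from any particular spectrogram tool. It computes the angle at which that rising call would be drawn on screen.

```python
import math

def sweep_angle_degrees(sweep_khz, duration_s, px_per_khz, px_per_second):
    """Angle (from the time axis) at which a linear frequency sweep is drawn."""
    rise_px = sweep_khz * px_per_khz       # vertical extent on screen
    run_px = duration_s * px_per_second    # horizontal extent on screen
    return math.degrees(math.atan2(rise_px, run_px))

# The same call (2 kHz rise over 0.5 s) drawn with two different time scales.
compact = sweep_angle_degrees(2.0, 0.5, px_per_khz=20, px_per_second=400)
stretched = sweep_angle_degrees(2.0, 0.5, px_per_khz=20, px_per_second=40)

# Photo-style zoom: both axes scaled by the same factor (2x here).
zoomed = sweep_angle_degrees(2.0, 0.5, px_per_khz=40, px_per_second=800)

print(round(compact, 1))    # 11.3
print(round(stretched, 1))  # 63.4
print(round(zoomed, 1))     # 11.3
```

Changing only the time scale turns a shallow slope into a steep one, while zooming both axes together (which is all a photo can do) leaves the shape unchanged.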


If your ID should be yours and not an app’s, why does iNaturalist offer a facility to recognise a picture of an organism… based on an app?

If we assume that the proportions of a bird in a photo remain the same regardless of zoom, doesn’t that assume that we are all taking a photo of the bird from the same angle?

How does changing the angle change the proportions of the bird? It may change what parts of the bird are visible, but the proportions remain the same.

It seems to me that, viewed from different angles, any object will appear to have different proportions… it’s sometimes known as perspective (or sometimes in art, ‘foreshortening’). Any machine (or person) has to take perspective into account before the actual proportions of an object can be visualised.

Perspective and angle are not the same as the proportion of an object. Changing perspective changes how you see the object; changing proportion changes the object. In a photo, the scale of the x-axis is always the same as the scale of the y-axis (though the “range”, i.e. the number of pixels, may differ). Yes, different angles may make the object look different, but the number of angles is functionally finite (technically infinite, but a photo from 0 degrees will look functionally the same as a photo from 0.1 degrees). If the proportions of an object are not fixed, there are infinite ways to display that object with noticeable differences in appearance.

This would be a change in proportion of an object:


Saying you don’t need to standardize spectrogram scale is like saying the CV should recognize a photo no matter how much you stretch it. Take it from someone whose job is building automated recognizers for audio through spectrogram characteristics.

Given the way that the iNat CV model works, this wouldn’t really be possible. It is certainly possible to train a machine learning model to recognize spectrograms (this is what Merlin does for instance). But that model is a separate one from their photo ID model. Both are trained on their specific class of inputs.

iNat’s model doesn’t know that the picture uploaded is a spectrogram - it will be treated the same as other pictures of the organism. So adding spectrograms will introduce unnecessary error into model training. An additional issue is that spectrograms would need to be produced in a consistent manner to allow for comparisons (again, this is what Merlin et al. do). Using unstandardized spectrograms, even in a spectrogram specific model, would lead to poor results.

If you want a more “official” take on spectrograms:


Well noted. In fewer words, audio work requires more attention than photo work. And if it is only for sharing bird calls on iNat, it may not really be worth it.

I certainly think there is value in audio observations on iNat, but yes they do require more attention, or at least more practice. Many of us have plenty of practice with a camera and making minor photo edits, but many don’t have that same experience in audio. Neither is difficult to learn, just takes doing it right a few times.