Audio recordings of bird song/calls

Well noted. In short, audio work requires more attention than photo work. And if the only goal is to share bird calls on iNat, it may not really be worth it.

I certainly think there is value in audio observations on iNat, but yes, they do require more attention, or at least more practice. Many of us have plenty of practice with a camera and making minor photo edits, but many don’t have that same experience with audio. Neither is difficult to learn; it just takes doing it right a few times.


No, iNat has a system for adding your ID faster; you shouldn’t use CV to add IDs you don’t know yourself.

Please, please do not include spectrograms in the first “image” of a song upload. As @cthawley has indicated, this will really screw up the Computer Vision training for bird species (or other sound producers).
Also, I have to add another caveat: The Merlin app from Cornell University is pretty good and getting somewhat better, but it is still far from reliable for identifying simple call notes or for “documenting” unexpected species. There is an increasing trove of misidentified entries on eBird and iNaturalist of rarities and unexpected species with only the documentation “Identified by Merlin”. That won’t do it.


I quite agree, though. The last recording I made was of an unidentified bird calling around a nest. I hoped the recording would facilitate a speedy ID.

I made a video with my phone instead of my camera, then extracted the audio (in Adobe Premiere Pro) to share on iNat and SoundCloud. Even for me, a multimedia editor, that is already tedious, as I would follow almost the same workflow for actual work.
I can only imagine how that would be for regular folk.

This doesn’t appear to be the case, since rather than images of the bird itself, iNaturalist seems to allow pictures of “feather, scat, track, or bone”. My point is, if the system can cope with these images as (indirect) evidence of the organism, why can’t it be made to cope with spectrograms?


I never argued that you don’t need to standardise spectrogram scales. I accept that if the CV were to recognise spectrograms, then that would pose some challenges. But then so does the process of recognising organisms from images in lots of different formats, colours, perspectives etc.

But it seems to be designed to be told that lots of other images are not of the organism itself. Why can’t spectrograms be added to this list?

I think spectrograms do show something unique about the bird that helps others learn to recognise it in person.

@buteobuteo2 Just FYI you can respond to/quote from multiple other posts in one response instead of making multiple short responses. This really helps keep threads more manageable.

If you are interested in discussing the possibility of including spectrograms as another type of annotation, there’s an open thread for that:

However, this wouldn’t solve the issue, as annotations are properties of an observation, not of an individual picture. As such, they can’t really be used effectively for model training.

None of our discussion, however, or the potential of other/future models to use spectrograms, changes the facts that the current CV model does not account for them and that staff have asked people not to upload spectrograms.


Thanks. I’m not really sure what you mean by more attention, but in many cases the presence of birds can be detected by their vocalisations when it would be impossible or very difficult to get a photo of them.

As for birdsong being unimportant or not worth the effort, I think we need to keep in mind that for many people, birdsong is their most common contact with wild animals. That seems to me to be important.


By attention I mean patience with the technology involved in recording (which is easier) and processing.

For photo, the availability of smart phones makes things easy. You may post a photo without editing, I mean SooC (Straight out of the Camera).

But audio is a different game. You have to record (with a mic, if you’re concerned about quality), then clip, clean or enhance in software like Audacity, Audition or Premiere Pro. It is a longer and more technical workflow. I’ve not seen any app yet that simplifies this process.

That’s my point!

Okay, I understand what you mean now, thanks for the clarification. But I’ve recorded all my observations of bird vocalisations with only a smart phone, with almost no editing. I don’t think I would have been able to record so many birds if I needed a lot of equipment or processing time.

I make the recording with the BirdNET app, which tells me (if it wasn’t clear already) if I’ve captured with any certainty a recording of one or more birds. I can clip the sound file with one swipe of the screen, and then share the WAV file to iNaturalist on the spot. I find this much easier to do than taking a photo of what are often very small, moving targets, obscured by vegetation, at several metres distance, and occasionally in low light.

I know lots of people like to take photos of birds and that’s great (I certainly like looking at them, and I admire the people who can do this), but I think we need to accept that an audio recording is just as valid an approach. Personally, I think both should be encouraged.


Great to know about the BirdNET app. I’ll explore it.

Your sentiment is valid. However, there’s a reason the majority sticks with photos. Moreover, birds aren’t vocal all the time.

Between the dawn and dusk choruses and the general calls (alarm, feeding, in-flight, etc.) there are odd, or significant, moments of silence.

Agree, I find that audio observations simply require a lot more work to create than photos.

I’ve been making an effort to post more audio observations – as someone with little birding experience who often struggles to see, much less photograph, the local avifauna, audio is often the only way I can start to try to make sense of what birds are around me.

But as useful as audio recordings are, there are a lot of rather time-consuming steps that have to happen before I can upload them. I rely exclusively on a cell phone for iNatting, and usually I make audio recordings by taking a video, which subsequently has to be edited to select the bits I want (and crop out the inevitable urban noises such as children playing, cars/airplanes, footfalls of joggers, etc.) and then converted from an mp4 into an mp3. Because this process means that there is no GPS data attached to the file, I then have to extrapolate where the audio was recorded and manually add the time and place to the observation.
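For anyone comfortable with a command line, the trim-and-convert step described above can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: it assumes the free ffmpeg tool is installed on a computer (not the phone), and the function name, the output file naming and the quality setting are made up for illustration. It does nothing about the missing GPS data, which still has to be added by hand.

```python
from pathlib import Path


def build_ffmpeg_command(video, start_s, duration_s, out_dir="."):
    """Build the ffmpeg arguments that cut one clip out of a video and save it as MP3.

    start_s    -- where the interesting sound begins, in seconds from the start
    duration_s -- how many seconds to keep
    The output name (<video name>_<start>s.mp3) is just an illustrative convention.
    """
    video = Path(video)
    output = Path(out_dir) / f"{video.stem}_{int(start_s)}s.mp3"
    return [
        "ffmpeg",
        "-i", str(video),          # input video, e.g. an mp4 from the phone
        "-ss", str(start_s),       # skip everything before the start point
        "-t", str(duration_s),     # keep only this many seconds
        "-vn",                     # drop the video stream, keep audio only
        "-codec:a", "libmp3lame",  # encode the audio as MP3
        "-q:a", "2",               # variable bitrate, fairly high quality
        str(output),
    ]


if __name__ == "__main__":
    # Print the command for a 20-second clip starting 1 min 15 s into the video.
    print(" ".join(build_ffmpeg_command("walk.mp4", 75, 20, out_dir="clips")))
```

To actually run the conversion, the list can be passed to `subprocess.run(cmd, check=True)`. Building the command separately makes it easy to check what will happen before touching any files.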

Possibly there are recording apps that would eliminate a few of these steps, but the last one I tried used a format that still required conversion before uploading to iNat, and it didn’t save location data, so it ended up not being any less work than videos, which have the advantage that I can quickly switch between photo and video modes when I’m busy photographing bees and hear an interesting bird call.

I often have the additional issue that I have no idea what I heard, so it’s tricky to figure out whether there is only one singer or multiple individuals, or how to crop the file to keep the bits from one species together. And this is my particular hang-up, but I also tend to feel rather embarrassed about putting an ID of “Aves” on such observations and leaving it to the bird experts to sort out (especially when said bird turns out to be a Parus major which I have failed to recognize yet again). With photos I can usually narrow down the ID somewhat, either based on my own knowledge or with the help of the Computer Vision, but with audio I often don’t know where to start.

I’ll admit I haven’t tried any of the apps for recognizing bird song, mostly because they all seem to be based around the assumption that you are uploading recordings in real time. I prefer to go through my material at the end of the day or at some other later point in time. This is partly simply to limit data use and save phone battery power, but also because it allows me to focus better on what I am seeing/hearing at the time. Afterwards I can review my material at my leisure, decide what I want to upload, and research IDs before putting them on iNat.

…all of which means that the vast majority of my observations are photo-based. Well, and there’s also the fact that I’m most interested in bees and other insects, where – with a few exceptions – sound recordings typically aren’t of much use for ID purposes. (I actually often can tell in a general sense what sort of insect is visiting my balcony and even distinguish some of the bees based on the quality of the buzzing/humming, but this doesn’t really translate into useable audio material.)

This is also a bit of a pain, but before or after you record you can always make an observation in the app and choose “No Media”. That will capture the date/time/location of the recording, and you can add the sounds to the observation later. I do that fairly often.


See, that would imply having some degree of certainty that I am in fact going to make an observation out of any particular recording, which is usually not the case – I’ll often record a video or two when I hear something interesting, but when I listen to the recordings afterwards, a certain portion turn out to be too quiet or there is too much background noise for me to hear anything clearly (or I manage to figure out that it is just another Parus major), so I actually only upload a fraction of what I record, and sometimes this happens weeks or months after the fact.

The videos have timestamps and I generally have other photos from shortly before or after that I can use to get the location, or else I can recognize where I was based on the images in the video, since I do most of my observing within walking distance of home/work, but this of course isn’t quite as quick and convenient as simply letting iNat grab the information from the photo metadata and clicking “upload”.
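The "borrow the location from a nearby photo" trick can also be automated if the timestamps and coordinates are already in a list. Below is a minimal sketch, assuming the photo times and coordinates have been read out by some other means; the function name and the sample values are invented for illustration.

```python
from datetime import datetime


def nearest_photo(recording_time, photos):
    """Return the photo closest in time to a recording, plus the gap in seconds.

    photos is a list of (time, latitude, longitude) tuples.
    """
    best = min(photos, key=lambda p: abs((p[0] - recording_time).total_seconds()))
    gap_s = abs((best[0] - recording_time).total_seconds())
    return best, gap_s


if __name__ == "__main__":
    # Made-up example values, not real observations.
    photos = [
        (datetime(2023, 5, 14, 6, 10, 0), 52.5200, 13.4050),
        (datetime(2023, 5, 14, 6, 42, 0), 52.5231, 13.4102),
    ]
    video_time = datetime(2023, 5, 14, 6, 15, 32)
    (when, lat, lon), gap_s = nearest_photo(video_time, photos)
    print(f"Closest photo taken at {when}, {gap_s:.0f} s away: {lat}, {lon}")
```

The gap is returned on purpose: if the closest photo was taken a long time before or after the recording, its location is probably not a good stand-in.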

This covers the entire subject matter of audio observation :)

I am grateful to the Identifiers who work with sound observations; they’ve broadened my awareness of the birds around me.

I’m not bird song savvy at all. But, I started making the occasional bird recording over the last two years or so.

The iNaturalist iOS app’s sound observation facility makes it very easy to upload a sound observation, but does not allow for any editing at all: no trim, no enhancement.

Before that was introduced, I used iOS Voice Memo to record sounds, which allowed me to trim the recording, and there was an enhance feature, which seemed to work pretty well. But uploading those recordings to iNat involved jumping through several hoops.

I’ve been using the iNat iOS app for sound recordings for the ease of use, but I wonder if I should not?

My camera doesn’t record GPS, so sometimes I will take a quick, usually poor-quality shot with my phone to record the location, but often the phone gets it wrong (maybe I don’t have the photo app open long enough?), so I end up having to look at satellite images to figure it out. I will usually be doing an eBird list at the same time, which records my track, so I don’t have to worry about forgetting (approximately) where I was when I recorded the audio if I upload it at a later date (I usually do audio uploads in batches). This also helps put my photos in the right place if the photo app on my phone fails to get the right location and I can’t tell where I was by looking for landmarks on the satellite image. If I don’t do an eBird list, I will make a quick audio recording of myself saying where I am.

I more often record audio with my phone because the app I use will put both the date and time in the file name. If I record a video with my camera I have to change the file name before converting to audio so that data isn’t lost. If I’m uploading multiple audio files from the same day I will select all on the upload screen and put in the date and time from one of the files then go back individually to the others to slightly edit the time. Goes much faster than putting in the date/time for every file.
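If the date and time are in the file name, they can be pulled out automatically rather than typed in. Here is a small sketch, assuming a name pattern like `Recording_20230514_061532.wav` (date as YYYYMMDD, then time as HHMMSS); that pattern and the function name are only an assumption for the example, since every recording app names its files differently.

```python
import re
from datetime import datetime

# Assumed pattern: 8 digits for the date, an underscore, 6 digits for the time.
STAMP = re.compile(r"(\d{8})_(\d{6})")


def recording_time(filename):
    """Return the date/time embedded in a file name, or None if there is none."""
    match = STAMP.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    except ValueError:
        # Digits were present but do not form a valid date/time.
        return None


if __name__ == "__main__":
    files = ["Recording_20230514_064801.wav", "Recording_20230514_061532.wav"]
    # List the day's recordings in the order they were made.
    for name in sorted(files, key=recording_time):
        print(recording_time(name).strftime("%Y-%m-%d %H:%M:%S"), name)
```

Sorting by the parsed time gives the order the recordings were made in, which is handy when entering one date for a batch and then adjusting only the time for each file.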

I suggest making longer audio files and not worrying about cropping. Just give a timestamp for the sound you are interested in and we can tell if other sounds in the file are from the same bird or not. Often, if there are more species in the file, someone who IDs the one you want will say so in the comments. Then you can duplicate the observation if you want and put in the notes “not the such-and-such bird”, such-and-such being the name of the bird IDed in your original obs. Don’t be embarrassed. Lots of people don’t know bird calls, even of very common birds. I only started learning them because I was having problems with dizziness when looking up to see birds. And there are still plenty I don’t know.

I have only used the birdsong apps at home, by playing the audio file on my computer and running the app on my phone against the playback. Then I look at the results it gives me and don’t save the file in the app.
