Audio recordings of bird song/calls

I could agree if the average person could understand spectrograms. But most birders would not be able to look at a spectrogram and “hear” the song in their head.

No argument here. That’s why iNaturalist accepts audio files. But I think what is being said here is that the percentage of audio observations is small compared with that of photo observations, and the amount of work audio files would require is disproportionate. So often I try listening to an audio file and I can’t make out any sound other than wind in the leaves. With a photograph, it could be as simple as cropping to focus on the bird hidden among the leaves; with an audio file, the “noise” is more difficult to “crop” out.

Either is fine! But almost without fail the bird/frog/insect goes silent for a long time as soon as I start recording (:rage:), and I personally don’t want to post all that silence, so I often use the voice memo app, which allows me to trim the sound before uploading.

When you upload sound recorded directly in the app (iOS), if you’re not satisfied with the quality or length, you can download the audio file, edit it in Audacity or whatever, and re-upload it. To do this, go to the iNat website and right-click on the audio playback display on the observation page to “save as.”

As an identifier, you can download other people’s audio this way and enhance/normalize it so that you can identify it more accurately (just as you can download others’ pics and enhance them).
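For anyone curious what “normalize” means in practice, here is a minimal sketch of peak normalization in plain Python. It assumes the clip has already been decoded to a list of 16-bit PCM sample values; the function name `peak_normalize` and the 0.9 target are made up for illustration and are not anything iNat or Audacity provides under that name.

```python
def peak_normalize(samples, target_peak=0.9, full_scale=32767):
    """Scale 16-bit PCM samples so the loudest one reaches target_peak of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak * full_scale / peak
    # Clamp to the valid 16-bit range to guard against rounding overshoot.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


# Hypothetical, very quiet recording (sample values are invented).
quiet_call = [0, 120, -300, 250, -80]
louder = peak_normalize(quiet_call)
```

Note that this raises everything by the same gain, wind noise included, so it makes a faint call easier to hear but does not separate it from the background.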

In my own work, Icterids make a call that sounds very similar to that of juvenile Black Rails. They are very hard to distinguish by ear alone, but I can more easily tell them apart with a spectrogram. Of course, I always use the same spectrogram scale, and it would not work if I were using a scale I was unfamiliar with. That’s why I voted for the “automatically add a spectrogram view to observations with sounds” feature request. This would display all spectrograms on a standardized scale, and once identifiers got used to that scale, it could help them identify certain calls. Having a spectrogram auto-added has the added benefit of making people far less likely to upload spectrogram images that mess with the CV (the idea is to have spectrograms for human IDers, not for the CV).
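To illustrate what a “standardized scale” could mean, here is a rough sketch that computes a spectrogram with fixed settings, so every clip is analysed the same way. The window and hop sizes are assumptions picked for the example, not values from the feature request, and the naive DFT is only there to keep the code dependency-free (a real tool would use an FFT).

```python
import cmath
import math

WINDOW = 256  # samples per analysis frame (assumed value)
HOP = 128     # samples between frame starts (assumed value)


def spectrogram(samples, window=WINDOW, hop=HOP):
    """Return one list of magnitudes (bins 0..window//2) per frame."""
    # Hann window to reduce spectral leakage at the frame edges.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        chunk = [samples[start + n] * hann[n] for n in range(window)]
        mags = []
        for k in range(window // 2 + 1):
            # Naive DFT: slow, but easy to follow.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / window)
                      for n in range(window))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def bin_to_hz(k, sample_rate, window=WINDOW):
    """Convert a frequency-bin index to hertz."""
    return k * sample_rate / window


# Synthetic 1000 Hz test tone standing in for a recorded call.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(1024)]
spec = spectrogram(tone)
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

With these settings the loudest bin of the test tone maps back to 1000 Hz via `bin_to_hz`. Because the frame size and hop never change, the same call would always land in the same place on the image, which is the whole point of getting used to one scale.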

While that is possible, I still think observers should take on the bulk of responsibility for editing/vetting their audio. Audio is already IDed at a low rate, and identifiers are far more likely to skip a poorly edited clip than they are to download and enhance it. I generally think it is better to always edit your audio, but I’m ok with unedited audio as long as the observer confirms 1) that the call of interest can be clearly heard in the clip and 2) that there are no loud, unexpected noises that could hurt an identifier’s ears.

I generally think observers should edit (crop) and vet their photos too (and people are far better about this than with audio). However, iNat has tools (ability to zoom or adjust brightness in identify mode) that help; no such tools for identifying audio exist.

My concern is less with people who only occasionally post an audio observation; I would like to see heavy audio observers take more responsibility for their observations.


This would be very helpful. I didn’t realize it was possible to do that. I guess I just wasn’t right-clicking in the appropriate spot. I tried it again and it worked this time. Now I can pull it up in Audacity and see the spectrogram. Thanks!

The staff member quoted seems to make it clear later in the same thread that iNat does not have an official position of excluding spectrograms:

I notice you refer to the first image. I wondered (genuinely), does it make a difference if a spectrogram is included as a second or third image of an observation?

Thanks very much for the link to the other open thread, that’s useful. From my point of view it would just be useful to have an annotation to indicate that the observation does not include any pics of the organism. That way we could include spectrograms and not introduce any error into the CV.

It is our understanding that only the first image of an observation is used to train computer vision. So adding the spectrogram as anything other than the first image will not interfere with CV training.

That’s incorrect. Only the first image is used when trying to ID your observation, but all the photos go into training.


That’s not gonna work with any bird that doesn’t have a single type of song. Even starlings will have different spectrograms unless you record the part that is used by all of them, and then there are species where each bird has totally different songs or mimics other birds.

Not necessarily. All photos are eligible for being used in the model, but only about 1000 per taxon are used in the actual training (provided there are at least 1000 photos to use).
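As a rough illustration of that cap, here is a sketch of picking at most 1000 photos per taxon. The uniform random sampling and the function name `pick_training_photos` are assumptions for the example; nothing in this thread says how iNat actually selects the photos it trains on.

```python
import random
from collections import defaultdict

MAX_PER_TAXON = 1000  # the "about 1000" figure mentioned above


def pick_training_photos(photos, cap=MAX_PER_TAXON, seed=0):
    """photos: iterable of (photo_id, taxon) pairs. Returns {taxon: [photo_id, ...]}."""
    by_taxon = defaultdict(list)
    for photo_id, taxon in photos:
        by_taxon[taxon].append(photo_id)
    rng = random.Random(seed)
    # Assumption: uniform random choice once a taxon exceeds the cap.
    return {
        taxon: ids if len(ids) <= cap else rng.sample(ids, cap)
        for taxon, ids in by_taxon.items()
    }


# Invented photo IDs for a common and a rarely photographed species.
photos = [(i, "Sturnus vulgaris") for i in range(2500)]
photos += [(i, "Laterallus jamaicensis") for i in range(2500, 2540)]
chosen = pick_training_photos(photos)
```

Under this assumption, a taxon with 2500 photos contributes 1000 of them and a taxon with 40 contributes all 40, so any single photo of a common species has well under even odds of being used.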


I find it alarming to hear that anyone would believe that my reason for using iNat is (or should be) to train the iNat Computer Vision model. I use iNat to contribute observations of organisms to a database so that the data will be available to all, and to participate in the community of others who like to do the same. I include photos that either help people identify the observation, or that add to our understanding/appreciation of that organism. Habitat photos and spectrogram photos are consistent with that. When deciding whether to include a photo, why would any naturalist choose to favor the iNat CV over improving our understanding/appreciation of the organism? I’ve searched many of the CV forum topics and haven’t seen any discussion about why iNat staff and some users are so rabid about prioritizing training the CV over the stated purpose of iNat:

“One of the world’s most popular nature apps, iNaturalist helps you identify the plants and animals around you. Get connected with a community of over a million scientists and naturalists who can help you learn more about nature! What’s more, by recording and sharing your observations, you’ll create research quality data for scientists working to better understand and protect nature. iNaturalist is a joint initiative by the California Academy of Sciences and the National Geographic Society. That’s the vision behind iNaturalist. So if you like recording your findings from the outdoors, or if you just like learning about life, join us!”


Naw, it’ll work fine. Same way that the CV learns to recognize species that have strong visual polymorphism in sexes or life stages (e.g., monarch caterpillars and adults): it can learn to associate a given name with more than one image identity probability peak. And I imagine mimics would work about as well as visual mimics: yes, it sometimes suggests bees for photos of syrphids, but with enough data it gives better results, and some of that fuzziness in the data is ok.

There’s a lot of fearmongering throughout this thread about how allowing spectrograms would mess up the CV. Balderdash. The CV will likely first just factor images like that out as noise, and then eventually, when it gets a large enough data set, it’ll become as good as Merlin is at visually analyzing the minutiae of spectrograms. That’s what machine learning does. Same would go for habitat photos. Ten years from now, when the CV algorithms are better, computation is faster, and data storage is cheaper, I think we’re going to be sorry we deliberately throttled the data types we could have been gaining from citizen scientists.


This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.