Automatically add a spectrogram view to observations with sounds

michaelpirrello · June 23, 2020, 9:35pm

Maybe have an “Evidence of Organism” toggle for everything that’s not a photo of the animal itself, and then exclude that from CV learning?

jwidness · July 29, 2020, 1:05pm

11 posts were split to a new topic: Question about evidence of human activity

upupa-epops · July 30, 2020, 3:55am

I don’t have experience processing sound but I’m wondering if bat recording observations are rare enough relative to other kinds of sound that it could default to eBird-like settings and then have a button to show the full spectrogram? (analogous to the zoom/brighten buttons for images) I’m not sure if insect sounds are more similar to birds or bats.
I guess one potential issue with that is if it means you have to save two images for each recording.

jrcagle · March 19, 2021, 7:26pm

@kueda: My opinion based on somewhat limited knowledge of ML and computer vision is that you already have that problem with “garbage photos” - ones that are misidentified, ones that have insufficient resolution or clarity, ones where the specimen is very small, ones where multiple species are depicted. Yet the algorithms handle it. Suggestion: if a particular image comes back with very low likelihood of being the depicted species, have it autoflagged as “identified species not visible” so that the image is not used, or is used properly, as part of a training set. Then we could include hostplants, spectrograms, weather conditions - parts of the field notes that are desperately needed.

Imagine a world where the spectrogram could be used by AI for bird ID, instead of what happens now – one posts a sound recording, and within a couple of years someone else finds it.

schoenitz · March 21, 2021, 4:58pm

Not to beat a dead horse, but I thought the AI response to my sparrow song was funny:

Like it or not the AI is picking up on spectrograms, and as of now bats beat birds in the spectrogram-to-photograph ratio. Perhaps “spectrogram” could be established as a “pseudo taxon” that the AI learns to recognize, and then never has to suggest.

pisum · March 21, 2021, 6:01pm

my experience is that birds are identified fairly quickly on iNaturalist, if they can be identified easily, even if the only evidence is audio.

BirdNET – The easiest way to identify birds by sound. (cornell.edu)

there are a couple of videos on that page that give some basic explanation of how they do their thing. when asked if their algorithm could be adapted for animals other than birds, the answer is “maybe… other animals are using other frequency ranges than birds, and it gets more challenging for insects who are using higher frequencies, and it gets more challenging for bats… you need more specialized equipment for that [rodents, bats]… you can’t use your phone for that…”. apparently, you could theoretically leverage their open source code or even hook into their API to develop your own apps.

to me, it seems like it would be a lot of work for relatively little benefit to develop something specifically for birds in iNat, considering other things already exist. if you’re going to develop something that can cover any organism, then that might be interesting, though probably exponentially more challenging.

also, at least right now, the number of observations with sound in iNaturalist is not very large – so there’s not necessarily a lot of data to train on. currently, there are only a little over 144,000 observations with sounds (mostly birds), representing just under 6,000 species. but if you look at how many of these species have more than 100 observations, you’re sitting at around 270 species, and if you limit that to just research grade, you’re down to around 240 species.

a lot of the data submitted is not even actually audio, since spectrograms are technically images, not audio exactly…

antrozousamelia · April 19, 2022, 9:54pm

I like this idea a lot. Having spectrograms attached is really useful, and if displaying them from the .wav file isn’t possible, this seems like a reasonable solution.

ahospers · April 20, 2022, 8:53pm

I don’t know if Tadarida Toolbox makes spectrograms, but it can tell you which species the sound contains. I thought it was used for online websites in the UK and FR.

https://openresearchsoftware.metajnl.com/articles/10.5334/jors.154/

https://github.com/YvesBas/Tadarida-C/commit/145d84f7fc57581733a8bef335ea7dfebaf9b9e3

antrozousamelia · April 21, 2022, 6:10am

While this is interesting, I don’t have an enormous amount of trust in automatic sound classifiers (background: I spend about 8 hours a day during the non-field season manually vetting the outputs of the SonoBat classifier and it’s correct some of the time, correct but not confident some of the time, and plain wrong some of the time).

Is your suggestion that iNat implement code for doing “computer vision” on .wav files? That would be interesting, but would still necessitate people confirming IDs to download the .wav file, look at it in a sonogram/spectrogram viewer, and then come back to iNat to add their ID. I often end up doing this when a .wav file is uploaded anyway, but sometimes that step isn’t necessary, and when it isn’t you’ve saved several minutes of steps and therefore can make more IDs. There’s a massive backlog of potentially identifiable acoustic bat observations on iNat, and very few people who both have the expertise and time to work on them. Making it faster and easier to do could help with that.

ahospers · April 21, 2022, 11:37am

You need only a few mammals, crickets and birds so the hit ratio was high.

hyrumbaker · August 23, 2022, 8:40pm

If iNaturalist does make support for spectrogram viewing maybe the spectrograms could be used to train the normal computer vision model instead of making a whole new computer vision model for sound.

lordcaravan · August 24, 2022, 11:30pm

Audio is for your ears.

Light for your eyes.

alexanderr · September 3, 2022, 7:42am

I don’t think it would work very well. Most sounds in observations are very poor quality, have lots of noise, and are very long (not to mention the frequency range of interest). If you automate a spectrogram then you will not be able to pick out the important bits in most observations. Why not, as the observer or identifier, just edit and create the spectrogram yourself (that’s what I do), and then upload it as an image (or download the sound if you are identifying)? Audacity is a free and very easy program to use for this.

jrcagle · September 4, 2022, 3:15pm

The Merlin app is remarkably good at ID by cruddy wavefile - better, in fact, than by photoID. The reason, I think, is that it picks out specific frequencies in a 1D FFT and maps them to a relatively small set of signatures. Thus, a lot of the noise is irrelevant.
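To make the "pick out specific frequencies in a 1D FFT" idea concrete, here is a minimal sketch in plain Python. It is not Merlin's actual algorithm (the post above is only guessing at that); it just shows that the strongest frequencies of a synthetic two-tone "song" are still recovered when random noise is mixed in. The function names, the test tones, and the noise level are all made up for illustration.

```python
import cmath
import math
import random

def dft_magnitudes(samples):
    """Magnitude of each frequency bin (naive DFT, fine for short clips)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):  # bins above n/2 mirror the lower ones for real input
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes

def peak_frequencies(samples, sample_rate, count=2):
    """Return the `count` strongest frequencies in Hz, strongest first."""
    magnitudes = dft_magnitudes(samples)
    strongest = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)
    return [k * sample_rate / len(samples) for k in strongest[:count]]

# Hypothetical input: a 3 kHz tone, a quieter 5 kHz tone, and background noise.
SAMPLE_RATE = 16000
rng = random.Random(0)
signal = [
    math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 5000 * t / SAMPLE_RATE)
    + rng.uniform(-0.2, 0.2)
    for t in range(256)
]
peaks = peak_frequencies(signal, SAMPLE_RATE)
print(peaks)  # [3000.0, 5000.0]
```

The noise spreads thinly over every bin, while each tone piles up in a single bin, which is why the two peaks stand out so clearly.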

doug_grinbergs · January 24, 2023, 6:43pm

Indeed. Audio recording useful for visually impaired, spectrogram useful for deaf person. Observation with both photo, recording and spectrogram of use to most people.

pisum · February 26, 2023, 5:35pm

i don’t think you should crop the spectrogram exactly. i think the best thing to do is something like what Audacity does. it displays a default spectrogram frequency range up to ~20kHz (since human hearing is generally described as 20Hz to 20kHz) and allows the user to “zoom” in and out on that range, up to the frequency range captured in the file.

so in my mind, the ideal interface would allow the user to specify the min and max frequency (with default 0 to 20000 Hz) and the type of scaling (i.e. linear, logarithmic, etc.). this kind of interface would be dynamic, showing you the detailed spectrogram for a given window of audio, and that window would move as the audio was played. something like this:

if you need to save default snapshots of the spectrograms so that you can deliver visual previews of the sounds easily, then i think the best thing is to have a handful of standard configurations, similar to how you have a handful of standard configurations for delivering photos. these preview spectrogram images could all be the same height and width – say 150px by 50px. and your standard configurations could be something like:

human hearing range (20Hz to 20kHz), logarithmic scale
human hearing range (20Hz to 20kHz), linear scale
full range of audio file, logarithmic scale
full range of audio file, linear scale

then if the user needs to see more detail, then they could look at the observation detail, and the player there would dynamically provide more detail, as described above.
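As a rough sketch of how the four standard configurations above might be represented, and of what the linear versus logarithmic choice does to the picture, here is some plain Python. Everything named here (`SpectrogramPreset`, `frequency_to_row`, the preset names) is hypothetical, and the 20 Hz floor on the full-range logarithmic preset is an assumption, since a log axis cannot start at 0 Hz.

```python
import math
from dataclasses import dataclass
from typing import Optional

PREVIEW_WIDTH, PREVIEW_HEIGHT = 150, 50  # the preview size suggested above

@dataclass(frozen=True)
class SpectrogramPreset:
    name: str
    min_hz: float
    max_hz: Optional[float]  # None means "up to half the file's sample rate"
    scale: str               # "linear" or "log"

PRESETS = [
    SpectrogramPreset("hearing-log", 20.0, 20000.0, "log"),
    SpectrogramPreset("hearing-linear", 20.0, 20000.0, "linear"),
    SpectrogramPreset("full-log", 20.0, None, "log"),  # assumed 20 Hz floor
    SpectrogramPreset("full-linear", 0.0, None, "linear"),
]

def frequency_to_row(freq_hz, preset, sample_rate, height=PREVIEW_HEIGHT):
    """Map a frequency to a pixel row; row 0 is the top of the image."""
    lo = preset.min_hz
    hi = preset.max_hz if preset.max_hz is not None else sample_rate / 2
    freq = min(max(freq_hz, lo), hi)  # clamp into the displayed range
    if preset.scale == "log":
        fraction = math.log(freq / lo) / math.log(hi / lo)
    else:
        fraction = (freq - lo) / (hi - lo)
    return round((1.0 - fraction) * (height - 1))

# Where does a 2 kHz bird call land in a 50 px tall preview?
print(frequency_to_row(2000, PRESETS[0], 44100))  # 16 (log scale)
print(frequency_to_row(2000, PRESETS[1], 44100))  # 44 (linear scale)
```

On the linear scale a 2 kHz call sits in the bottom few rows of the preview, while the logarithmic scale gives the lower frequencies much more room. For a bat recording sampled at 192 kHz, the full-range presets would extend the top of the image to 96 kHz.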

…

unfortunately, it doesn’t look like there are many easily found modules that would generate dynamic spectrograms for a given window of time. so in the absence of something like that, i don’t know if it makes sense given limited resources to approach the interface the way i’m thinking.

this looks interesting, but it looks like it’s no longer in development: https://github.com/miguelmota/spectrogram

this looks interesting, but it generates spectrograms from the microphone: https://github.com/borismus/spectrogram.

wf_inaturalist · August 25, 2023, 7:09am

That seems to be the easiest way for now - have a category, or could we use “tracks and signs”? I really don’t see the difference to scat etc

arkyar · January 19, 2024, 10:07am

It is possible to generate and show spectrograms directly using Web Audio API.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Visualizations_with_Web_Audio_API

Perhaps creating a module that would read the audio data and show an interface from audio files on the fly would be interesting. Also, providing interactive elements to change frequency and zoom would make visualization of the audio files much more useful than static spectrogram images.
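In a browser this would presumably be built on the Web Audio API described at the link above. The sketch below uses plain Python instead, only to illustrate the computation such a module would perform: slice the audio into overlapping windows, take an FFT of each, and keep only the bins inside the frequency range the user has zoomed to. The function names, window size, and test signal are assumptions for illustration, not part of any existing iNaturalist code.

```python
import cmath
import math

def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even, odd = fft(values[0::2]), fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def spectrogram(samples, frame_size=256, hop=128):
    """One list of bin magnitudes (0 Hz up to Nyquist) per time window."""
    # Hann window to reduce leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(frame)
        columns.append([abs(spectrum[k]) for k in range(frame_size // 2 + 1)])
    return columns

def zoom(columns, sample_rate, frame_size, min_hz, max_hz):
    """Keep only the bins between min_hz and max_hz (the interactive zoom)."""
    bin_hz = sample_rate / frame_size
    lo = max(0, math.ceil(min_hz / bin_hz))
    hi = min(frame_size // 2, math.floor(max_hz / bin_hz))
    return [column[lo:hi + 1] for column in columns]

def loudest_hz(column, sample_rate, frame_size):
    """Frequency of the strongest bin in one full-range column."""
    return column.index(max(column)) * sample_rate / frame_size

# Hypothetical one-second clip: 1 kHz for the first half, 3 kHz for the second.
SAMPLE_RATE, FRAME = 8000, 256
clip = [math.sin(2 * math.pi * (1000 if t < 4000 else 3000) * t / SAMPLE_RATE)
        for t in range(SAMPLE_RATE)]
columns = spectrogram(clip, FRAME)
print(loudest_hz(columns[0], SAMPLE_RATE, FRAME))   # 1000.0
print(loudest_hz(columns[-1], SAMPLE_RATE, FRAME))  # 3000.0
```

Because each column depends only on its own short window of audio, columns can be computed just ahead of the playback position rather than for the whole file up front.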

cigazze · January 25, 2024, 6:40pm

I have a somewhat clunky Greasyfork script I wrote that generates spectrograms on iNat. It’s ugly, the controls are a bit hard to use, and the spectrograms take as much time to generate as the audio is long (e.g. a 1 minute audio clip would take 1 minute to generate a spectrogram) but it’s something.

This is using a library I found called spectrogram.js but I’m definitely going to try to put together a better script from scratch at some point!

https://greasyfork.org/en/scripts/482904-inaturalist-spectrogram

Paulbeausoleil · April 23, 2025, 2:17am

Cool ideas here! Would definitely love to see spectrograms.

Perhaps these plots could be made by default when a sound has a taxon ID tagged at least ‘Aves or birds’ ? This would make it easy to implement rather than making it work for all organisms at first.

Or perhaps run a quick data analysis on the audio to crop the spectrogram where the data is less than a threshold?

I took a look at the spectrograms and got this:

This would be done only to the image not the actual audio file.

But I agree that cropping might not be the solution: a change of scale could probably be automated based on the taxon ID information.
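For what it's worth, the threshold idea could be sketched roughly like this in plain Python. It assumes the spectrogram already exists as a list of columns, each a list of bin magnitudes from low to high frequency. The function names, the 5% threshold, and the two-bin margin are made-up illustration values, and the tiny matrix stands in for real data.

```python
def crop_ceiling_bin(columns, relative_threshold=0.05, margin_bins=2):
    """Index of the highest frequency bin worth keeping in the image.

    relative_threshold is a fraction (0 to 1) of the loudest bin's peak.
    """
    n_bins = len(columns[0])
    # Loudest value each frequency bin reaches at any point in the recording.
    bin_peaks = [max(column[b] for column in columns) for b in range(n_bins)]
    cutoff = relative_threshold * max(bin_peaks)
    loud_bins = [b for b, peak in enumerate(bin_peaks) if peak >= cutoff]
    return min(n_bins - 1, loud_bins[-1] + margin_bins)

def crop_image(columns, ceiling_bin):
    """Drop the bins above ceiling_bin; the audio file itself is untouched."""
    return [column[:ceiling_bin + 1] for column in columns]

# Toy spectrogram: 3 time steps x 10 bins, energy only in bins 2 to 4.
toy = [
    [0.01, 0.01, 0.8, 1.0, 0.3, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.01, 0.01, 0.2, 0.9, 0.6, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.01] * 10,
]
ceiling = crop_ceiling_bin(toy)
cropped = crop_image(toy, ceiling)
print(ceiling, len(cropped[0]))  # 6 7
```

The weakness raised earlier in the thread still applies: a faint high-frequency call that falls below the threshold would be cropped out of the image, which is one reason a taxon-based change of scale may be safer.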

For the squashing of the image (depending on the length of the audio file, so longer song files would be more squished), this could be solved by showing a fixed width, but the audio pans under a fixed cursor. A bit like Merlin does in the app.
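A rough sketch of the fixed-cursor idea, in plain Python: each spectrogram column is drawn one pixel wide, so the time scale never changes with clip length, and the viewer only chooses which slice of columns is on screen. The function name, the 150 px width, and the centred cursor are assumptions for illustration, not a description of how Merlin implements it.

```python
def visible_columns(playback_seconds, columns_per_second, total_columns,
                    viewport_width=150, cursor_x=75):
    """Return (first_column, last_column, left_padding) for the current view.

    The column being played always sits under the cursor at x = cursor_x.
    left_padding is the number of blank pixels needed before the first column
    (only non-zero near the start of the recording).
    """
    current = int(playback_seconds * columns_per_second)
    current = min(max(current, 0), total_columns - 1)
    first = current - cursor_x
    last = first + viewport_width - 1
    left_padding = max(0, -first)
    return max(first, 0), min(last, total_columns - 1), left_padding

# Hypothetical 60-second clip rendered at 50 columns per second.
TOTAL = 60 * 50
print(visible_columns(0, 50, TOTAL))   # (0, 74, 75)
print(visible_columns(30, 50, TOTAL))  # (1425, 1574, 0)
print(visible_columns(60, 50, TOTAL))  # (2924, 2999, 0)
```

A 5-second clip and a 5-minute clip would then show the same amount of detail per pixel; the longer one simply scrolls for longer.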

Thoughts?
