Automatically add a spectrogram view to observations with sounds

Given that we can use the annotations tool to mark tracks, scat, etc. as the “evidence of organism”, would having a “calls/audio” option in the evidence annotation change your mind? This way, even if you add a spectrogram to an audio observation, you can still annotate it as being audio evidence. (I know I’ve found myself wanting an audio annotation as well as a bone annotation!)

Sound is evidence of the organism itself; it merely involves sound waves hitting our ears rather than light waves hitting our eyes.

It is already possible to filter observations by what type of media they contain (audio vs. image files).

Filtering observations wouldn’t help people who want to post spectrograms of audio files, which is what the original post was discussing. By having an additional annotation, people can filter between images that are tagged only with “call” vs images tagged with other evidence. And the annotation I am discussing is called evidence of organism, so I completely agree that sound is evidence of the organism.

I understand what annotation you are referring to. I was not referring to the name of the annotation field, but to one of the possible values for that field – “organism”. The type of evidence for an audio file is still going to be “organism”, just the same way as if you posted a photo of the organism. The distinction made by this field is whether the organism itself was observed, or some other type of evidence left behind by the organism (a track, a fragment such as a feather or bone, a construction, etc.). With the possible exception of human constructions capable of making noise, a sound is made directly by the organism while it is present, therefore an audio observation is an observation of the organism itself. There is no separate option for “photo” as type of evidence for exactly the same reason, because this field does not refer to the media used for documentation.

Annotations are at the observation level, not the photo level (though photo-based annotations have been discussed a lot on the forum). I think most people agree that photo-based annotations would be great for several reasons, but implementing that seems to be a big challenge, so not coming soon.

Agreed. I recently started using the Birdweather PUC for field recording. It saves all my uploaded recordings to their server which can be filtered & viewed later by anyone. So, for each species detected, I upload to iNaturalist a processed (using the free Audacity program) version of the best detection and also provide a link in the notes to all the original recordings (complete with spectrograms). Seems to work well enough and iNaturalist isn’t storing all these files.
Something to consider. Example:

https://www.inaturalist.org/observations/283649598

Only downside - the PUC has GPS and records coordinates as well as date/time, but iNaturalist doesn’t read the metadata.

I would like to add my several cents to the discussion.

TL;DR
Spectrograms do not necessarily have to be stored in iNat’s DB as images; they can be generated on the fly in the user’s browser. My tool does this in pure JavaScript in the browser, without additional server load. Either integrating a similar tool into iNat or letting external tools fetch iNat audio automatically would help.

Details:

My main interest is reviewing existing iNat sound recordings on a desktop/laptop. From that point of view, the lack of a visible spectrogram is unfortunate. As pointed out by @kueda, a default spectrogram over the whole uploaded recording is unlikely to solve any issues, because the useful part of the signal is frequently a tiny portion of the whole recording, both in time and in frequency. Automatic cropping of that data is also not ideal, since it depends on the object of interest (bats vs. birds vs. insects vs. mammals).

That said, I do not think a significant amount of automation is needed to make the functionality useful. But the ability to zoom in on a point of interest, both in time and in frequency, is a must.

For my own purposes I implemented an open-source application that generates spectrograms with some additional features: https://bansheelab.app/

Generating something like that in JavaScript is absolutely doable:

The things required are: the Web Audio API, an FFT, a WAV format parser (not all the required data can be extracted via the Web Audio API), and some additional math that can be implemented without dependencies (e.g. a window function such as the Kaiser window). With that, everything can be done in pure JS, which makes storing spectrograms in the DB unnecessary: they can be computed on the fly in the user’s browser.
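To make the math part of that list concrete, here is a minimal dependency-free sketch in TypeScript (it runs as plain JavaScript once the type annotations are removed): a Kaiser window, a radix-2 FFT, and a loop that turns samples into spectrogram columns. This is not the code of any tool mentioned in this thread. The function names, the `beta` default of 6, and the power-of-two window restriction are assumptions made for illustration, and decoding the audio (Web Audio API or a WAV parser) is left out.

```typescript
// Zeroth-order modified Bessel function of the first kind (power series).
function besselI0(x: number): number {
  const q = (x * x) / 4;
  let term = 1;
  let sum = 1;
  for (let k = 1; k <= 60; k++) {
    term *= q / (k * k);
    sum += term;
    if (term < 1e-14 * sum) break;
  }
  return sum;
}

// Kaiser window of length n (n >= 2). Larger beta suppresses side lobes
// more strongly at the cost of a wider main lobe.
function kaiserWindow(n: number, beta: number): Float64Array {
  const w = new Float64Array(n);
  const norm = besselI0(beta);
  for (let i = 0; i < n; i++) {
    const r = (2 * i) / (n - 1) - 1; // position in [-1, 1]
    w[i] = besselI0(beta * Math.sqrt(Math.max(0, 1 - r * r))) / norm;
  }
  return w;
}

// In-place iterative radix-2 FFT. The length must be a power of two.
function fftInPlace(re: Float64Array, im: Float64Array): void {
  const n = re.length;
  for (let i = 1, j = 0; i < n; i++) { // bit-reversal permutation
    let bit = n >> 1;
    for (; (j & bit) !== 0; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      const tr = re[i]; re[i] = re[j]; re[j] = tr;
      const ti = im[i]; im[i] = im[j]; im[j] = ti;
    }
  }
  for (let len = 2; len <= n; len <<= 1) { // butterfly passes
    const half = len >> 1;
    const step = (-2 * Math.PI) / len;
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < half; k++) {
        const wr = Math.cos(step * k);
        const wi = Math.sin(step * k);
        const a = start + k;
        const b = a + half;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr; im[b] = im[a] - ti;
        re[a] += tr; im[a] += ti;
      }
    }
  }
}

// Short-time Fourier transform magnitudes: one Float64Array per time column,
// each holding windowSize / 2 + 1 frequency bins (0 Hz up to Nyquist).
function spectrogram(samples: Float64Array, windowSize: number, hop: number, beta = 6): Float64Array[] {
  const win = kaiserWindow(windowSize, beta);
  const columns: Float64Array[] = [];
  for (let start = 0; start + windowSize <= samples.length; start += hop) {
    const re = new Float64Array(windowSize);
    const im = new Float64Array(windowSize);
    for (let i = 0; i < windowSize; i++) re[i] = samples[start + i] * win[i];
    fftInPlace(re, im);
    const mags = new Float64Array(windowSize / 2 + 1);
    for (let k = 0; k < mags.length; k++) mags[k] = Math.hypot(re[k], im[k]);
    columns.push(mags);
  }
  return columns;
}
```

A caller would typically convert the magnitudes to decibels and paint each column onto a canvas; that part depends on the UI and is not shown.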

Ideally I would like to have it as part of a details view, like the one we have for photo details (like this one: https://www.inaturalist.org/photos/488501855).

Having it as part of the main observation window, as in the script shared by @cigazze earlier in the thread, is interesting, but the space is too limited to add enough UI to make the whole thing useful.

When I started looking into this, I planned to build a standalone tool that would accept an iNat observation ID or link and automatically fetch and load the audio recording, to make the process seamless. I did not manage to do that. AFAIK, iNat does not allow automatic downloading of audio data: if it detects that the request does not originate from the iNat site, it redirects to an HTML page instead of serving the WAV. I did not find a way around that. I ended up saving the WAV from iNat manually and loading it into my tool by hand.

One more thing. A detailed view for audio files would also be useful for reading the metadata embedded in WAV files (the equivalent of Exif for images). I do not think that typical bird recording devices store anything useful, but bat recorders do. They follow the GUANO format, specified here: https://github.com/riggsd/guano-spec/blob/master/guano_specification.md.
They frequently store location, elevation, recording device information, and even a species guess made by software.

In my experience deciphering these headers, they can be a bit messy (especially if UTF-8 symbols are used), but in most cases, it is manageable and useful.
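For anyone curious what reading those headers involves, here is a minimal TypeScript sketch. It assumes the layout described in the GUANO specification linked above: the metadata sits in a RIFF chunk with the ID `guan` and consists of UTF-8 text with one `Key: Value` pair per line. The function names `readGuano` and `parseGuanoText` are hypothetical, and the sketch does no validation or type conversion of the values.

```typescript
// Parse the text body of a GUANO chunk into key -> value pairs.
// Only the first ":" on each line separates key from value, so values that
// contain colons (timestamps, for example) stay intact.
function parseGuanoText(text: string): Map<string, string> {
  const fields = new Map<string, string>();
  const lines = text.replace(/\0+$/, "").split(/\r?\n/); // drop NUL padding
  for (const line of lines) {
    const sep = line.indexOf(":");
    if (sep <= 0) continue; // skip blank or malformed lines
    fields.set(line.slice(0, sep).trim(), line.slice(sep + 1).trim());
  }
  return fields;
}

// Walk the RIFF chunks of a WAV file and return the GUANO fields,
// or null when the file is not RIFF/WAVE or has no "guan" chunk.
function readGuano(bytes: Uint8Array): Map<string, string> | null {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (off: number): string =>
    String.fromCharCode(bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]);
  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") return null;

  let off = 12; // first chunk starts after "RIFF", size, "WAVE"
  while (off + 8 <= bytes.length) {
    const id = tag(off);
    const size = view.getUint32(off + 4, true); // little-endian chunk size
    const body = off + 8;
    if (id === "guan") {
      const end = Math.min(body + size, bytes.length); // tolerate truncated files
      const text = new TextDecoder("utf-8").decode(bytes.subarray(body, end));
      return parseGuanoText(text);
    }
    off = body + size + (size % 2); // chunk bodies are padded to even length
  }
  return null;
}
```

By default `TextDecoder` replaces malformed byte sequences with U+FFFD instead of throwing, which is a forgiving way to deal with the messy UTF-8 mentioned above. In a browser the bytes could come from `new Uint8Array(await file.arrayBuffer())` on a file chosen by the user.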

Some other things to consider, from my point of view, that I have not seen in this thread, in favor of iNat having its own spectrogram for audio:

Actual comparability and visual representation of the sound, which is not guaranteed for user-uploaded spectrograms

Every user who uploads a spectrogram from their favorite app uploads a screenshot, not the actual Fourier transform data the spectrogram is built from. Some do not even have axis legends or scale information at all. Not only are the scales of the X and Y axes important, but so is the time frame used for the short-time Fourier transform behind the spectrogram (window size, as well as hop length and number of frequency bins). When you use different parameters for your spectrogram, you get different patterns. So if the spectrogram were made by iNat on the fly, we would all be talking about the same thing when we use the pattern for identification. And with the Fourier transform data available, you could also have an interactive spectrogram where you adjust the zoom level on the fly, solving the problem with the axis ranges that staff described earlier.
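A small worked example of why the parameters matter, written as a TypeScript sketch. The relations are the standard ones for a short-time Fourier transform: the frequency spacing is the sample rate divided by the window size, and the time step is the hop length divided by the sample rate. The names `stftGeometry` and `StftGeometry` and the example settings are made up for illustration.

```typescript
// Derived quantities of a short-time Fourier transform (STFT).
interface StftGeometry {
  frequencyBins: number;    // rows per spectrogram column (0 Hz .. Nyquist)
  hzPerBin: number;         // frequency spacing between rows
  secondsPerColumn: number; // time step between columns
  windowSeconds: number;    // stretch of audio blended into one column
}

function stftGeometry(sampleRate: number, windowSize: number, hopLength: number): StftGeometry {
  return {
    frequencyBins: Math.floor(windowSize / 2) + 1,
    hzPerBin: sampleRate / windowSize,
    secondsPerColumn: hopLength / sampleRate,
    windowSeconds: windowSize / sampleRate,
  };
}

// Two settings applied to the same 48 kHz recording.
const fineInTime = stftGeometry(48000, 512, 128);
const fineInFrequency = stftGeometry(48000, 4096, 128);
```

With the 512-sample window each row is 93.75 Hz tall and each column blends about 10.7 ms of audio; with the 4096-sample window each row is about 11.7 Hz tall, but each column blends about 85 ms. The same call therefore looks sharp in time with one setting and smeared in time with the other, which is exactly why screenshots made with unknown settings are hard to compare.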

User-uploaded screenshots of spectrograms from other apps (like BirdNET) often contain photos of the suggested species as thumbnails

This is bad for several reasons. First, it is a violation of copyright (I have already had a discussion about this with curators and some staff, who apparently choose to ignore the problem rather than solve it, but that’s another story). It also introduces the same thumbnails of different species over and over (repetitive images) into the data stream of a species, which is much worse in terms of learning degradation than a few random habitat shots.

I’m still making my way through all the various postings on the forums about spectrograms on iNaturalist, but in the meantime, as a proof-of-concept (and if desired, a basis) for having them built into the website, I have created a Google Chrome Extension that does just this:

https://chromewebstore.google.com/detail/inatspectro/dkcpffpppiggohlejjcoafbhhdcmnapc?authuser=0&hl=en-GB&pli=1

Once installed in your browser, viewing observations with audio attached will show a spectrogram like this one for an observation of a bat call:

Clicking the little cog reveals a number of settings that can also be adjusted.

Feedback is welcome! If this is something the iNaturalist team would like to incorporate, consider it my gift to help get started.

@japh
Tried your plugin on my observation: https://www.inaturalist.org/observations/272459308
I got a strange spectrogram:

The expected result is something like this:

I’ve tried adjusting the parameters, but it did not improve the results. I believe you perform the transform on the entire sequence and then zoom into the available transform. However, zooming in on the time dimension is likely to require redoing the FFT, because a broad overview and a detailed view of a small segment are better when used with different FFT sizes and windowing heuristics.
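To illustrate what "redoing the FFT per zoom level" could mean, here is a TypeScript sketch of one possible rule: aim for roughly one spectrogram column per screen pixel, make the window about four hops long, round it up to a power of two, and clamp it. This rule is a made-up heuristic for the sake of the example, not what the extension or any other tool in this thread does; `chooseStft` and all of its parameters and defaults are hypothetical.

```typescript
interface StftChoice {
  fftSize: number; // window length in samples (power of two)
  hop: number;     // samples between consecutive columns
}

// Pick STFT parameters for the time range currently visible on screen.
function chooseStft(
  visibleSeconds: number,
  sampleRate: number,
  widthPx: number,
  overlapFactor = 4, // window is about this many hops long
  minSize = 128,
  maxSize = 8192,
): StftChoice {
  // About one column per pixel across the visible range.
  const hop = Math.max(1, Math.floor((visibleSeconds * sampleRate) / widthPx));
  // Smallest power of two covering hop * overlapFactor, within the clamps.
  let fftSize = minSize;
  while (fftSize < hop * overlapFactor && fftSize < maxSize) fftSize *= 2;
  return { fftSize, hop };
}

// Overview of a one-minute recording vs. a two-second zoom, 800 px wide.
const overview = chooseStft(60, 44100, 800); // { fftSize: 8192, hop: 3307 }
const zoomed = chooseStft(2, 44100, 800);    // { fftSize: 512, hop: 110 }
```

The overview ends up with a long window (good frequency detail, coarse timing) and the zoomed view with a short one (fine timing), so simply magnifying the overview transform cannot produce the zoomed picture.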

As a prototype concept, there are essentially two key parameters, time zoom and frequency zoom, and those should suffice for an overview. All other adjustments fall outside the scope of iNat and should be handled by a dedicated (preferably web-based) software package.

And a detailed view would certainly require quite a few controls. Many sound recordings on iNaturalist are problematic: they are not only noisy and include multiple species, but often contain unexpected artifacts. For example, I’ve seen time-expanded WAV recordings uploaded without any indication of that. Handling this variety will require a substantial set of interface controls.
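The time-expansion case is a good example of a control that is simple in itself but impossible to guess from the file alone. A TypeScript sketch of the arithmetic, assuming the viewer is told the expansion factor (the names `undoTimeExpansion`, `realFrequencyHz`, and `RealAxes` are hypothetical):

```typescript
interface RealAxes {
  sampleRateHz: number;    // rate the microphone was actually sampled at
  nyquistHz: number;       // highest real frequency the file can contain
  durationSeconds: number; // real-world duration of the recording
}

// A time-expanded WAV stores the original samples but declares a sample rate
// that is `factor` times too low, so playback is slower and lower in pitch.
function undoTimeExpansion(headerSampleRate: number, sampleCount: number, factor: number): RealAxes {
  const sampleRateHz = headerSampleRate * factor;
  return {
    sampleRateHz,
    nyquistHz: sampleRateHz / 2,
    durationSeconds: sampleCount / sampleRateHz,
  };
}

// Convert a frequency read off a naive spectrogram to the real frequency.
function realFrequencyHz(displayedHz: number, factor: number): number {
  return displayedHz * factor;
}
```

For a x10 file whose header says 38400 Hz, a call that shows up at 4.5 kHz was really at 45 kHz, and 192000 samples that play for 5 s cover only 0.5 s of real time.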

Thanks so much for this specific feedback, by the way! I released an update that improved things a little with the resolution, but I will continue to work on it as it’s still not quite there yet.

This is super cool (and the documentation is excellent), congrats! I’ll give it a try.

Thank you so much for that feedback! Please do let me know how you find it when you try it :)

I just found out that, besides the UK one, there is now also a French pipeline available.
They have probably solved their issues (which took years).

Sound analysis pipeline

This pipeline accepts sounds in the following formats:

  • compressed or not in zip, tar, 7z, simple or multipart archive
  • WAV, RAW, WAC and T.WAV (Audiomoth)

The pipeline then does the following steps:

  • Extraction (unzip)
  • Conversion to WAV
  • Putting the date and time in the name of the file (using the XML or TXT files)
  • Expand x10 and cut in 5s (format required in Tadarida automatic ID software)
  • Create site and date in the database
  • Add site in file name
  • Run Tadarida classifier to identify the species
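The "expand x10 and cut in 5s" step can be sketched as follows in TypeScript. This is my reading of the list above, not the pipeline's actual code: I assume the 5 s are measured after expansion, so each piece covers 0.5 s of real time, and that expansion is done by writing a sample rate ten times lower into the WAV header. `planExpandedPieces`, `Piece`, and `CutPlan` are hypothetical names.

```typescript
interface Piece {
  startSample: number; // inclusive
  endSample: number;   // exclusive
}

interface CutPlan {
  headerSampleRate: number; // rate to write into each piece's WAV header
  pieces: Piece[];
}

// Plan how to split a recording into time-expanded pieces of fixed length.
function planExpandedPieces(
  sampleCount: number,
  sampleRate: number,
  factor = 10,      // time-expansion factor
  pieceSeconds = 5, // piece length as heard after expansion
): CutPlan {
  const headerSampleRate = sampleRate / factor;
  const samplesPerPiece = Math.max(1, Math.round(pieceSeconds * headerSampleRate));
  const pieces: Piece[] = [];
  for (let start = 0; start < sampleCount; start += samplesPerPiece) {
    pieces.push({ startSample: start, endSample: Math.min(start + samplesPerPiece, sampleCount) });
  }
  return { headerSampleRate, pieces };
}
```

Under these assumptions, a recording of 600000 samples at 384 kHz is split into three full pieces of 192000 samples and one shorter remainder, each written with a 38400 Hz header.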

When I’m finished with the sounds of Laureen and Nathan, I will start processing the files of the partners. I also have to add the handling of .ta files (sound parameters obtained with Tadarida) in the pipeline for partners who sent this format.

https://b2drop.eudat.eu/s/dKsjLyX28NrFp2j

https://b2drop.eudat.eu/s/zQFL3sAyX9y73An