Automatically add a spectrogram view to observations with sounds

Given that we can use the annotations tool to mark tracks, scat, etc. as the “evidence of organism”, would having a “calls/audio” option in the evidence annotation change your mind? This way, even if you add a spectrogram to an audio observation, you can still annotate it as being audio evidence. (I know I’ve found myself wanting an audio annotation as well as a bone annotation!)

3 Likes

Sound is evidence of the organism itself, it merely involves sound waves hitting our ears rather than light waves hitting our eyes.

It is already possible to filter observations by what type of media they contain (audio vs. image files).

1 Like

Filtering observations wouldn’t help people who want to post spectrograms of audio files, which is what the original post was discussing. By having an additional annotation, people can filter between images that are tagged only with “call” vs images tagged with other evidence. And the annotation I am discussing is called evidence of organism, so I completely agree that sound is evidence of the organism.

2 Likes

I understand what annotation you are referring to. I was not referring to the name of the annotation field, but to one of the possible values for that field – “organism”. The type of evidence for an audio file is still going to be “organism”, just the same way as if you posted a photo of the organism. The distinction made by this field is whether the organism itself was observed, or some other type of evidence left behind by the organism (a track, a fragment such as a feather or bone, a construction, etc.). With the possible exception of human constructions capable of making noise, a sound is made directly by the organism while it is present, therefore an audio observation is an observation of the organism itself. There is no separate option for “photo” as type of evidence for exactly the same reason, because this field does not refer to the media used for documentation.

2 Likes

Annotations are at the observation level, not the photo level (though photo-based annotations have been discussed a lot on the forum). I think most people agree that photo-based annotations would be great for several reasons, but implementing that seems to be a big challenge, so not coming soon.

4 Likes

Agreed. I recently started using the Birdweather PUC for field recording. It saves all my uploaded recordings to their server which can be filtered & viewed later by anyone. So, for each species detected, I upload to iNaturalist a processed (using the free Audacity program) version of the best detection and also provide a link in the notes to all the original recordings (complete with spectrograms). Seems to work well enough and iNaturalist isn’t storing all these files.
Something to consider. Example:

https://www.inaturalist.org/observations/283649598

Only downside - the PUC has GPS and records coordinates as well as date/time, but iNaturalist doesn’t read the metadata.

2 Likes

I would like to add my several cents to the discussion.

TL;DR
Spectrograms do not necessarily have to be stored in iNat's DB as images - they can be generated on the fly in the user's browser. I do this with my tool, and it works in pure JavaScript in the browser without additional server load. Either integrating a similar tool into iNat, or allowing audio files to be fetched from iNat automatically, could help.

Details:

My main interest is reviewing existing iNat sound recordings on desktop/laptop. From that point of view, the lack of a visible spectrogram is unfortunate. As pointed out by @kueda, a default spectrogram over the whole uploaded recording is unlikely to solve anything, because the useful part of the signal is frequently a tiny slice of the whole space in both time and frequency. Automatic cropping of that data is also not ideal - it depends on the object of interest (bats vs. birds vs. insects vs. mammals).

That said, I do not think a really significant amount of automation is needed to make the functionality useful. But the ability to zoom into a point of interest, both in time and in frequency, is a must.

For my purposes I implemented my own open source application, which generates spectrograms with some additional features: https://bansheelab.app/

Generating something like that in JavaScript is absolutely doable:

The things required are: the Web Audio API, an FFT, a WAV format parser (not all required data is extractable via the Web Audio API), and some additional math implementable without dependencies (e.g. a window function such as the Kaiser window). With that, everything can be done in pure JS, which makes storing spectrograms in the DB unnecessary - they can be generated on the fly in the user's browser.
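To illustrate the "implementable without dependencies" part, here is a minimal sketch of the Kaiser window in plain JS (function names `i0`, `kaiserWindow`, and `windowFrame` are mine, not from any library):

```javascript
// Zeroth-order modified Bessel function of the first kind (series
// expansion), needed to evaluate the Kaiser window.
function i0(x) {
  let sum = 1, term = 1;
  for (let k = 1; k < 50; k++) {
    term *= (x / (2 * k)) ** 2;
    sum += term;
    if (term < 1e-12 * sum) break;
  }
  return sum;
}

// Kaiser window of length N with shape parameter beta.
function kaiserWindow(N, beta) {
  const w = new Float64Array(N);
  const denom = i0(beta);
  for (let n = 0; n < N; n++) {
    const r = (2 * n) / (N - 1) - 1; // maps n onto [-1, 1]
    w[n] = i0(beta * Math.sqrt(1 - r * r)) / denom;
  }
  return w;
}

// Multiply one frame of samples by the window before feeding it to the FFT.
function windowFrame(samples, window) {
  const out = new Float64Array(window.length);
  for (let i = 0; i < window.length; i++) out[i] = samples[i] * window[i];
  return out;
}
```

Each FFT frame is windowed this way before transforming; larger `beta` trades main-lobe width for side-lobe suppression.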

Ideally I would like to have it as part of the details view, like we have for photo details (like this one: https://www.inaturalist.org/photos/488501855).

Having it as part of the main observation window, like in the script shared by @cigazze earlier in the thread, is interesting, but the space is too limited to add a reasonable amount of UI to make the whole thing useful.

When I started looking into this, I thought I would build a standalone tool that would accept an iNat observation id or link and automatically fetch the audio recording, to make it seamless. I did not manage to do that. AFAIK, iNat does not allow automatic downloading of audio data - if it detects that the origin of the request is not the iNat site, it redirects to HTML instead of the wav. I did not find a way around that. I ended up saving the wav from iNat manually and uploading it to my tool manually.

5 Likes

One more thing. A detailed view for audio files could also be useful for reading metadata embedded in the WAV (analogous to Exif for images). I do not think that typical bird recording devices store anything useful, but bat recorders do. They follow the GUANO format, specified here: https://github.com/riggsd/guano-spec/blob/master/guano_specification.md.
They frequently store location, elevation, recording device information, and even a species guess made by software.

In my experience deciphering these headers, they can be a bit messy (especially when UTF-8 symbols are used), but in most cases it is manageable and useful.
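For a sense of how simple this is, here is a sketch of a minimal GUANO reader (the function name `readGuano` is mine): GUANO stores its metadata as UTF-8 "Key: Value" lines inside a RIFF sub-chunk named "guan", so a reader just walks the WAV's chunk list.

```javascript
// Hypothetical minimal GUANO reader: scans a WAV buffer for the "guan"
// RIFF sub-chunk and parses its "Key: Value" lines (UTF-8, per the spec).
function readGuano(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const bytes = new Uint8Array(arrayBuffer);
  const tag = (o) =>
    String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE") return null;
  let off = 12;
  while (off + 8 <= bytes.length) {
    const id = tag(off);
    const size = view.getUint32(off + 4, true); // chunk size, little-endian
    if (id === "guan") {
      const text = new TextDecoder("utf-8").decode(
        bytes.subarray(off + 8, off + 8 + size)
      );
      const meta = {};
      for (const line of text.split("\n")) {
        const i = line.indexOf(":");
        if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
      }
      return meta;
    }
    off += 8 + size + (size % 2); // RIFF chunks are word-aligned
  }
  return null;
}
```

A real reader would also handle multi-line values and namespaced keys, which the spec allows, but this covers the common fields like location and species guess.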

1 Like

Some other points, missing from this thread so far, in favor of iNat generating its own spectrograms for audio:

Actual comparability and a consistent visual representation of the sound, which is not guaranteed for user-uploaded spectrograms

Every user who uploads a spectrogram from their favorite app uploads a screenshot, not the actual Fourier-transform data a spectrogram incorporates. Some do not even have axis legends or scale information at all. Not only the X and Y axis scales matter, but also the parameters of the short-time Fourier transform the spectrogram is made with (window size, hop length, and number of frequency bins). When you use different parameters for your spectrogram, you get different patterns. So if the spectrogram were made by iNat on the fly, we would all be talking about the same thing when we use the pattern for identification. And with the Fourier-transform data available, you could also have an interactive spectrogram whose zoom level can be adjusted on the fly, solving the problem with axis ranges described earlier by staff.

User-uploaded screenshots of spectrograms from other apps (like BirdNET) often contain suggested-species photos as thumbnails

This is bad for several reasons. First, it is a violation of copyright (I already had a discussion with curators and some staff about this issue, who apparently choose to ignore the problem rather than solve it, but that's another story). It also keeps injecting the same thumbnails of different species (repetitive images) into a species' data stream, which is much more harmful in terms of learning degradation than a few random habitat shots.

2 Likes