Audio and CV suggestions

Question. Does the Computer Vision train on audio? If not, is there any plan to include audio training so that audio files can get suggestions? As of right now, the audio files I upload never get suggestions from the CV, which makes me think audio is not currently included in the model.

No, it does not.

Several things have to happen before this is possible. Audio recognition often uses models similar to those used for image recognition: the computer is trained on spectrogram images of the audio rather than “hearing” the audio. So the first step would be to have iNat generate a standardized spectrogram (proposed here; discussion of the logistical hurdles can be read in that thread). Then some form of moving-window recognizer has to be built, because it doesn’t really work to have the computer look at the entire audio clip at once; it needs to process the clip segment by overlapping segment.
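To make the "segment by overlapping segment" idea a bit more concrete, here is a rough Python sketch of how a clip could be cut into overlapping windows before each window is turned into a spectrogram and classified. This is only an illustration, not anything iNat has built: the function name `overlapping_windows`, the 3-second window, the 50% overlap, and the 44.1 kHz sample rate are all assumptions made up for the example.

```python
from typing import Iterator, Tuple


def overlapping_windows(n_samples: int, window: int, hop: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of fixed-length windows.

    Windows overlap whenever hop < window. Hypothetical helper for illustration.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if n_samples < window:
        # Clip is shorter than one window: treat the whole clip as one segment.
        if n_samples > 0:
            yield (0, n_samples)
        return
    start = 0
    while start + window <= n_samples:
        yield (start, start + window)
        start += hop
    # Add one final window flush with the end of the clip if the regular
    # stepping did not already land exactly on it.
    last_start = n_samples - window
    if start - hop != last_start:
        yield (last_start, n_samples)


# Example with assumed values: a 10-second clip at 44.1 kHz,
# 3-second windows, 50% overlap (hop of 1.5 seconds).
sample_rate = 44_100
segments = list(overlapping_windows(10 * sample_rate, 3 * sample_rate, 3 * sample_rate // 2))
for start, end in segments:
    print(f"{start / sample_rate:.1f}s to {end / sample_rate:.1f}s")
```

Each segment would then get its own spectrogram and its own set of suggestions, and the per-segment results would have to be combined somehow (for example, keeping the best-scoring species across all windows). How that combination should work is an open design question, not something settled here.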

3 Likes

For bird songs and calls I rely on the Merlin app, although it doesn’t work everywhere in the world. I often play back bird recordings I’ve made on my phone with the iNat app, but I open the iNat record on my computer to do it. With the Merlin app open on my phone, I can get or double-check IDs from the computer audio. Kind of a neat workaround.

1 Like

I’m dealing more with frog calls, but same idea.

I think Australia has a Frog ID app but don’t know of one for North America, which is surprising. In the past I made an effort to learn the frog calls from an ID record album (on an LP record no less!) that Charles Bogert produced many years ago.

2 Likes

I have the album as well! Atlas Obscura wrote an interesting article on it recently:
https://www.atlasobscura.com/articles/column-sounds-of-north-american-frogs

There are a lot of papers out there on using AI/ML approaches to IDing frog calls, but unfortunately I don’t know of any “one-stop shop” for North American calls in an easy-to-use app.

3 Likes

Great article, thanks. I knew Bogert briefly in his later years in New Mexico.

The other album I have, on CD, is “Frog and Toad Calls of the Rocky Mountains” by Carlos Davidson. It covers more than just the Rockies and is pretty comprehensive for western North America.

2 Likes

Best place, by far, for frog calls in the U.S. is the U.S. Geological Survey site, which includes quizzes and lookups. However, it is limited to the eastern U.S.

3 Likes

xeno-canto is supposed to add support for frogs sometime in the early part of this year. once that gets going, that might be another place to get help with frog sound identification. i don’t think they’re at a point where they can do computational identification of sounds, but the community there is focused specifically on sound identifications.

1 Like

At least in North America, learning frog and toad calls is far less difficult than learning bird calls. There are far fewer species to account for at any given location, and location is very important in narrowing down the possibilities. The most complex chorus I ever heard had about 6 species total. Of course in other parts of the world (e.g., the tropics) it’s a different story.

1 Like

And the call repertoire is vastly smaller in frogs.

1 Like