Suggest ID for sounds?

As I uploaded a cicada call a moment ago, I thought to myself: “How awesome would it be if iNat analysed and suggested an ID for sounds as well as photographs. Would it be possible for the iNat AI to ‘learn’ sounds?”

Unsure if this has been discussed before, how difficult it would be to implement, whether others think it would be worthwhile etc.

Interested to see what other people think. This isn’t a feature request, just a discussion.

Just a thought anyway! :)


Would definitely be a cool feature (it might finally mean I stop posting 100 sound recordings of crickets and IDing them as frogs…), but I suspect it wouldn’t be a super high priority; of the 54 million verifiable observations at the moment, just 0.2% include sounds.

The low number of sound uploads may also be an issue for how the computer vision can learn, given it requires (I think) a minimum number of data points per taxon. I suspect many species would not meet this requirement for sounds.

@alex thoughts?


I would love that. I keep hearing strange sounds and have no idea where to start identifying them. It is surprisingly hard to search for sounds online. One can find frog sounds and bird sounds, but the rest are very difficult.


I think it has been but I couldn’t find the previous thread.


Hi Nick, here’s the previous discussion and staff’s response:

This is what we use feature requests for: discussion, difficulty, whether others think it is worthwhile. But if you have suggestions for improvements, feel free to add them at #forum-feedback.


Many, many years ago, I wrote the developer of iBird (Mitch Waite) and asked about IDing bird calls, much like Shazam identified music recordings. He explained how much more difficult it would be to do that with nature recordings compared to matching a database of digital studio recordings.

Also, it would require users to record the bird with a specialized (parabolic?) microphone to focus on the sound and reduce the background noise.

I know using my iPhone Voice Memo doesn’t produce a very clean, clear recording. I’m often surprised to hear stuff on the recording I had no awareness of when I made the recording: a dog barking, a car going by, wind, clearing my throat, etc.

Of course, the technology for this may have improved a lot nowadays.


But there is already an app that recognises bat calls, so I think it’ll come to iNat too after some years.


The technology is obviously imperfect, but the Cornell Lab is currently working on developing audio recognition software for birds at least. A live stream of the software processing audio from outside the Lab can be found here:

As mentioned by others, the wide range in quality and relatively small volume of audio uploads on iNat will probably make audio suggestions a low priority and logistical improbability for the foreseeable future.


I use the Cornell app. It’s called BirdNET (only available for Android right now, IIRC). It is pretty cool when it works, but distance and background noise make it hard for it to work well on a regular basis. Urban environments are HARD. You almost have to be deep in the woods to get anything useful. The phone mic certainly isn’t as sensitive as my own ears.

It does export observations to iNat, which is cool. All it does in that regard, though, is to send the audio clip over. It doesn’t even populate the date, time, and location (which are recorded in the app) on the iNat observation, so filling those details in manually is a challenge. I’ve used it to record things other than birds, because the recording process, selecting the audio segment, then sending the file to iNat DOES work particularly well.

I think part of the reason more folks don’t submit audio observations is because the process to do so kinda sucks in a lot of cases. For me, the other reason has to do with distance and background sounds making it hard to record something of adequate quality.

This was a recent observation I made using the app. The pics are just for giggles. The birds are BARELY visible at all, but the app nailed the ID from the audio recording. Pretty ideal recording conditions here. It was quiet and the birds let me get pretty close.


I would say phone mics aren’t perfect, sure, but for people with poor hearing they’re great, and especially at high frequencies they can be better than most human ears. BirdNET’s findings can be very surprising in good forest conditions. It’s a great opportunity for everyone without strong photography skills and costly camera equipment. That’s why I’m confident in it, but the interface is not working perfectly.


I would say it will be very useful, of course. But especially, I’d love to have sound recognition working permanently when using the Seek app. I see a lot of smartphones now have small but fairly powerful telephoto lenses… I’d love to be able to get accurate IDs on birds and bats I can’t approach, using both sound and image. All this on the go with a phone would be futuristic.

I have to admit I know very little about sound recognition. Here are some very disorganized thoughts about a few audio species classification projects that I know of.

BirdVox is kind of like microphone trapping for bird migrations. They’re trying to fill in radar data (which can give information about migrating biomass but nothing about species) to understand bird migrations.

Rainforest Connection is mostly looking at significant audio events, such as detecting the difference between standard rainforest background noise and logging activity like chainsaws, or detecting the presence of a specific endangered species.

Forschungsfall Nachtigall is a project from the Museum für Naturkunde in Berlin, identifying Nightingale songs from citizen science phone recordings.

Some differences between these systems and iNat:

Attention: Almost every iNat photo has been created with the attention of a human. A human has identified the species of interest and taken a picture where the species of interest is typically centrally located, free of occlusion, and in focus. Other potential species of interest are usually cropped out of the frame or not centrally located. The relative quality and control of cameras vs microphones on phones makes this particularly difficult to replicate in audio recording. Neither Rainforest Connection nor BirdVox has any sense of attention: they are listening all the time and must distinguish between background noise and the target sound(s). Forschungsfall Nachtigall does incorporate attention, since humans record and upload what they believe is Nightingale song.

Scope: Part of what makes iNat so awesome is that all species in the tree of life are candidates for observation and identification, and all identifications hang off the tree of life. All the hard work of sorting and grinding out the taxonomy pays off when an observation gets an identification that’s attached to a real species label instead of a generic tag like “tree” or “bug.” The vision model we’re training now knows about roughly 30,000 leaf taxa (mostly species), and because of how it is deployed it can make predictions about parent or inner nodes as well, which represent another 25,000 higher-ranking taxa. I believe the BirdVox “fine” model can classify a few dozen different species, and the other two projects can only identify one or two.
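The “predictions about parent or inner nodes” idea can be sketched simply: a model that scores leaf taxa can also score any ancestor by summing the probabilities of the leaves beneath it. This is a minimal illustration with a made-up toy taxonomy and made-up scores, not iNat’s actual model or data structures:

```python
# Hypothetical toy taxonomy: child -> parent (None = root).
taxonomy = {
    "Aves": None,
    "Corvidae": "Aves",
    "Corvus corax": "Corvidae",
    "Corvus corone": "Corvidae",
    "Pica pica": "Corvidae",
}

def rollup(leaf_probs: dict) -> dict:
    """Sum each leaf's probability into every one of its ancestors,
    so inner nodes get scores too."""
    scores = dict(leaf_probs)
    for leaf, p in leaf_probs.items():
        node = taxonomy.get(leaf)
        while node is not None:
            scores[node] = scores.get(node, 0.0) + p
            node = taxonomy.get(node)
    return scores

# Illustrative leaf scores from a hypothetical classifier:
scores = rollup({"Corvus corax": 0.4, "Corvus corone": 0.3, "Pica pica": 0.1})
print(round(scores["Corvidae"], 3))  # → 0.8
```

A model uncertain among several crows can still confidently suggest the family, which is why inner-node predictions are useful even when no single leaf score is high.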

BirdNET (which @okbirdman posted) is an amazing project from eBird. It’s probably the closest analog to iNaturalist: it seems to be able to classify almost a thousand species of birds and works in attention-based scenarios like their Android app. It’s powered by the Macaulay Library dataset, which contains hundreds of thousands of labelled high-quality bird recordings, and eBird has some of the best audio ML researchers in the world working on it.



I would like Computer Vision to include sounds so there is no need to compare. (Open Source introduction)
Tadarida Open Software Toolbox

Building an acoustic recognition database: Tadarida-L (Toolbox Animal Detection on Acoustic Recordings)

Sonochiro (Biotope) was developed by
BatExplorer is the program that comes with the BatLogger and recognizes 53%.

Tadarida-L (Toolbox Animal Detection on Acoustic Recordings) is helper software: an open-source toolbox, altruistically written by Yves Bas, with which you can write your own classifier software. It works by recognizing signals. Used for birds, bush crickets and bats.

I moved your post to this thread about automatic sound suggestions since the other topic was for the non-automatic Compare tool.


Students at a UK university have developed the auto-recording hardware AudioMoth (Open Acoustic Devices), a small device that is rapidly gaining popularity.

Stewart Newson’s Norfolk classifier is used in Belgium and the UK.
Stewart has built a classifier for the UK to identify UK species. The plan is to extend the UK classifier with Benelux species; the classifier learns from its mistakes. Tadarida is certainly no worse than Kaleidoscope as long as the recordings are good.

The third lecture (from 1:41:17) is again by Claire Hermans and gives a preview of her project ‘Light on landscape’, in which she explains how microphone arrays are used to reconstruct the flight paths of bats.

In France there is also a monitoring project based on Tadarida, and they also do not want to make their French recognition database available.
The second lecture (from 52:18) is by Marc van der Sijpe and Claire Hermans: ‘Introduction to auto-recording and auto-identification, and an explanation of how Tadarida and the BTO classifier based on Tadarida work’.
This presentation is given in Dutch and English. First, Marc Van De Sijpe gives an introduction to auto-recording and auto-identification, including Tadarida, an open-source classification tool written in the programming and statistics language R. Then Claire explains in English how Tadarida works, after which the BTO classifier is also demonstrated, along with the effect of light intensity on habitat loss.
Tadarida Open Software Toolbox (English)

DIY self-build TeensyBat-VLEN bat detector/recorder, open source

How to start a new bat classification database (classifier) together.
Are there people interested in building a database?
Marc gave some identified recordings to use with Tadarida-L.

There is an R program. Any subscribed user can upload a file through a simple window to the cloud in England and get back a .csv with the result. The volunteers in the UK used to throw away their recordings, so now the uploaded files are stored in the cloud, and the identifications are stored in the UK. People can flag when a detection is wrong, and the R program can be adapted. It is unknown whether it will be open source; without funding it may not be. It is free for volunteers.
A bat audio WAV file is submitted with the species name and the location of the bat call within the file.
NEM VT has recordings.

Bestimmung von Fledermausrufaufnahmen und Kriterien für die Wertung von akustischen Artnachweisen - Teil 1 (Identification of bat call recordings and criteria for evaluating acoustic species records, Part 1)
The original FFT is processed every 0.67 msec. If you use a zoom factor of 4, you only take every fourth sample, so it takes four times longer until you have acquired enough samples for the FFT: 2.67 msec. The trick is that you do not wait until all the samples are collected; instead you allow for “overlap” in the samples and perform the FFT at the same 0.67 msec rhythm. [Please note that this also takes four times the processing power compared to non-overlapping FFTs, and more memory.]
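The overlap trick above can be sketched in a few lines. This is a simplified illustration only: the sample rate, FFT length, window choice, and the naive decimation (no anti-aliasing filter) are my assumptions, not the actual detector firmware. With a 256-point FFT at 384 kHz, one frame spans ~0.67 ms of raw audio; decimating by 4 and hopping by 64 decimated samples (75% overlap) keeps spectra arriving at that same 0.67 ms rhythm:

```python
import numpy as np

FS = 384_000   # assumed sample rate (Hz)
N_FFT = 256    # FFT length; 256 / 384 kHz ≈ 0.67 ms per frame
ZOOM = 4       # zoom factor: keep every 4th sample

def zoom_fft_frames(signal: np.ndarray) -> np.ndarray:
    """Decimate by ZOOM, then slide an N_FFT window with 75% overlap so
    spectra are still produced every N_FFT raw samples (~0.67 ms)."""
    decimated = signal[::ZOOM]      # naive decimation (real code would filter first)
    hop = N_FFT // ZOOM             # 64 decimated samples -> 75% overlap
    frames = []
    for start in range(0, len(decimated) - N_FFT + 1, hop):
        window = decimated[start:start + N_FFT] * np.hanning(N_FFT)
        frames.append(np.abs(np.fft.rfft(window)))
    return np.array(frames)

spectra = zoom_fft_frames(np.random.randn(FS // 10))  # 100 ms of noise
print(spectra.shape)  # → (147, 129): frames every 0.67 ms, 129 frequency bins
```

Because each output frame reuses three quarters of the previous frame’s samples, the frequency resolution of the slow 2.67 ms acquisition is kept while the update rate stays at 0.67 ms, at the cost of the extra compute and memory noted above.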

I have collected more information on the ZoomFFT in this Wiki:

I am not experienced in programming audio library blocks, but my gut feeling is that it could be easier to just use a queue object from the library to get the samples and perform all the calculations in the main loop rather than inside an audio library object (because there are different sample rates involved). However, for people also interested in using the ZoomFFT: if you design a specific audio library object, you will get much more credit ;-).

Best wishes,


BTW: this brand-new publication will rapidly become the professional standard for the identification of bat calls from spectrograms in Germany. Maybe it also helps others with ID of bats in Central Europe.

Bestimmung von Fledermausrufaufnahmen und Kriterien für die Wertung von akustischen Artnachweisen - Teil 1

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.