I reported one person who was uploading multiple 10-minute audio files with no ID and wouldn’t respond to comments asking for timestamps or what identifiers were supposed to listen for. I also have to leave warning comments on those clips that are 30 seconds of silence followed by a deafening sound.
My favourite, though, is a couple who upload their sound files from Merlin; their enthusiastic reactions to seeing the bird ID come up always make me smile.
I have seen several that specifically mention “IDed by Merlin” or they include screenshots of the Merlin output. Regardless, I don’t think it really changes anything… the recordings are long wherever they’re coming from, but it is clear that Merlin recordings are a common culprit.
FYI the Merlin recording platform is specifically tuned for the frequencies of bird calls, whereas the voice memo app is targeting much lower pitches. As a result, Merlin tends to produce higher quality bird recordings. Plus you get a spectrogram in real-time! (Edit: @tiwane corrected me here: Merlin is simply recording lossless/unfiltered audio, whereas the default settings on other Voice Memo apps apply compression and may filter above normal speaking frequencies.)
One thing I had to learn when recording birds with a cellphone is to not move at all while recording. It seems the phone picks up any small footstep, rustle of clothing, etc. better than the bird song which might be at some distance. Can’t do much about wind noise however.
My understanding was that we’re not supposed to upload spectrograms, because they mess up CV training or because they aren’t a direct record of the organism? (Different programs can output spectrograms that look different, in different colors etc.)
I’m not sure this is particularly related to this thread, but I didn’t realize this was true. In some cases, seeing a spectrogram is crucial for identification.
It’s not true. Ken-ichi had a personal preference but that’s all it was. The CV just needs to learn to deal with it. Topic for a different thread though.
Audacity can help with background noise, including wind noise, too. I usually use the “noise reduction” and “amplify” tools on basically any bird sound I upload; it seems to quiet down the wind noise and amplify the bird to make it easier to distinguish. Cutting out any loud parts from, for example, setting your phone down is also good, so an IDer doesn’t blast their eardrums. You can play around with it and sometimes get even moderate wind to be quiet enough to clearly hear the bird. It’s super handy!
I asked eBird about this and they said Merlin’s not tuned differently, it just uses lossless audio files for its recordings, so it’s saving the whole spectrum of sound. Whereas by default, the iPhone Voice Memos app uses lossy compression that may remove some sounds (the compression is designed for getting rid of sounds humans can’t hear). However, it’s possible to change iPhone Voice Memos to record using lossless files.
That makes sense. I’ve played around with this a bunch, comparing various iPhone apps. Even with compression turned off (and all background noise removal, etc. turned off) for the default Voice Memos iPhone app, something weird happens to quiet noises at frequencies above ~6.5 kHz. There is a noticeable improvement here when using the Merlin app (as well as other third-party recording apps).
I’m not familiar with Android, but on iOS, it’s very simple to export the WAV file from Merlin and upload to iNat.
Well, I don’t use Merlin to record bird calls. I do record using iNat. I have no idea how to edit the recordings, so I add them and then try to give timestamps when I get back home. I don’t upload 4-5 minutes of recording, but I will let the recording run longer if the call is intermittent, as I assume a brief chirp may not be enough to ID the bird. It’s a little disheartening to record and upload to iNat and then hear you may be annoying identifiers. My apologies if so, but honestly, I would prefer, if someone thinks a recording is wasting their time, that they just move on. I’m not going to be upset.
I think what you’re describing isn’t quite the issue at hand here. If you hit record because you hear something, and it calls within the first ~10 seconds of a recording, that’s great! Even if you let the recording run longer to capture other calls, if it’s identifiable in the first bit, I have no problems whatsoever with the longer recordings.
The problem (at least for me and it seems like a few others) is when a 3-minute recording is uploaded and the bird indicated in the observation calls once 2.5 mins in. Merlin facilitates this to some degree, because it gives a full list of the birds recorded over the full duration of the recording. Some observers then upload multiple copies of the same full-length recording for each of the species spit out by Merlin.
I see. I suppose the only answer to that would be iNat intervening, as you said, with some sort of note asking observers to trim (but I suspect a lot of people might not know how to do that) or timestamp, as I try to do. Is there an iNat tutorial on the best practices for doing recordings? Might help.
You can’t “share” it to iNat while you’re in the Merlin app, but you can easily import it from your Merlin folder while you’re in the iNat app!
It’s probably too big of a feature to implement for too few observations, so I don’t expect this to happen, but just throwing it out there: Could iNat add an audio editor that would let us trim recordings when we upload them? Like the photo editor, where we can crop images directly within iNat and don’t have to remember to do it before we select them. I’m not talking about anything fancy with filters, just an option to cut the beginning and end off. (Of course there’s still going to be people who don’t trim their recordings, but maybe not as many …)
I think this is a fantastic idea, and it would also shrink the size of the recordings stored for posterity, to the benefit of iNaturalist’s servers. Is there any collaboration/API currently between iNaturalist and Merlin?
Since people are asking about workflows, this is what I do:
Record directly from Merlin on my phone.
If the audio needs trimming, export it from Merlin to the Super Sound app (on Android) and trim it there. There are probably better apps out there, but this one is free and works well enough.
Export from Super Sound to the phone’s file system and then import to iNat.
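For anyone who moves their files to a computer instead, the trimming step can also be scripted. This is only a rough sketch using Python's built-in wave module, assuming the export is an uncompressed PCM WAV file; the function name trim_wav and the file names in the usage note are made up for illustration.

```python
import wave


def trim_wav(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (seconds) into a new WAV file.

    Returns the length of the trimmed clip in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices and clamp them to the file.
        start_frame = max(0, int(start_s * rate))
        end_frame = min(total, int(end_s * rate))
        if end_frame <= start_frame:
            raise ValueError("end_s must be later than start_s")
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and sample rate from the source;
        # the frame count in the header is corrected when the file closes.
        dst.setparams(params)
        dst.writeframes(frames)

    return (end_frame - start_frame) / rate
```

For example, trim_wav("merlin_export.wav", "trimmed.wav", 145, 160) would keep a 15 s window around a call at roughly 2.5 minutes, leaving a few seconds of background before it.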
Best practices for audio in my view:
Should not be shorter than ~ 10s unless the vocalisation is very distinctive. For birds that sing very varied songs (e.g. Marsh Warbler or Melodious Warbler in Europe, not sure about the equivalents elsewhere), it can be helpful to have longer recordings of 30s or more or else several shorter recordings.
Ideally there should be 2-4s of background noise before the focal animal starts vocalising. This helps to accustom the ear to the background noise.
The animal should vocalise within the first 10s and within the last 10s of the recording. Otherwise, trim.
If the recording is longer than 1min, the animal should be heard regularly with no long gaps. Otherwise, split into smaller chunks.
Normalise volume so that the animal is clearly audible. (I actually don’t do this and would appreciate any advice about simple workflows on mobile.)
This might be controversial, but: Don’t clean audio to make the focal animal easier to hear. Processing almost always distorts the sound of the animal. I can often hear this distortion when users process the audio heavily, sometimes to the point where the vocalisations sound off and it reduces my confidence in the ID.
Uploading spectrograms can be helpful. For some groups like bats it’s basically required to ID, as many species vocalise outside of humans’ audible range. I can’t see how this would be a problem for the CV model - the spectrograms look so different to photos that I can hardly imagine it mixing them up. Actually, I’m not sure why iNaturalist doesn’t automatically convert all audio to spectrograms, do some very basic processing on them, and then train the CV model on them. My understanding is that this is how most of the audio ID models work.
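To make the trimming and splitting rules of thumb above a bit more concrete, here is a small sketch that turns them into a checklist. It assumes you have noted the times at which the animal vocalises; the function name check_recording and the 20 s threshold for a "long gap" are my own assumptions, not anything iNat defines.

```python
def check_recording(duration_s, call_times_s,
                    min_s=10.0, edge_s=10.0, long_s=60.0, max_gap_s=20.0):
    """Return a list of suggestions for a recording.

    duration_s:   total length of the recording in seconds
    call_times_s: times (seconds) at which the focal animal vocalises
    max_gap_s:    assumed threshold for a "long gap" in long recordings
    """
    calls = sorted(call_times_s)
    if not calls:
        return ["no vocalisation marked"]

    advice = []
    if duration_s < min_s:
        advice.append("short clip: fine only if the call is very distinctive")
    if calls[0] > edge_s:
        advice.append("trim start")
    if duration_s - calls[-1] > edge_s:
        advice.append("trim end")
    if duration_s > long_s:
        # Gaps between consecutive vocalisations.
        gaps = [later - earlier for earlier, later in zip(calls, calls[1:])]
        if any(gap > max_gap_s for gap in gaps):
            advice.append("split into shorter clips")
    return advice
```

A 3-minute recording where the bird calls once at 2.5 minutes, check_recording(180, [150]), comes back with both "trim start" and "trim end".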
Tips for better audio:
Try to get close to the animal (without disturbing it).
Don’t move while recording and especially don’t walk.
Point the microphone towards the animal (for most phones the microphone is at the bottom).
If it’s windy you can try to stand with your back to the wind and hold the phone close to your body.
You can buy shielded directional microphones pretty cheap if you want better audio quality (but I just use my phone). The pros use more sophisticated set-ups with parabolas etc.
Some other points that frequently come up and I would be interested in others’ views:
Many users upload screenshots from Merlin or BirdNet with the ‘species identified’ view. In my view, this should be discouraged, as the image does not relate directly to the organism. I occasionally comment on this when IDing, but some users seem committed to doing this and opposing it seems like a bit of a lost cause without some kind of community consensus.
Easy workflows for volume normalisation on mobile (Android / iPhone)?
Should iNat display a spectrogram automatically? Allow trimming of audio?
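On the normalisation question: this is not a mobile workflow, but if you do move files to a computer, simple peak normalisation can be sketched with Python's built-in wave and array modules. It assumes a 16-bit PCM WAV file; the function name normalise_wav and the 0.9 target are illustrative choices of mine. It only applies a single gain to the whole file, so there is no filtering or noise reduction involved.

```python
import array
import sys
import wave


def normalise_wav(src_path, dst_path, target_peak=0.9):
    """Scale a 16-bit PCM WAV so its loudest sample reaches target_peak
    (a fraction of full scale). Returns the gain that was applied."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    gain = 1.0 if peak == 0 else target_peak * 32767 / peak

    # Apply the same gain to every sample, clamped to the 16-bit range.
    scaled = array.array(
        "h", (max(-32768, min(32767, round(s * gain))) for s in samples)
    )
    if sys.byteorder == "big":
        scaled.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(scaled.tobytes())
    return gain
```

One caveat: the gain is set by the single loudest sample, so a thump from setting the phone down will keep the bird quiet. Cut those parts out before normalising.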
I think the concern has to do with the uneven distribution of which species will have lots of spectrograms in the photo training set and which won’t.
To illustrate using a known CV problem in my taxa of interest: lots of moths require genitalia dissection to identify. The most frequently-uploaded moth dissections on iNat are for a handful of very common species with very distinct genitalia and completely identical external appearance. The CV recognizes that “genitalia dissection image” is a commonly uploaded type of photo for those particular species. The problem is that now the CV recommends those few species for every genitalia dissection image, even ones that look absolutely nothing like those species. If we uploaded a hundred dissections of every moth species, I’m sure the CV could learn to differentiate between them. But the bias toward only uploading that type of photo for certain species means the CV has learned to associate that type of image with those couple taxa.
I’m sure a similar issue would happen if we started uploading more sonograms. Could the CV be trained to identify sonograms of different birds? Probably. But let’s be honest, the sonograms uploaded will be heavily biased toward certain species, and the CV will eventually just associate “sonogram image” with “that taxon that’s always getting sonograms uploaded”, just like it associates “genitalia dissection image” with “that taxon that everyone keeps dissecting”. I doubt anyone is worried that the CV will think a sonogram is a blurry picture of a bird, more that the CV is being trained on 99.99% organism photos and its sonogram-recognition will be pretty bad.
Ok, I can see how that would lead to issues in some cases. But for birds in particular, I’d be surprised if the issues were that dire. People upload heaps of audio of a wide range of bird species. It’s uncommon for there to be two closely related species where one gets audio uploads and the other doesn’t, except in the case where one is rare (and then it’s not audio per se that’s the problem). An automatic spectrogram from iNat’s side would generate a large volume of training material that would make this largely a non-issue for birds in my view.
It might be helpful to point them to the help section that says
“Media used in your iNaturalist observations should represent your own experiences, not just examples of something similar to what you saw. Please do not upload photos you found elsewhere, such as online or in a book, since they don’t represent your own experiences and are probably a violation of copyright law”