I’ve started doing audio observations of birds and have some questions. First, is it okay to add pictures of the subject? I worry that someone might see the photo and make an ID without listening to the sound file. Would it be alright to add a picture of the sound, like a spectrogram? It would make it easier for me to point out which sound I’m focusing on.
I think usually it’s totally fine to add pictures alongside the recordings, as long as you’re certain that they show the same individual. I do sometimes see spectrograms uploaded as well. They are a bit less accessible for ID, but it doesn’t hurt to add more context.
Observations with both a photo and audio show a speaker icon in the top right corner of the thumbnail, so identifiers will see that.
Remember to leave a note as to which sound you want id’ed if there is a doubt.
Join the project “Audio Observations From Around the World”.
Several days ago I found an observation with both picture and sound, which I think was already identified as a coquí. I’ve read that there are two species with similar sound, but I’ve heard only the common coquí in the wild, having been in San Juan and the nearby coast. The other species lives at higher elevations. The picture I compared with both species of coquí and decided it’s a common coquí.
Where I live, there’s a chuck-will’s-widow that’s not here yet, which I’d like to observe. There are also some tree frogs (in this climate, they couldn’t be coquíes), for which I think I’d need a directional microphone. What equipment do you suggest?
It’s totally fine to add a picture of the subject, but I would guess that many users might just look at the pic, especially if it is “first”.
Staff have asked users not to upload spectrograms for a variety of reasons. Here’s one response from them:
I’ve posted a few birds and a couple of frogs (including a coqui) where I was able to record their song with my cellphone and get photos of the calling individual. Not always easily done unless the animal is calling regularly and gives you time to juggle a camera and a phone to record sound. Sometimes I wish I had better audio equipment but then that would be one more thing to have to carry.
Important context to that quote:
Something that is not clear in the question or the replies: It’s perfectly ok to add a picture of the bird that you saw singing (making sure it was that bird making the sound, of course), but do not add a “generic” picture of the bird species to illustrate the sound.
Both audio and photographs should be yours and should be of that particular encounter. (Meaning: if you saw one recognisable bird one day and took a picture, and saw the same bird on a different day and recorded it, you need to make two observations, and then in the description or fields you can indicate it’s the same bird.)
I find spectrograms super useful for identifying bird sounds. In some cases, I had to download the recording and import it to Audacity on my computer to see the spectrogram and be able to compare it to my recordings.
I find it similar to when I have to copy a photograph and edit the lighting and contrast to be able to see diagnostic features, so, in both cases, I’m grateful when the observer does that themselves.
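For anyone curious what a spectrogram is under the hood, here is a minimal sketch in Python. It is not what Audacity does internally, just the general idea: slice the recording into short overlapping frames and measure how much energy each frequency has in each frame. The function name `spectrogram`, its parameters, and the synthetic 3 kHz "whistle" are all made up for illustration, and it uses only the standard library (a real tool would use a fast FFT instead of this slow direct transform).

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_len=256, hop=128):
    """Return (times, freqs, power) from a Hann-windowed short-time DFT.

    power[i][k] is the energy of frequency freqs[k] around time times[i].
    """
    if len(samples) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Hann window tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    # Precomputed complex exponentials; (k * n) % frame_len picks the factor
    # for frequency bin k and sample n.
    twiddle = [cmath.exp(-2j * math.pi * k / frame_len) for k in range(frame_len)]
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]
    times, power = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(frame[n] * twiddle[(k * n) % frame_len]
                      for n in range(frame_len))
            row.append(abs(acc) ** 2)
        power.append(row)
        times.append((start + frame_len / 2) / sample_rate)
    return times, freqs, power


# Synthetic stand-in for a recording: a quarter second of a steady 3 kHz tone.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 3000 * n / sample_rate) for n in range(2000)]

times, freqs, power = spectrogram(tone, sample_rate)

# Average over time and find the frequency that carries the most energy.
mean_power = [sum(row[k] for row in power) / len(power) for k in range(len(freqs))]
peak_hz = freqs[mean_power.index(max(mean_power))]
print(f"strongest frequency is about {peak_hz:.0f} Hz")
```

Plotting `power` as an image, with time on one axis and frequency on the other, gives the familiar squiggly picture that identifiers compare against reference recordings.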
We make that assumption when showing observation photos on the taxon page, when training our computer vision system, when sharing data with partners like GBIF, etc., and all those non-organism shots break that assumption and cause us to use and share inaccurate information.
The quote @cthawley added surprises me and, even coming from a staff member, makes little sense to me:
On one hand, I have seen field drawings, spectrograms, landscapes, etc. used to support or add information where a specific ID would otherwise be impossible. OK, you can be against that as a platform, and the impact wouldn’t be massive. On the other hand, if it were an official position (thank you @dhasdf for the context), it would throw into the trash every single observation of scat, tracks, molts, nests, webs, etc.
Those are evidence of an organism but not pictures of it, and could also muddle the dataset for computer vision and other reasons. (They do, to some degree. The “unknown/lagomorph” project is full of pictures of somewhat round poops)
A supporting piece of information that helps confirm a species is part of good documentation, in my opinion, as long as it’s not the only piece of information.
Field drawings/sketches are allowed on iNat, though they should be made in the field and not from memory.
This is not correct. iNat explicitly allows and encourages photos of all of the types of evidence given above, as shown in the documentation:
“An observation records an encounter with an individual organism, or recent evidence of an organism, at a particular time and location. This includes encounters with signs of organisms like tracks, nests, scat, or things that just died.”
The issue with spectrograms is that they are not photographs, and iNat operates under the assumption that image files are photographs. Photographs are representations of visual (optic) experience/information. Spectrograms are visual representations of auditory (sonic) experience/information, which is something fundamentally different.
Habitat photos are also discouraged by staff. You can see previous threads on the forum addressing this and ways to appropriately include habitat information (e.g., a habitat photo which includes the organism):
https://forum.inaturalist.org/t/policy-recommendation-about-habitat-photos/10866/5
https://forum.inaturalist.org/t/is-adding-photos-of-the-nearby-habitat-inside-the-observation-of-an-organism-ok-if-the-habitat-photo-doesnt-showcase-the-organism-at-all/59680/22
https://forum.inaturalist.org/t/habitat-shots-acceptable-with-audio-observations/26630
Both spectrograms and habitat photos are officially “not encouraged” in iNat’s documentation:
https://help.inaturalist.org/en/support/solutions/articles/151000171680-what-do-i-do-if-the-observation-has-multiple-photos-depicting-different-species
If you upload just audio as evidence and you provide the initial ID to species, then identifiers will know what you’re trying to ID. No photo needed. If you’re unsure of the ID, audio identifiers appreciate it when you indicate what you want identified in the audio, such as “the noise from 6 seconds to 13 seconds,” or “the singing bird in the background,” etc. If you know you have a photo of the organism making the sound, the photo may help the ID, though we agree with other comments that users may not listen to your audio at all, or at least they may leave no indication that they listened to it. I wish there were a “counter” to indicate how many people have listened to it.
If you get good photos of the calling animal and an audio, it probably doesn’t matter if the ID is made just from the photos. I like to get both if I find a singing bird that is cooperative, mainly so I have a record of its song for my own use.
I download every single audio observation I intend to identify to look at the spectrogram. I deleted all my spectrograms, I hope. But in tons of cases, I would be absolutely lost without them.
I keep downloading them and try to explain what “I see” in the audio… I think they are so helpful for identification purposes, but I get the CV point, so I just don’t upload them and keep downloading and deleting.
But you can totally mix photo and audio. It helps a lot. (from IDer perspective)
I had hoped spectrograms would be allowed. It’s not something that could just be added with a notation like the weather. We can add pictures of a bird and the sounds it makes, but not actually show the sound :( I’m not an expert and I’m new to audio though. I am glad I am not the only one trying to use them lol. I appreciate everyone taking the time to respond too!
I’m ambivalent about adding the capability of uploading spectrograms to iNat observations. I absolutely believe in their value to help in IDing birds and understanding the sounds they make; I’ve been using and studying them for many decades for a wide variety of bird species. But the iNat structure would need to be altered in some way to exclude such uploads from the training sets for Computer Vision. Keep in mind basically how CV works: It has no idea what a bird is, or what a spectrogram is. It just analyzes all the digits of raw data in images (think: a black-and-white image of a spectrogram) and tries to associate it with some species it either has already learned or a taxon that someone tells it that the image represents.
It is self-evident that the spectrogram of any bird, that squiggly picture, looks nothing like what CV is “seeing” and learning in an image of the actual feathered bird species. If CV is training on a bunch of images of a colorful bird like a Painted Bunting or a dull bird like a female Indigo Bunting, it can eventually figure that out. But throw in some images with some black or gray squiggles on a white background and it will just be too “foreign” to its acquired knowledge of the actual feathered creatures.
Could a Large Language Model learn to recognize the spectrograms for what they are and identify them from that context? Absolutely! (And Merlin already does this with the sounds themselves.) But that is simply an entirely different data set on which iNat’s CV would have to train.
I’m pretty sure Merlin is computer vision; it’s been trained on spectrograms and not actual audio files. I remember years ago when we discussed training a model for audio, the inclusion of time in audio makes things much more complicated.
Tony, I’m shocked (and slightly embarrassed) to hear this! I thought Merlin just had good “ears”! I think I’ll sue Cornell University for false advertising: The app typically says, “Hearing a bird…” As a Brad Paisley song says, maybe they’ll just “pay me to go away”! ;-)
p.s. Just in case anyone is worried about my intentions, I am, just as for iNaturalist, a proud monthly supporter of the Cornell Laboratory of Ornithology.
Here’s a deeper dive of which I only understand a small fraction, and a talk by Grant Van Horn. Grant also helped us develop our computer vision model.
The situation with spectrograms is one of the use cases that’s been brought up before for allowing images to be “tagged” with information used by the CV (so they can be excluded or used in a parallel training set). They seem to be a very valuable form of information for identifiers; perhaps one day we’ll have a feature to reconcile the CV concerns with the value added for identifiers.
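To make the tagging idea a bit more concrete, here is a rough sketch in Python of how tagged images might be routed. This is purely illustrative and is not how iNat stores or trains on images: the `Image` class, the `kind` tag, and its values ("organism", "spectrogram", "habitat") are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    observation_id: int
    filename: str
    kind: str = "organism"  # hypothetical tag; default is a photo of the organism


def split_for_training(images):
    """Route images by tag: main CV set, parallel spectrogram set, or excluded."""
    photo_set = [img for img in images if img.kind == "organism"]
    spectrogram_set = [img for img in images if img.kind == "spectrogram"]
    excluded = [img for img in images
                if img.kind not in ("organism", "spectrogram")]
    return photo_set, spectrogram_set, excluded


# Made-up example uploads for a single observation.
uploads = [
    Image(1, "bunting.jpg"),
    Image(1, "bunting_song.png", kind="spectrogram"),
    Image(1, "meadow.jpg", kind="habitat"),
]

photos, spectrograms, excluded = split_for_training(uploads)
print([img.filename for img in photos])
print([img.filename for img in spectrograms])
print([img.filename for img in excluded])
```

With something like this, identifiers would still see every image on the observation, while only the untagged organism photos would feed the main training set.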