Screenshot from a bird voice identification app as a photo

I just found two fresh observations of birds made by an experienced user (10000 obs.) which consisted solely of a screenshot showing a picture of a spectrogram of a bird song recorded by an app, together with an AI identification proposed by that app. I asked the observer to upload the sound itself. Are such observations acceptable? What is the right course of action?


This thread has relevant info: Spectrograph allowed?

Given the following statement from iNat staff, I think it would be ok to mark no for “evidence of organism” in the DQA. Though, politely requesting that the audio itself be added as you have done is an important first step. It’s possible they are using the spectrogram screenshot from an app as a way of marking the location in the iNat app and they plan on uploading the actual audio when they have access to a computer.


My sense is that spectrograms can be very helpful in bird identification. So often I look at the recording bar and wish I could just change the display to a spectrogram. I think they are the same quality of information as the recording, or even better for those who are able to identify birds by them (I am not that good; for me it is ancillary, and helps me compare it to other recordings). So in terms of evidence they should certainly be allowed. However, in terms of practicality they show up as photos, and many people who identify photos do not want to identify sound evidence, and my guess is that even among those who are into bird song ID only a few can read spectrograms without any context. So it makes sense to ask for the audio recording, and to encourage the upload of spectrograms somewhere else, if that is possible (notes). Maybe in the future iNat can implement a display of audio recordings as spectrograms :-)


This has already been discussed at length in other topics (some relevant discussion in Audio recordings of bird song/calls). To sum up one of the major points: spectrograms are only a useable quality of information if a standardized scale is used. People may become accustomed to identifying spectrograms at one scale, but if someone posts a spectrogram using a scale other than the one they are used to, it becomes much harder.

There is an existing feature request (Automatically add a spectrogram view to observations with sounds) that, if implemented, would produce standardized spectrograms for audio when uploaded.


Was the evidence of presence: organism annotation available when that was posted? I think it mostly handles the relevant concern.

Anyway, I don’t think anything specifically needs to be done about the observation; a spectrogram is evidence.

It was not at the time of that specific post, but it has been for several similar, more recent discussions and the stance has not changed. I’m not sure I understand exactly how it’s relevant. There is no “evidence of presence: spectrogram” annotation, and an “evidence of presence: organism” annotation ties directly into the concerns of the original post. Additionally, two other concerns mentioned in the post are not addressed by “evidence of presence” in any way:

In regards to these two, refer to my previous comment about a lack of standardization in posted spectrograms. Also, a relevant comment from that linked thread:

Yes, but even there, further down, it was clarified that it was explicitly meant as a personal preference and not a staff-endorsed rule. For a bat, a human-hearing-range audio recording would be unusable as evidence, but a spectrogram would not be. Standardization is a problem in terms of quality, of course, but there’s no standardization of camera settings or anything either.

As for the computer vision… I guess once enough spectrograms are up the computer vision will learn to read them if the species can be ID’d from them?


Yes, I don’t disagree with any of this. I do hope eventually iNat will implement the automatic spectrogram (I think they haven’t yet due to logistical hurdles, not because they don’t want to).

To an extent this is true too, but not exactly. In an image, the proportions of the object to ID stay the same no matter how you photograph it. This is not true of changing a spectrogram scale, because the x-axis need not change together with the y-axis. I talked about that in more depth in the linked thread. However, you are correct, many other aspects of photos are not standardized.
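
To make the axis point concrete, here is a small sketch of how the same note can look quite different on screen. The note (a straight 2 kHz rise over 0.5 s) and the two app layouts are hypothetical numbers chosen for illustration, not taken from any real app.

```python
import math

def apparent_angle_deg(duration_s, freq_span_khz, px_per_s, px_per_khz):
    """Angle above horizontal of a straight frequency sweep as drawn on screen."""
    width_px = duration_s * px_per_s        # horizontal extent depends only on the time scale
    height_px = freq_span_khz * px_per_khz  # vertical extent depends only on the frequency scale
    return math.degrees(math.atan2(height_px, width_px))

# One hypothetical note: rises 2 kHz over half a second.
note = dict(duration_s=0.5, freq_span_khz=2.0)

# Two hypothetical layouts whose axes are scaled independently.
app_a = apparent_angle_deg(**note, px_per_s=200, px_per_khz=50)  # 100 px wide, 100 px tall
app_b = apparent_angle_deg(**note, px_per_s=400, px_per_khz=25)  # 200 px wide, 50 px tall

print(round(app_a, 1), round(app_b, 1))  # 45.0 14.0
```

The same sound is drawn as a steep 45 degree slur in one layout and as a shallow 14 degree one in the other, which is the kind of distortion a photo of an object does not suffer from.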

This is unlikely without standardized spectrogram scales (not to mention many other display features such as color scale, etc.). I build automated audio recognizers for my work, and the first step is always to standardize the spectrogram (the same is done for other existing bird song recognizers; e.g., Merlin).
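
As a rough illustration of what "standardizing the spectrogram" can mean, here is a minimal pure-Python sketch. It is not the pipeline used in my work or in Merlin; the constants `WINDOW`, `HOP`, `DYNAMIC_RANGE_DB` and the 8000 Hz sample rate are assumptions picked only for the example. The idea is that every recording goes through the same window, hop and decibel range, so the resulting image no longer depends on recording volume or on an app's display settings.

```python
import cmath
import math

# Hypothetical fixed settings; a real recognizer would choose its own values.
WINDOW = 64              # samples per analysis frame
HOP = 32                 # samples between frame starts
DYNAMIC_RANGE_DB = 60.0  # cells more than 60 dB below the peak map to 0

def standardized_spectrogram(samples):
    """Return frames x bins of values in [0, 1] using fixed window, hop and dB range."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (WINDOW - 1)) for n in range(WINDOW)]
    bins = WINDOW // 2 + 1
    power_db = []
    for start in range(0, len(samples) - WINDOW + 1, HOP):
        frame = [samples[start + n] * hann[n] for n in range(WINDOW)]
        row = []
        for k in range(bins):
            # Naive DFT for clarity; a real pipeline would use an FFT library.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / WINDOW)
                        for n in range(WINDOW))
            row.append(10 * math.log10(abs(coeff) ** 2 + 1e-12))
        power_db.append(row)
    peak = max(max(row) for row in power_db)
    # Express each cell relative to the loudest cell, then clip to the fixed range.
    return [[max(0.0, 1.0 + (v - peak) / DYNAMIC_RANGE_DB) for v in row]
            for row in power_db]

sample_rate = 8000  # Hz, assumed; recordings would be resampled to this first
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
loud = standardized_spectrogram(tone)
quiet = standardized_spectrogram([0.1 * s for s in tone])  # same tone, 20 dB quieter
```

With these settings each frequency bin is 125 Hz wide, so the 1000 Hz test tone lands in bin 8, and the loud and quiet versions produce essentially the same output. A screenshot from an arbitrary app gives no such guarantee.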


Totally agree, even spectrogram-specific CV models will not work well (or at all) without standardization. A CV or any other “AI” model is not a magic box that will automatically arrive at the correct answer. Throwing a fundamentally different kind of image (a spectrogram, which depicts sound as a graph) into a model that is largely trained on, and structured for, ordinary visual data is likely to lead to poor predictions. Garbage in - garbage out.

One would need to create a separate model specifically for standardized spectrogram data for quality predictions.

