No, it does not.
Several things have to happen before this is possible. Audio recognition often uses models similar to those used for image recognition: the computer is trained on spectrogram images of the audio rather than “hearing” the audio itself. So the first step would be for iNat to generate a standardized spectrogram (proposed here; the logistical hurdles are discussed in that thread). Then some form of moving-window recognizer has to be built, because it doesn’t really work to have the computer look at the entire audio clip at once (it needs to process the clip one overlapping segment at a time).
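To make the two steps concrete, here is a rough sketch of what "spectrogram, then overlapping windows" could look like. This is only an illustration and not anything iNat has built: the function names `spectrogram` and `sliding_segments`, the frame and segment sizes, and the synthetic test tone are all assumptions chosen for the example, and the trained model that would score each segment is left out.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Log-magnitude spectrogram from a short-time Fourier transform.

    frame_len and hop are illustrative values, not a proposed standard.
    Returns an array of shape (freq_bins, time_frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T

def sliding_segments(spec, seg_frames=64, step=32):
    """Yield (start_frame, segment) pairs of overlapping spectrogram slices.

    With step = seg_frames // 2, neighbouring segments overlap by half.
    """
    n_frames = spec.shape[1]
    for start in range(0, n_frames - seg_frames + 1, step):
        yield start, spec[:, start : start + seg_frames]

# Synthetic stand-in for a recording: 3 seconds of a 4 kHz tone.
sample_rate = 22050
t = np.arange(sample_rate * 3) / sample_rate
clip = np.sin(2 * np.pi * 4000 * t)

spec = spectrogram(clip)
segments = list(sliding_segments(spec))

# Each segment is what a (hypothetical) trained image model would score.
for start, segment in segments:
    print(start, segment.shape)
```

In a real recognizer, the per-segment scores would then have to be combined into a result for the whole recording, which is a separate design question.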