There are a number of us in Central Texas using game/trail cameras to document wildlife remotely and uploading the results to iNaturalist. I tend to crop my trail camera images to focus on the organism, but at least one project in the region is encouraging contributors to upload the entire uncropped images from the cameras to iNaturalist, presumably to preserve the date/time stamp and habitat context of the records. For small birds (warblers, thrushes, etc.) that are recorded in the latter images, it becomes a “Finding Waldo” effort when trying to review the images and confirm an ID. It got me thinking about the advantages/disadvantages of such full-image trail camera observations in CV training.
- Is there some pre-training filtering of Research Grade observations that would tend to reject large habitat images with a tiny/obscure/cryptic subject such as a bird in a trail camera picture?
- If observations like those described in the first question are included in a training set, would they tend to obfuscate how CV “learns” to ID a given taxon? For instance, if CV is trained on a set for “Bird Species A” that includes a lot of uncropped trail camera images, will CV tend to “learn” that a pile of brush or mass of foliage = Bird Species A?
The human eye can search through a complex habitat image to focus on the animal subject, but I don’t think that is how CV analyzes imagery.
As I understand it:
- There is no rejection of large habitat images.
- Yes, the CV can learn to associate habitat with a species. It already does this in many cases. I see lots of photos of vegetation IDed by CV as anoles, and they do indeed look like spots where an anole would love to hang out, but…there’s not one there.
So yes, you’re right that the CV does not search through images to “find” the organism. However, including uncropped pics isn’t against the rules, and habitat information could be useful. I don’t think keeping the date/time stamp in the photo is a particularly good reason not to crop the photo myself (it should be in the EXIF data and will be in the iNat observation). I also think cropping is generally good practice, as it definitely helps human IDers. Someone could always upload a full version of the photo and a cropped version (I don’t think this is bad practice, but if there’s guidance that suggests otherwise, I’m happy to learn).
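On the EXIF point: if the capture time lives in the file’s metadata rather than only in the burned-in stamp, a crop can keep it. A minimal sketch using the Pillow library (file names and the crop box are hypothetical, and this assumes the camera wrote EXIF at all):

```python
# Sketch: crop a trail-camera JPEG while carrying over its EXIF metadata
# (date/time, camera model, etc.) so the timestamp survives the crop.
# Paths and the crop box are made-up examples.
from PIL import Image

def crop_keep_exif(src_path, dst_path, box):
    """Save the (left, top, right, bottom) crop of src_path with its original EXIF."""
    with Image.open(src_path) as im:
        exif = im.getexif()              # includes DateTime (tag 306) if present
        im.crop(box).save(dst_path, exif=exif)
```

Since iNaturalist reads the EXIF capture time on upload, a crop saved this way should keep the observation’s date/time without needing the burned-in stamp in the pixels.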
If I interpret your question correctly, I have to wonder why one would spend any energy whatsoever thinking about “what kind of pictures would benefit the AI.” Isn’t the AI supposed to be a tool? If some images are challenging for it, then it needs to (be) improve(d). Why would any observer take pictures differently in order to make the AI’s task easier?
I’d probably post a cropped version as the primary image and the uncropped (original with date/time stamp) as second photo.
I agree that we need not tailor our photos just for the CV’s comprehension. I like to edit and crop my pics for the human user and for my own sense of what an aesthetic picture looks like. Which probably does make the CV’s job easier if the subject is unambiguous.
@schoenitz Your point is well taken. The use of AI for an endeavor like species identification on iNaturalist is a two-way street. I note that CV suggestions for identifications are used very frequently, probably by a majority of iNatters, especially the large population of newer and/or casual users. And at present, the CV’s output is frequently flawed. I don’t know enough about the AI behind iNat’s Computer Vision to speak intelligently about when/how/why those identification failures occur, but one source of limitation on its learning must undoubtedly be the nature of the input images. The iNat community of users can affect that input, both in quantity and quality.
To oversimplify, this goes back to the old computer adage, “Garbage in, garbage out.” If every image input for a given species were a perfect close-up portrait, I trust that the AI would pretty quickly be able to discriminate most species on which it trains. To the extent that the set of training images includes poorer representations of an organism, the AI will necessarily have a more difficult time learning how to recognize a given species. Limited detail from low-resolution images, or complex habitat images in which the subject animal or plant occupies a very small portion of the pixels, will be challenging for the AI learning process. Trail camera images with small subjects probably represent some of the most difficult inputs for AI learning.
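To make the “very small portion of the pixels” point concrete, here is a purely hypothetical pre-training heuristic (not anything iNat actually does, per the earlier answer that no such rejection exists) that flags images whose subject bounding box occupies only a tiny fraction of the frame; the threshold and box coordinates are illustrative assumptions:

```python
# Hypothetical filter: flag training images whose subject occupies only a
# tiny fraction of the frame (e.g. a warbler in an uncropped trail-cam shot).
# Threshold and bounding boxes are illustrative, not part of any real pipeline.

def subject_fraction(frame_w, frame_h, box):
    """Fraction of the frame covered by the subject's (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return (right - left) * (bottom - top) / (frame_w * frame_h)

def is_tiny_subject(frame_w, frame_h, box, threshold=0.01):
    """True when the subject covers less than `threshold` of the image area."""
    return subject_fraction(frame_w, frame_h, box) < threshold

# A 40x30 px warbler in a 1920x1080 trail-cam frame covers ~0.06% of the pixels.
print(is_tiny_subject(1920, 1080, (900, 500, 940, 530)))  # True
# A coyote filling a quarter of the same frame would pass.
print(is_tiny_subject(1920, 1080, (480, 270, 1440, 810)))  # False
```

Of course, this presupposes a bounding box for the subject, which is itself a detection problem; it is meant only to illustrate how extreme the pixel imbalance can be in these images.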
Now imagine an iNaturalist project which is uploading dozens or hundreds of uncropped images from the same game camera, then diligent identifiers manage to sort through that set and raise all of them to Research Grade for the array of species included. For RG images of coyotes, skunks, and armadillos, iNat’s CV will generally have no trouble training and learning from such a set. But for small birds or mammals, which are not infrequent subjects in the trail camera set, we’re asking CV to train on imagery that contains very little in the way of organism-specific detail, i.e. few to no readily visible “field marks” that we humans can discern.
So the suggestions of a CV that has been trained on a heterogeneous set ranging from perfect portraits to trail camera jumbles may be spurious. It may “learn” that when it sees a jumble of twigs and foliage in a certain pattern (with a tiny embedded organism somewhere), it must be a Song Sparrow, because that’s what iNat’s identifiers keep naming it. CV doesn’t know Song Sparrow field marks; it just knows that a bunch of jumbled habitat images were placed in a Song Sparrow bucket by human identifiers. CV probably knows how to ID a good image of a Song Sparrow, but it’s also going to output “Song Sparrow” as a suggestion for similar-looking jumbles of twigs and foliage.
All of this is a long way around to saying that the inputs to the training sets for iNaturalist’s Computer Vision are an important part of the process, and we should be mindful of that in the nature of our contributions. For trail camera images, suggestions such as those by @jnstuart seem quite warranted.
I do suggest to the folks that help me with my game camera project that if they wish to crop close to the subject, they please include the original along with it. So much information is in that full image, with its habitat context and info bar. If you are managing a project, knowing which camera took the shot, along with the date and time, is as important as the habitat context itself. I certainly do not claim to know what any researcher would want to see from an observation; I just don’t feel we should exclude information completely.
I tend to strive for closer-up game camera images, preferring quality over the more generic larger scene. The smaller animals that result are usually better identified. Animals the main camera misses are either lost or picked up by a support camera observing the wider scene, allowing one to redirect the main camera if needed.
In contrast, when I document a nice insect I most certainly crop it close and thus remove the surrounding information.
I know this topic is about the CV and smaller, off-center subjects. I believe the CV is far better than it once was in regard to those observations. When I talk to Master Naturalist training classes about iNaturalist, I use examples of Fox Squirrels from my game camera project. The CV does well with the subject centered or even off center. Add a large acorn in its mouth, off center, and it still gets the top suggestion correct, but the subsequent suggestions are way off. I like to tell them that there are limits to the CV’s ability, but that centered and focused images can really help them in their early iNat adventures.
I think when the CV is not certain the resulting “we’re not certain, but” is good advice for leaving the ID coarse.
From my perspective the CV does a very good job making suggestions across large quantities of observations, and not just mammals either. It can pick out specific animals off center; it just takes a little longer to do so.
As a mere human, I prefer that folks make the first image a crop with the critter near the center and then include whatever additional photos as extras.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.