The AI algorithm - confusing pathogen/host or subject/background

Recently, I added an observation for a ladybug, and the AI suggested Hesperomyces. The ending of the name suggested a fungus, but I decided to accept it anyway and then research the name. Searching for the name does indeed bring up many ladybug observations, both here and on the web in general. Most of the images in Hesperomyces observations are not cropped to accentuate the fungus, so it’s no surprise the AI is getting confused.

So the question is, how will the AI properly learn the distinction in a situation like this? I think proper cropping would help.

I think something similar is happening with Psychodidae. I’ve been monitoring this group for some time. It’s been stated here that the AI knows Clogmia albipunctata quite well, but I wonder whether it is instead learning the environment around the flies.

Perhaps there should be more emphasis on properly cropping photos. Or maybe a tool could be added to the website or app for users to indicate the item in the observation. I see some other sites doing this.


The AI definitely learns the background as well as the organism in some cases. I sometimes see it suggest anoles (lizards often perched on bark) when the photo shows just the side of a tree.

Of course, as a lizard researcher, I’m occasionally fooled by “bark lizards” as well, so I can’t complain too much!

I think sometimes “background” identification will be helpful to the AI and sometimes not. But regardless, strategic cropping can help both the AI and human IDers. I think a basic cropping tool for upload would be good (also for workflow reasons).


I’ve found that the vast majority of Acer glabrum observations contain evidence of Maple Erineum Mite. Consequently, most pictures of Acer glabrum could be ID’d either way, and the AI apparently learns that they are interchangeable. As a result, the AI suggests Maple Erineum Mite for all Acer glabrum observations, even if a particular one does not contain evidence of it.

Of course, in most cases a human IDer would have the same problem guessing which thing the observation was for unless the observer specified or added a coarse starting ID.


It’s reasonably common for gall observations to be created by duplicating a plant observation, so in those cases the photos end up being literally the same for both taxa. It’s not that surprising that the CV learns to assign both names to similar pics.


Computer vision is intended to work on actual photos from non-scientists.

But I often tweak the taxon photos - there I don’t want to see a nice picture of your dog, with an incidental plant off to the left somewhere. If the specialist describes field marks - I try to make sure that those are captured in the taxon photos, for next time.

CV is working with pixel colour and arrangement. It doesn’t ‘see a moth’ or see a ‘flower gone to seed’. The greige seeds on Ursinia are moth coloured. A strategic few seeds had fallen and CV and I could both ‘see’ a pair of moth wings. #ThisIsNotAMoth and needs the observer / identifier to step in.


I realize computer vision is intended to work on actual photos from non-scientists. I also think it’s reasonable to provide a cropping tool or a pointer tool for them to use to indicate their observation. I think that would greatly improve the CV results.

Here’s a simple suggestion that requires only a change in the UI: the software currently separates out a suggestion if the CV is confident of the ID. My suggestion is to leave things as is in that situation, but if nothing can be identified confidently, simply add some text suggesting that the user crop the photo so the subject mostly fills it, perhaps with a link on how to crop. For a more user-friendly approach that requires a bit more work to develop (but is still simple), add a crop tool that the user may use. The user does not have to use it. However, I’m guessing it will be used a lot because people generally want identifications.
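To make that suggestion concrete, here is a minimal sketch of the fallback logic. Everything in it is hypothetical: the `suggestion_text` function, the 0.8 threshold, the message wording, and the shape of the `scores` input are assumptions for illustration, not how the iNaturalist site or apps actually work.

```python
# Hypothetical cutoff; the thread does not say what threshold iNat or Seek uses.
CONFIDENCE_THRESHOLD = 0.8

# Hypothetical wording for the hint shown when no suggestion is confident.
CROP_HINT = "Not sure what this is. Try cropping so the subject mostly fills the photo."


def suggestion_text(scores, threshold=CONFIDENCE_THRESHOLD):
    """Return the text to show above the suggestion list.

    scores: dict mapping a taxon name to a model score in [0, 1]
            (an assumed input format).
    """
    if not scores:
        return CROP_HINT
    best_taxon, best_score = max(scores.items(), key=lambda item: item[1])
    if best_score >= threshold:
        # Confident case: behave as the UI does today.
        return f"We're pretty sure this is {best_taxon}."
    # Nothing confident: nudge the user toward cropping instead.
    return CROP_HINT


print(suggestion_text({"Clogmia albipunctata": 0.93, "Psychodidae": 0.05}))
print(suggestion_text({"Hesperomyces": 0.31, "Coccinellidae": 0.28}))
```

The confident case is left unchanged, so the only new behaviour is the hint in the low-confidence branch.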

I think, last time we asked for cropping, the iNat response was - freely available in photo software. I do crop my photos before uploading - but others have a different workflow.

It’s possible to crop in our Android app, but not in our iOS app. You can see this announcement for the future of the iNat mobile app. I can’t promise a cropping tool in it, but I’d be surprised if there wasn’t one. For photos taken within the app it especially makes a lot of sense. I think it’d be great to add a cropping tool to the web uploader, but as there are free cropping tools for desktop/laptop computers, it’s not a high priority.

Seek by iNaturalist does suggest cropping a photo if it can’t get a confident CV ID for it. I could see that eventually being added as some sort of onboarding in the future for the standard iNat app.

As to whether a cropping tool would help with either training the CV* model or getting accurate results from a submitted photo for taxa like this, I’m not sure.

*(We try to use “computer vision” rather than AI because it’s more accurate - our model isn’t really artificial intelligence.)


Out of curiosity, what size images is the CV trained/run on? Is it the full upload resolution or something less?


Although it may not be possible to crop in the iOS iNaturalist app, it is very simple to crop in Photos on iOS and then make the observation from that cropped version. That’s what I do when my observation comes from my iPhone.

I’m not the best person to ask, but I know we were using 300 pixel images for some time. @alex would be the person who knows for sure.


Images are resized & cropped to 299x299 before training.


iOS’s native photo editing program allows cropping and the addition of circles, arrows, and free-hand drawing to highlight photos of organisms, among other features. Nonetheless, since the iOS iNat app allows users to take photos (“Observe,” “Camera”), we could see benefits to basic photo editing functionality to support that Camera feature.

The real way to do this is in iNat when making the obs. Otherwise the photos have to go through an extra processing step to crop any that have insects in them, which doubles the work of uploading. In the browser, by contrast, you can crop the first photo in place after duplicating, for anything you did not mean to capture but did.

Who or what does this cropping? If it’s done by algorithm, is there some check to ensure the cropping is useful?

Our vision system trains and classifies on 299x299 pixel images, so all photos must be in that format before the vision system sees them. So we either have to crop, pad, or distort the photos when resizing before training/suggestion.
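As a rough illustration of the three options, the sketch below works out the geometry each one implies for a 299x299 target. The function names and the rounding are assumptions made for this example; this is not the actual iNaturalist preprocessing code.

```python
TARGET = 299  # side length quoted in the thread


def distort_scale(width, height, target=TARGET):
    """Distort: stretch each axis independently to fit the target square.

    Returns (scale_x, scale_y). Unequal values mean the aspect ratio changes.
    """
    return target / width, target / height


def pad_layout(width, height, target=TARGET):
    """Pad: scale the long side down to the target, then fill the rest.

    Returns (scaled_w, scaled_h, pad_x, pad_y), where pad_* is the total
    number of empty pixels added along that axis.
    """
    longest = max(width, height)
    scaled_w = round(width * target / longest)
    scaled_h = round(height * target / longest)
    return scaled_w, scaled_h, target - scaled_w, target - scaled_h


def crop_layout(width, height, target=TARGET):
    """Crop: scale the short side down to the target, then trim the overflow.

    Returns (scaled_w, scaled_h, cut_x, cut_y), where cut_* is the total
    number of pixels discarded along that axis.
    """
    shortest = min(width, height)
    scaled_w = round(width * target / shortest)
    scaled_h = round(height * target / shortest)
    return scaled_w, scaled_h, scaled_w - target, scaled_h - target


# A typical 4:3 landscape photo.
print(distort_scale(4000, 3000))
print(pad_layout(4000, 3000))
print(crop_layout(4000, 3000))
```

For a 4000x3000 photo, padding leaves about a quarter of the square empty, cropping discards about a quarter of the scaled width, and distorting squeezes the horizontal axis more than the vertical one.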

At training time, since the model will see the same images many times, once for each “epoch,” we perform a different random crop into the photo for each epoch. The intuition here is that it allows us to use the same photos many times, increasing model accuracy, while discouraging the model from “overfitting” to or “memorizing” the training photo pixels. This step, along with random color shifting, results in a few percent improvement in overall accuracy.
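A minimal sketch of the per-epoch random crop idea follows. The square crop shape, the `min_fraction` lower bound on crop size, and the function names are assumptions for illustration; the post does not describe how the real pipeline samples its crops.

```python
import random


def random_crop_box(width, height, rng, min_fraction=0.5):
    """Pick a random square crop box (left, top, right, bottom) inside the image.

    min_fraction is a hypothetical lower bound on the crop side, expressed
    as a fraction of the image's shorter side.
    """
    shortest = min(width, height)
    smallest_side = max(1, int(shortest * min_fraction))
    side = rng.randint(smallest_side, shortest)
    left = rng.randint(0, width - side)
    top = rng.randint(0, height - side)
    return left, top, left + side, top + side


def crops_for_epochs(width, height, epochs, seed=0):
    """Return one crop box per epoch, so each epoch sees a different view."""
    rng = random.Random(seed)
    return [random_crop_box(width, height, rng) for _ in range(epochs)]


# Each box would then be resized to 299x299 before the model sees it.
for epoch, box in enumerate(crops_for_epochs(4000, 3000, epochs=3)):
    print(epoch, box)
```

Because the box changes every epoch, the model never sees exactly the same pixels twice for a given photo, which is the intuition described above.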

At prediction/suggestion time, we do a center crop. It doesn’t help in all cases but it gives us another small overall bump in accuracy over alternatives like padding or distorting. I suppose the intuition here is that when humans take photos of really anything, they almost always centrally locate the subject. So a central crop very rarely loses important information, doesn’t distort the photographer’s perspective, and doesn’t predict on empty/padded pixels.
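The center crop can be sketched as below, assuming it takes the largest centered square (the post only says “center crop,” so the exact box is an assumption). The `subject_survives` helper and the example subject boxes are hypothetical; they only illustrate why an off-center subject can be lost.

```python
def center_crop_box(width, height):
    """Largest centered square as (left, top, right, bottom)."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def subject_survives(crop_box, subject_box):
    """True if the subject's bounding box lies fully inside the crop box."""
    c_left, c_top, c_right, c_bottom = crop_box
    s_left, s_top, s_right, s_bottom = subject_box
    return (c_left <= s_left and c_top <= s_top
            and s_right <= c_right and s_bottom <= c_bottom)


crop = center_crop_box(4000, 3000)
print(crop)

# A subject near the middle of the frame is kept in full.
print(subject_survives(crop, (1800, 1300, 2200, 1700)))

# A subject far off to the left edge is cut off.
print(subject_survives(crop, (100, 1000, 400, 1300)))
```

For a 4:3 photo only the outer strips on the left and right are discarded, which fits the observation that a central crop rarely loses the subject.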


Would it be useful if the observer offered a 300 by 300 crop of what we are focused on this time? I crop for my obs anyway, but could deliberately choose that size. (I crop square because I like that format.)

I wouldn’t make a crop just for the vision system. The downscaling happens automatically and seamlessly, and if you upload a higher resolution version of the photo, the additional detail will be useful to humans. As for how to help the vision system understand what part of the photo to focus attention on, hopefully we’ll be able to address this at some point, perhaps with bounding boxes or attention-based vision models.

I wouldn’t know, because I crop my photos in my phone’s camera app before going anywhere near iNat.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.