Create computer vision model saliency maps

Platform(s), such as mobile, website, API, other: N/A

URLs (aka web addresses) of any pages, if relevant: N/A

Description of need:
Allow differentiating the CV prediction model to create a saliency map of the input image.

Feature request details:
A saliency map is a technique to ask a CV model to tell you which pixels it is “looking at” to make its prediction. You can produce one by differentiating the output prediction with respect to the input pixels. In iNaturalist’s case, I think this could be used for some really cool applications:

  • By differentiating with respect to a given target species probability, label different species in an image.
  • Identify images containing multiple species, and recommend splitting them.
  • Automatically crop an image to the subject to reduce data size and compression.
  • Indicate what “field markers” the model is using to differentiate a given pair of species.
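
To make "differentiating the output prediction with respect to the input pixels" concrete, here is a minimal sketch in plain Python. It is not iNaturalist's model: the "model" is a hypothetical single linear layer followed by a softmax over three made-up species, and `WEIGHTS`, `PIXELS`, `predict`, `pixel_gradient` and `saliency_map` are all names invented for illustration. With a real network you would let an autodiff framework compute the same gradient instead of writing it out by hand.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, pixels):
    """Toy stand-in for a CV model: one linear layer, then softmax.

    weights[k][j] is the (hypothetical) weight of pixel j for species k.
    """
    logits = [sum(w * x for w, x in zip(row, pixels)) for row in weights]
    return softmax(logits)

def pixel_gradient(weights, pixels, target):
    """Gradient of the target species probability w.r.t. each pixel.

    For p = softmax(W x):
        d p_t / d x_j = p_t * (W[t][j] - sum_k p_k * W[k][j])
    """
    probs = predict(weights, pixels)
    p_t = probs[target]
    grads = []
    for j in range(len(pixels)):
        expected_w = sum(p * row[j] for p, row in zip(probs, weights))
        grads.append(p_t * (weights[target][j] - expected_w))
    return grads

def saliency_map(weights, pixels, target):
    """Saliency = magnitude of the gradient, one value per pixel."""
    return [abs(g) for g in pixel_gradient(weights, pixels, target)]

# Hypothetical 2x2 grayscale image, flattened to 4 pixels, and 3 species.
WEIGHTS = [
    [2.0, 0.0, 0.0, 0.0],  # species 0 keys on the top-left pixel
    [0.0, 0.0, 0.0, 2.0],  # species 1 keys on the bottom-right pixel
    [0.1, 0.1, 0.1, 0.1],  # species 2 has no strong preference
]
PIXELS = [0.9, 0.2, 0.1, 0.8]

for species in range(len(WEIGHTS)):
    print(species, saliency_map(WEIGHTS, PIXELS, species))
```

Changing `target` gives a different map for each species from the same image, which is the idea behind the first bullet above. Note that pixels which count against the target species also show up as salient, because only the magnitude of the gradient is kept.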

I would like to build this myself, but unfortunately the model that iNaturalist publishes is severely restricted, and includes only a .tf file designed for forward-pass inference, not for constructing a saliency map. I would like to ask that either:

  • iNaturalist build features in this space
  • iNaturalist publish more of its models to open source to allow others to do so

Ideally both!

Here is an example of a saliency map, from the above link:

Correct me if I’m wrong, but I believe the current CV model looks at the entire image, not specific portions, so it can’t recognise individual organisms within a picture. So unfortunately this would not be possible without redesigning the CV model itself. Nonetheless, this is an interesting idea.

It does look at the entire image, but different pixels have varying importance in the final prediction. The importance of different pixels depends on the content.

What I mean is the habitat and background are often salient as well as the organism itself, so I’m not sure if this would be reliable for only indicating where the organism is. Would need testing I suppose.

I don’t think the request is to “indicate where the organism is” but to show which parts of the image are being prioritized in the analysis. I’ve definitely seen flowers that are commonly photographed with a distinctive butterfly on them have that butterfly species suggested even for flower photos that have no butterfly in them!

i think object detection is different from what you’re describing, and your saliency map doesn’t necessarily lead in the same direction as object detection.

if you want to identify “field markers” for a given taxon, i think something like an Activation Atlas would be more enlightening, since you could present something like that for the taxon rather than simply highlighting salient features per photo.

We briefly explored something like this years ago, and it looked pretty cool. For example, on a photo of Vanessa cardui it showed you the exact part of the wing that it was cluing in on.

I think it was considered a heavy lift, maybe infrastructurally, at the time. Also there would need to be a lot of thought in design and messaging, to let people know that just because the model is looking at something specific, it may not necessarily be correct. But I do like the idea of making the model a bit less of a black box.

Maybe the technology for doing this is a lot better now; things are changing so quickly.

Thanks everyone for chiming in! You are totally right that it may activate on things other than the subject, so it’s not really subject ID. But even seeing that happening could be really interesting and informative!

Thanks for the background and all of your hard work, @tiwane. I know you have your hands full. Do you think it would be possible to make a bit more of the model open source so other folks could take a shot at this?

I think that it is a pretty good idea. In my old birding books there were lines pointing to the discriminant anatomical features. Such saliency maps would be something similar, but automated.

That’s a bit beyond my pay grade, I’m sorry.

I got something extremely primitive working using the small 500-species model which is published and it can do at least basic subject detection.
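
For anyone curious how a saliency map can turn into a rough subject box, here is one simple approach, sketched under assumptions and not necessarily what was done for the result described above: keep every cell whose saliency is at least some fraction of the peak value and take the smallest rectangle that covers them all. The names `subject_box`, `crop`, `GRID` and the `fraction` parameter are made up for this illustration.

```python
def subject_box(saliency, fraction=0.5):
    """Smallest box covering all cells with saliency >= fraction * peak.

    `saliency` is a 2D list of non-negative numbers (rows of columns).
    Returns (top, left, bottom, right) with inclusive indices, or None
    if the map has no positive values.
    """
    peak = max(max(row) for row in saliency)
    if peak <= 0:
        return None
    cutoff = fraction * peak
    hot = [
        (r, c)
        for r, row in enumerate(saliency)
        for c, value in enumerate(row)
        if value >= cutoff
    ]
    rows = [r for r, _ in hot]
    cols = [c for _, c in hot]
    return min(rows), min(cols), max(rows), max(cols)

def crop(image, box):
    """Cut the boxed region out of a 2D list using inclusive indices."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]

# Hypothetical 4x5 saliency map with one bright region near the middle.
GRID = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 1.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

box = subject_box(GRID)
print(box)
print(crop(GRID, box))
```

Because this draws one rectangle around every hot cell, two separate subjects in the same photo end up inside a single box. Lowering `fraction` makes the box grow to include fainter cells.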

the object detection isn’t great. it’s showing a single square around both bears rather than individual rectangles around each bear. maybe this might be interesting for a computer vision researcher or developer, but i’m not sure how the saliency highlights really help a regular iNaturalist user. i see highlights mostly around the face on the lower bear and also along the back and thighs for the upper bear. is that enough to tell you what to look for when trying to differentiate between different species of bear?