Tagging Individual Photos to Exclude from CV Training Dataset

Platform(s): Web, mobile apps

Description of need:
It’s been brought up in various parts of the iNaturalist community that images which do not specifically or primarily feature the actual organism being observed may be having a negative impact on CV training. This is a particularly acute concern for taxa with fewer observations or with extremely similar lookalikes, where the margin for error or mis-training is narrower.

There’s contention in the community about whether these types of media should be uploaded at all, even if they can help in the identification process, because of the damage they may do to the CV; and likewise, there is contention about whether identifiers should skip over observations that contain these sorts of images, because identifying those observations could end up salting the CV training set with their images.

This concern encompasses images such as, but not limited to:

  • Spectrograms of recorded audio
  • Habitat shots
  • DNA test results
  • Chemical test results
  • Drawings & hand illustrations
  • AI art

Feature request details:
It would be very beneficial if iNaturalist users (both the uploader and identifying members of the community) could flag specific photos in an observation for exclusion from the CV training set. This could function like the DQA, through a voting system.

This would have to be on a per-photo basis, as many observations contain both clear photos of the organism and non-organismal photos such as spectrograms.

I’m no UI designer, but my first thought would be to add an option to the existing photo-specific UI to toggle your own vote on the flag of an image. The null state, prior to having voted, would visually match the “Yes, use for CV” option (here, I use an open eye).

Clicking it would first set your vote to “No, don’t use for CV” (for which I’m using a closed eye); subsequent clicks would toggle it back and forth.

For details on the current status of the image’s overall tally, and to remove your vote entirely, you could click on the photo info option that already exists and see an additional option on the photo info page that functions the way DQA votes do. Here I’ve used the label “Is Photo of Organism?” to try to get most directly at the intended use of the flag, though a tooltip could expand on it.
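
To make the mechanics concrete, here is a minimal sketch of how a per-photo, DQA-style tally might be modeled. All names and the simple-majority rule are hypothetical assumptions for illustration, not iNat’s actual data model:

```python
# Hypothetical sketch of a per-photo, DQA-style vote tally. All names and
# the simple-majority rule are illustrative assumptions, not iNat's schema.
from dataclasses import dataclass, field

@dataclass
class PhotoCvVote:
    user_id: int
    is_photo_of_organism: bool   # True = "yes, use for CV"; False = "no, don't"

@dataclass
class PhotoCvFlag:
    photo_id: int
    votes: list[PhotoCvVote] = field(default_factory=list)

    def set_vote(self, user_id: int, value: bool) -> None:
        """Set or replace a user's vote (the eye-icon toggle)."""
        self.votes = [v for v in self.votes if v.user_id != user_id]
        self.votes.append(PhotoCvVote(user_id, value))

    def remove_vote(self, user_id: int) -> None:
        """Withdraw a vote entirely, returning the user to the null state."""
        self.votes = [v for v in self.votes if v.user_id != user_id]

    def eligible_for_training(self) -> bool:
        """Tally like the DQA: ties and unvoted photos stay eligible."""
        yes = sum(1 for v in self.votes if v.is_photo_of_organism)
        return yes >= len(self.votes) - yes
```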

I thought that something like “Use for AI training” could be misconstrued as a rights statement, as an option that relates to AI datasets scraped by third parties, or as touching other AI questions not related to the purpose of training iNat’s own CV on photos of actual organisms.

One thing I haven’t addressed here is scat, tracks, constructions, etc. These are photographic evidence clearly related to the organism. Because we already tag these in annotations, I thought it might be possible to incorporate observation annotations into CV training and take them into account in ways that can’t really be done with non-photographic evidence; on the other hand, we can’t tag specific photos as representing scat/tracks/etc., only the whole observation. In an ideal world, I think we should be able to apply those annotations to the photos themselves rather than (or in addition to?) the whole observation.

I approved this, but it seems like the assumed goal does not align with iNat’s goal for its CV model. The goal of iNat’s CV model is to recognize iNaturalist images of taxa, i.e. the evidence that people submit to iNaturalist.

The evidence people submit is often not perfect and isn’t always of the actual organism, so the model has to take this into account. If the model were only trained on close-up images of organisms, or of pinned moths or something, it probably wouldn’t be as effective for the often faraway or blurry shots that real people take and submit, or for nests, webs, tracks, etc. Also, I’m not a spectrogram expert, but I do know that Merlin does not recognize raw audio; it’s an image-based model that’s been trained on spectrograms, not pure audio. If people are submitting spectrograms to iNat, it’s possible that those images will help iNat recognize similar spectrograms as being of the same species.

IMO, rather than spend development and design time on something this specific, it would be better to work on functionality that allows people to annotate/categorize images, as well as draw borders around subjects or key markings, which would a) be helpful for general usage and curiosity and b) might be used for more targeted training.

Is there evidence of this problem, and is it widespread? The model is certainly not perfect, but I suspect that has less to do with certain types of evidence being included and more to do with the fact that differentiating organisms is hard, and many of the photos people take don’t include enough diagnostic features for the model to identify organisms correctly. For example, I identify a lot of things stuck at Arachnida. The model often gets things wrong, but I doubt it’s because of faulty training data. It’s more that lots of arachnids look similar, that things like hemipteran molts can look a lot like arachnids, and that tent caterpillar nests look like spider webs.

7 Likes

I’m inclined to agree with such a feature request for several of the categories that @guerrichache enumerates, and others. Time after time, I encounter small blurry moth images that iNat’s CV identifies to the wrong genus or family, when a nicely cropped version of the same exact image readily points the CV in the right direction. So in this sense, the CV may be learning mistakes from its training sets through the inclusion of such misleading photos when they escape notice and inadvertently end up Research Grade. (I have confidence that this isn’t happening with the small number of genera I monitor closely.) I would hope that the CV is built to minimize false-positive identifications (putting an erroneous ID on an image) rather than false negatives (failure to recognize a given taxon).

A related question: I’ve frequently wondered whether this metric has been examined: with each subsequent training session and photo set, is the CV getting smarter and more discriminating? Or, as training sets grow larger, are CV outputs just getting broader through the inclusion of a larger spectrum of good and bad images? With a larger image set including more and more lousy images, I imagine the CV would get more generous or inclusive about suggesting a given ID rather than more discriminating. I would hope that becoming more discriminating is the more important goal of each subsequent training session.

7 Likes

I think the goal of improving CV identifications is a good one, and maybe this type of functionality has a place in it. But I do think an implementation like this would be challenging and open to problems.

One broad problem is that this would introduce an inherently unquantifiable bias into the training set. As noted above, currently we can at least say that the model is trained to ID pictures taken by iNat users. With this option, however, the dataset would be censored by a non-rigorous, inconsistent process, which could lead to non-optimal outcomes. It could also be abused (people removing other people’s photos from CV usage without generating notifications) or simply misused (people removing pictures for a variety of reasons that don’t correspond to the instructions). For instance, someone might remove a photo from training just because it shows a larva or another life stage, which would be undesirable.

I think a better solution would be to wait for the chance to implement picture-level annotations (targets, life stages, etc.), which would solve many of the issues with photos and training and also add more value. If, after creating those functions, there are still problems that this type of functionality could help with, it could be implemented then.

On a side note, an eye icon is already used for hiding, so that is out.

7 Likes

Side note: it’s unfortunate that photos of any kind show up first for posts that include audio. It makes it a pain to post an audio clip of a bird along with a shot of where it’s perched, as identifiers will first see the photo (and if they’re not paying close attention, will misidentify or not identify from the less apparent audio). It’d be nice if audio could be chosen as the first piece of media, just as we can choose the first photo of a post.

3 Likes

Please submit a separate Feature Request for this using the template provided - thanks!

2 Likes

Spectrograms can be made to a very wide set of specifications. A 10-second logarithmic spectrogram will look very different from a 2-second linear one. Upper and lower frequency limits can vary, there are different ways of rendering, different color schemes, different apps will frame them differently, and so on. It’s not as simple as brightness, focus, and distance. If the CV is going to be trained on spectrograms, it would be far better for it to do as Merlin does: use its own, generated from the audio file to one standard.
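
For illustration, here is a minimal sketch of what “one standard” could mean in practice, assuming SciPy and Matplotlib. Every parameter here (window size, dB range, frequency range, image size, colormap) is an arbitrary placeholder, not Merlin’s or iNat’s actual settings:

```python
# Sketch: render every audio file to a spectrogram with one fixed standard,
# so the resulting images are directly comparable for an image-based model.
# All parameter choices below are illustrative placeholders.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

def standardized_spectrogram(wav_path: str, out_path: str) -> None:
    rate, audio = wavfile.read(wav_path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)                   # mix stereo down to mono
    audio = audio / (np.max(np.abs(audio)) or 1)     # normalize amplitude
    # Fixed analysis settings so every image uses the same time/frequency grid:
    freqs, times, sxx = spectrogram(audio, fs=rate, nperseg=1024, noverlap=768)
    db = 10 * np.log10(sxx + 1e-10)                  # log power, avoiding log(0)
    fig, ax = plt.subplots(figsize=(4, 4), dpi=64)   # fixed image size
    ax.pcolormesh(times, freqs, db, cmap="gray", vmin=-100, vmax=0)  # fixed dB range
    ax.set_ylim(0, 12000)                            # fixed frequency range (Hz)
    ax.axis("off")                                   # the model sees pixels, not axes
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
```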

2 Likes

I like guerrichache’s suggestion, and I appreciate the thorough consideration they have given the idea in coming up with a proposed solution. I would indeed love to be able to exclude media (say, a spectrogram, or a more general shot of the environment) from being scanned by the CV, so that the observation could still provide a more comprehensive idea of the organism without at the same time jeopardizing the CV’s task.

Still, even while reading the post the first time, I saw potential pitfalls, including the issues pointed out by tiwane and cthawley. Nothing is ever as straightforward as it seems at first, is it? :slightly_frowning_face:

cthawley suggested waiting “for the chance to implement picture-level annotations (targets, life stages, etc.) which would solve many of the issues with photos and training and also add more value.” That sounds sensible.

For the interim period until that moment arrives, a workaround could perhaps be to make it easier for users to place media that is NOT supposed to be fed into the CV into the description area. It is already possible to link to external sites there, including images (such as spectrograms or animated GIFs) or video clips hosted elsewhere.

The thing is, most users don’t realize that option exists because they don’t see buttons in a toolbar when editing the description field. Perhaps one could provide a default text that reminds them of the how-to: `<a href="[url]">xxxx</a>` and `<img src="[url]">`.

I realize that will not stop everybody from just uploading any media into the photo section. I just assume that knowing there IS an alternative would help reduce the amount of non-CV-able media in the photo area.

1 Like

I like both of these suggestions, and I agree that this would more comprehensively solve the issue, in addition to adding more value in other areas as well. Would it be worthwhile/appropriate for me to rewrite the request to focus on photo annotation more broadly?

> Is there evidence of this problem, and is it widespread?

I’m mostly posting this as a follow-up to conversations on the Discord and in the forums where I’ve seen this concern expressed, so I’m not personally familiar with the evidence for some of the taxa of concern. My understanding is that there is little evidence either way (i.e., little evidence that it isn’t a problem, and little evidence that it is), so people are making their own judgment calls. I wonder if some of the more commonly seen cases of CV confounding (like plants and their parasitic leafminers/galls) may be leading people to conclude that the CV is relatively easily misled.

Perhaps there are experiments or datasets that could be shared with the community to promote a better understanding of the actual situation regarding CV training?

5 Likes

Not that these aren’t valid things to bring up. But the CV already has a bias from identifiers and from what people photograph. Identifiers choose to ID only certain things, normally what is IDable. Also, every action users can take on the site can be misused or abused: comments, flags, DQAs, life stage annotations, lists, journal posts, etc. I don’t see how you could add any function to the site that couldn’t be misused or abused. Even curator actions are not safe from misuse or abuse.

I think the ‘gold standard’ here would be the ability to tag where the organism is, perhaps with a frame similar to the ones that let you tag a friend’s face on Facebook, and perhaps with the ability to mark diagnostic or otherwise important features such as the presence of flowers, seeds, or diagnostic markings on a bird wing. This could open up a wide range of other features. For instance, I sometimes take photos of a tree from far away for various reasons, but the frame may contain several species. Instead of copying the photo multiple times, imagine tagging each tree species in the photo, allowing it to be split into different observations that way. You could also allow tagging of other people’s photos (if they consent) to gather more data for observations. Then the algorithm could know what the focus is and train on a more zoomed-in, cropped image while leaving the broader image for human identifiers. Within that, it might also be a good idea to be able to designate certain photos as not useful for the AI, but either way, allowing you to tag where the organism is would help the AI zero in.
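
As a rough illustration of what such a tag could look like as data, and how training might crop to it, here is a sketch assuming Pillow; the field names and padding value are made up, not an actual iNat design:

```python
# Hypothetical sketch of a per-photo subject frame and a training-time crop.
# Field names and the padding value are invented for illustration.
from dataclasses import dataclass
from PIL import Image

@dataclass
class SubjectBox:
    photo_id: int
    taxon_id: int        # which organism this frame outlines
    x: int               # pixel coordinates of the frame's top-left corner
    y: int
    w: int               # frame width in pixels
    h: int               # frame height in pixels
    note: str = ""       # e.g. "flowers", "diagnostic wing marking"

def crop_for_training(path: str, box: SubjectBox, pad: float = 0.1) -> Image.Image:
    """Crop to the tagged subject (plus a little context) for CV training,
    leaving the full frame intact for human identifiers."""
    img = Image.open(path)
    px, py = int(box.w * pad), int(box.h * pad)
    left = max(box.x - px, 0)
    top = max(box.y - py, 0)
    right = min(box.x + box.w + px, img.width)
    bottom = min(box.y + box.h + py, img.height)
    return img.crop((left, top, right, bottom))
```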

Might be too difficult to program, but it’s something I’ve wanted on iNat for years, and I think it would be really neat.

1 Like

I think this could be solved, or at least improved, by making coarser suggestions, which is something I’ve wanted for a long time.

5 Likes

I would really like to see some way to tag habitat shots. I generally don’t upload habitat shots since I’m worried they would mislead the CV (at least for species without a large number of photographs).

3 Likes

If you store your habitat shots somewhere in the cloud, you could link to them within the observation’s description.