Article on AI generated image(s) on iNat

AI-generated nature photos are becoming a real challenge for citizen science, and this article uses an iNat observation as an example, so I thought I’d share it here.

While it’s discouraging to see AI used to spoof observations, we’ve dealt with data integrity issues like false locations before. I believe the high percentage of dedicated, honest users on platforms like iNaturalist and eBird will be what saves us from being ruined by AI fabrications…

https://www.researchgate.net/publication/403024687_Synthetic_Species_The_Emerging_Threat_of_AI-Generated_Biodiversity

25 Likes

The fact that AI will likely be able to completely fabricate high quality macro images in a couple years has been really mentally damaging for me. I really don’t want to lose my motivation but it is at risk…

23 Likes

Photographs have always been a lower-grade data point compared to a herbarium specimen. And even a herbarium specimen can carry faked data fields, whether date or location.

The overall activity record of the observer, and also their activity offline, will be important to assess the level of trust that can be given to their observations and the trust that can be given to their metadata.

18 Likes

I have ZERO, but ZERO idea, but do photographs from cellphones and cameras have internal metadata that could rule out AI-fabricated images? This could be useful for iNaturalist to, at least, determine what is real and what is not.

7 Likes

This would require some sort of trust score to be assigned to users. I wholeheartedly agree that such a thing is a good idea and will become more necessary as gen AI nature images proliferate, but iNat has come out in opposition to such a thing more than once in the past.

6 Likes

Every researcher needs to decide whom they trust. No universal trust score can be universal enough.

8 Likes

Photos do contain quite a lot of metadata (EXIF data covering just about every photographic setting, plus details about the camera, among other things), but I’m not convinced that it can’t be fabricated. I’m of the opinion that gen AI images and videos should contain digital watermarks that cannot be removed, for reasons a whole lot more important than nature observations and taxonomy.

15 Likes

They have EXIF metadata, but it can be easily faked.
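To illustrate why EXIF shouldn’t be treated as proof of anything: it’s just an unauthenticated segment sitting inside the file, which any tool can add, rewrite, or strip. A minimal sketch in Python (stdlib only; the “JPEG” below is a hypothetical toy containing just the relevant markers, not a full parser):

```python
import struct

def list_segments(jpeg: bytes):
    """Walk the JPEG marker segments and return their marker codes."""
    assert jpeg[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
    markers, i = [], 2
    while i + 4 <= len(jpeg):
        marker, length = struct.unpack(">HH", jpeg[i:i + 4])
        markers.append(marker)
        if marker == 0xFFDA:  # start of scan: compressed image data follows
            break
        i += 2 + length
    return markers

def strip_exif(jpeg: bytes) -> bytes:
    """Drop APP1 (EXIF) segments; everything else is copied verbatim."""
    out, i = bytearray(jpeg[:2]), 2
    while i + 4 <= len(jpeg):
        marker, length = struct.unpack(">HH", jpeg[i:i + 4])
        if marker == 0xFFDA:
            out += jpeg[i:]  # copy the rest (scan data) unchanged
            break
        if marker != 0xFFE1:  # keep everything except APP1/EXIF
            out += jpeg[i:i + 2 + length]
        i += 2 + length
    return bytes(out)

# A minimal fake JPEG: SOI + one APP1 (EXIF) segment + SOS.
exif_payload = b"Exif\x00\x00fake-camera-model"
app1 = b"\xff\xe1" + struct.pack(">H", len(exif_payload) + 2) + exif_payload
fake_jpeg = b"\xff\xd8" + app1 + b"\xff\xda\x00\x02"

print(0xFFE1 in list_segments(fake_jpeg))              # True: EXIF present
print(0xFFE1 in list_segments(strip_exif(fake_jpeg)))  # False: EXIF gone
```

The same byte-level access that lets you read the “camera model” lets you write whatever camera model you like, which is why EXIF alone can’t authenticate a photo.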

4 Likes

Public services for image generation usually do include such marks, but anyone can train their own model. There are also other types of photo manipulation, some of them quite traditional, that can be used to fake features shown in a photograph.

3 Likes

Put in place a rule that users posting AI-generated photos will be banned.

I feel confident that as of now, they can be spotted by expert identifiers.

9 Likes

It’s already potentially grounds for suspension, just like posting a lot of stolen photos is: https://www.inaturalist.org/pages/community+guidelines

  • Add accurate content and take community feedback into account. Any account that adds content we believe decreases the accuracy of iNaturalist data may be suspended, particularly if that account behaves like a machine, e.g. adds a lot of content very quickly and does not respond to comments and messages.

And there are already ways to deal with artificially generated content.

23 Likes

For me, it’s not even AI images that are scaring me, but what if someone is using AI to run an iNat account? For example, a new power identifier joins, but it’s just an AI bot using iNat.

8 Likes

I agree that these images are concerning, but it’s also worth remembering that incidents of fraud have poisoned science for centuries. See, for example, the book A Rum Affair, about a botanist who planted rare plants on a Scottish island in the 1940s to provide support for his theories, or the Hastings Rarities, in which bird specimens were fraudulently presented as having been collected in the UK in the late 1800s and early 1900s, many of which would have been first records for the country if genuine. In both cases, the fraud took decades to prove. Generative AI is a new tool for this old tendency to fake records, but it’s also now easier to identify fakes. For any record that would provoke a management response, it’s a reminder to do some due diligence and corroborate reports before acting. Fraudsters have little to gain and their reputation to lose.

25 Likes

Usually this is very easy to spot, and they will get flagged and suspended for doing so. iNat’s CV is already about as good as it gets for identifying organisms from images, so using an outside AI that hasn’t been purpose-trained for that task will just look like a user spamming a bunch of junk IDs. I think the risk of this type of thing poisoning iNat’s data is very minimal, and it would be swiftly corrected.

9 Likes

“Usually” isn’t really good enough, now is it? Honestly, EXIF data should include edit histories for any photo manipulations. Do I think such a thing will ever happen? Not likely during my lifetime, I expect.

I would like to see iNat prohibit outside image manipulations and require all edits to be done within its own editor. Something like this could then allow iNat to display which manipulations were applied to whichever image was in question.

And also offer a tab to view EXIF data. Give iNat users as much information as possible to evaluate possible AI fakes, because those are only going to get harder to identify.
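None of this is an existing iNat feature, but here’s a quick Python sketch of how an edit history could at least be made tamper-evident: each entry’s hash covers the previous entry, so silently rewriting any step breaks every hash after it. (This proves the log’s order and integrity, not who made the edits; for that you’d need signatures, which is roughly what C2PA adds on top.)

```python
import hashlib
import json

def append_edit(history, operation):
    """Append an edit record whose hash covers the previous record."""
    prev_hash = history[-1]["hash"] if history else "genesis"
    record = {"op": operation, "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps({"op": operation, "prev": prev_hash}).encode()
    ).hexdigest()
    history.append(record)
    return history

def verify(history):
    """Recompute every hash; any in-place tampering breaks the chain."""
    prev_hash = "genesis"
    for rec in history:
        expected = hashlib.sha256(
            json.dumps({"op": rec["op"], "prev": prev_hash}).encode()
        ).hexdigest()
        if rec["hash"] != expected or rec["prev"] != prev_hash:
            return False
        prev_hash = rec["hash"]
    return True

history = []
for op in ["crop 10,10,800,600", "exposure +0.3", "sharpen 0.5"]:
    append_edit(history, op)

print(verify(history))            # True: untouched log verifies
history[1]["op"] = "clone-stamp"  # silently rewrite an edit...
print(verify(history))            # False: the chain no longer checks out
```

The operation names are made up for illustration; the point is just that an edit log only helps if it can’t be quietly rewritten after the fact.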

1 Like

This is the idea behind C2PA / Content Credentials. It’s effectively a modern form of metadata that is cryptographically signed in a way that makes it evident when it has been tampered with. Right now, iNaturalist’s upload process removes metadata, including C2PA, and it also does not record C2PA information in the photo record the way it does with EXIF. The person with the original photo will still have a way to prove its authenticity, though; that proof would just have to be presented outside of iNaturalist.

I believe the latest generation of Pixel and flagship Samsung phones capture C2PA metadata, as do programs like Photoshop. The major AI image generators also add C2PA metadata to their images indicating they were created by AI, but I don’t think the standard has made it into many physical cameras yet.

more discussion: https://forum.inaturalist.org/t/c2pa-content-credentials-and-media-assets-in-inaturalist/63341.

13 Likes

How could this possibly work? All you’d theoretically have to do to get around it is take a screenshot of an AI image and run it through another image editor to save it as a different file with no metadata. Some kind of hidden watermark, maybe? Something in the pixels that only computers can read, somehow?
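For what it’s worth, “something in the pixels” is exactly how invisible watermarking works. The simplest (and most fragile) version hides bits in the least significant bit of each pixel value. A stdlib-Python sketch using a plain list of stand-in grayscale values:

```python
def embed(pixels, tag):
    """Hide a byte string in the least significant bit of each pixel value."""
    bits = [(byte >> i) & 1 for byte in tag for i in range(8)]
    assert len(bits) <= len(pixels), "image too small for this tag"
    return [
        (p & ~1) | bits[i] if i < len(bits) else p
        for i, p in enumerate(pixels)
    ]

def extract(pixels, n_bytes):
    """Read the tag back out of the low bits."""
    bits = [p & 1 for p in pixels[: n_bytes * 8]]
    return bytes(
        sum(bits[b * 8 + i] << i for i in range(8)) for b in range(n_bytes)
    )

pixels = list(range(64, 128))  # stand-in for grayscale pixel values
marked = embed(pixels, b"AI")

print(extract(marked, 2))                               # b'AI'
print(max(abs(a - b) for a, b in zip(pixels, marked)))  # 1: imperceptible
```

Real AI watermarks are designed to survive screenshots, crops, and recompression, which this toy LSB scheme emphatically does not, so a screenshot would defeat it. But it shows the principle: marks that are machine-readable while invisible to the eye.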

5 Likes

The idea is that when an image contains the (C2PA) metadata, it can provide proof of the chain of custody and changes. As you noted, it’s possible to strip that metadata away entirely (as iNaturalist does), but images without such metadata will simply carry less proof, which is no better or worse than the current state. Eventually, the hope is that most new images will get C2PA metadata, and then it will be possible to solve many problems, including people trying to pass off AI work as non-AI work.
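Here’s a toy version of that tamper evidence, using an HMAC in place of the certificate-based signatures real C2PA uses (everything below is illustrative stdlib Python, not the actual C2PA format):

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # real C2PA uses per-signer certificates, not a shared key

def sign_manifest(manifest: dict) -> dict:
    """Attach a signature covering the manifest's canonical JSON form."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": sig}

def is_untampered(signed: dict) -> bool:
    """Recompute the signature; any edit to the manifest invalidates it."""
    payload = json.dumps(signed["manifest"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])

signed = sign_manifest({
    "capture_device": "camera",  # vs. e.g. "generated by <model>"
    "created": "2024-05-01T09:30:00Z",
    "edits": ["crop", "exposure"],
})
print(is_untampered(signed))  # True

signed["manifest"]["capture_device"] = "definitely a real camera"
print(is_untampered(signed))  # False: the claim no longer verifies
```

Stripping the whole signed blob is still possible, just as the post says; the scheme only guarantees that surviving metadata can’t be quietly edited.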

7 Likes

This does not mean they are wrong, but the article referenced here seems to be a self-published opinion piece presented as a research article. There is no study behind it; the author saw an AI-generated observation and wrote about it. The journal it claims to appear in, “Soothsayer, Journal of Mantodea Research”, doesn’t seem to exist.

I understand AI-generated images could in theory be a problem for iNaturalist, but the article provides no evidence for it. People could already upload observations with fake locations, and it’s not too common, as there is similarly little incentive to do it.

27 Likes

The use of AI to generate large volumes of non-obvious fake content is something that troubles me more broadly, but I’ve found it helpful to put specific impacts in perspective and think realistically about the most likely magnitude of the problem before I get too gloomy. I think iNaturalist observations are actually harder to fake than most other things because of how many details come together in them. It’s not enough to create a realistic-looking image of an organism; all of the details need to be correct to fool expert identifiers. The time of year needs to be right (if I see a morel observation in October and it’s not one of the two species that fruit at that time of year, I will be investigating further), and the location needs to match the organism, the habitat, and any other details that show up in the background. The pattern of observations needs to be right, too (no implausible leaps or contradictions between observations taken around the same times). If I see suspicious irregularities in an observation, I will go look at other observations from the account. It’s the exact same process used to find accounts that have widespread issues with location/date accuracy or with stolen images. It should also work for AI observations.
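That kind of seasonality check is also easy to automate as a first-pass filter. A sketch with made-up fruiting windows (the species-to-month mapping below is an assumption for illustration, not real reference data):

```python
# Hypothetical fruiting windows by month -- illustrative, not real mycology data.
FRUITING_MONTHS = {
    "Morchella americana": {4, 5, 6},
    "Morchella importuna": {3, 4, 5},
}

def flag_out_of_season(species, observed_month):
    """Return True when an observation falls outside its expected window."""
    window = FRUITING_MONTHS.get(species)
    if window is None:
        return False  # no reference data: nothing to flag
    return observed_month not in window

print(flag_out_of_season("Morchella americana", 10))  # True: an October morel is suspicious
print(flag_out_of_season("Morchella americana", 5))   # False: May is in season
```

A flag like this wouldn’t prove anything on its own, of course; it would just point a human identifier at the observations worth a closer look, exactly like the mental check described above.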

As a general rule, most malevolent users of AI operating at meaningful scale will take the path of least resistance to achieve some ulterior motive. That usually means either spam/scams or some sort of incentivized misinformation. To the extent that any of these activities would be possible on iNaturalist, I think they would be far easier to achieve in other corners of the internet that have broader reach and less scrutiny of details. I am far more concerned about AI bots pretending to be people on forums to hawk products, run scams, leave fake reviews, and manipulate opinions. All of that is far easier than faking observations here, and has far more obvious incentives. That doesn’t mean AI on iNat won’t be a problem, but it constrains the likely scale of the problem, and means we are more likely to see sporadic fake observations from unsophisticated users than larger, better-organized efforts that could be more damaging.

Also, extraordinary claims require extraordinary evidence. An observation that appears to show a new species (or a known species in a new place) or otherwise contradicts prior knowledge means little on its own. If there is a cluster of observations from different accounts, that is more meaningful. Any researcher who wants to investigate further would probably want to visit the site themselves or contact the observers to follow up. I have seen surprising observations that were “validated” by sequencing, were shared by highly respected people with credentials, and strongly contradicted prior knowledge about the morphology or ecology of a species, but turned out to be the result of an unintended mix-up that led to the wrong sequence being posted. I now instinctively remain agnostic about surprising observations until they are independently replicated. All of that to say: the same practices that protect our collective knowledge from unintentional errors and non-AI fakery will also help mitigate the impact of any AI fakes that do end up on this site.

16 Likes