Are iNat observations being scraped by AI art generators?

I entered the prompt “iNaturalist Theronia atalantae” into Stable Diffusion online. The results looked nothing like Theronia atalantae. The AI did generate insect-like images, but they were clearly not of this world. Even a simple “Mallard” prompt produced some odd-looking ducks.

It seems that, at least for Stable Diffusion, the algorithm is not intended to create/reproduce accurate representations of existing creatures.

2 Likes

there’s a page that allows you to search the training set used by Stable Diffusion. here are the results from 2 of 3 available sets for ‘leucomonia bethia’, which i think will shed light on why your generated images look the way they do:

there’s information about Stable Diffusion’s training here: https://github.com/CompVis/stable-diffusion/blob/main/Stable_Diffusion_v1_Model_Card.md#training.

just from my very limited poking at it, it doesn’t look to me like iNat’s observation dataset was used as part of the data set, but i didn’t actually try to analyze the set because it would be billions of records to plow through.
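For anyone who wants to check more systematically than poking at the search page, here is a rough sketch of one way to do it: scan the image URLs in the training set's metadata and count how many point at iNat-hosted photos. The hostnames in `INAT_HOST_SUFFIXES` and the sample URLs are assumptions for illustration, not a confirmed list of where iNat photos live, and the sketch assumes you have already extracted the URLs from the metadata files yourself.

```python
from urllib.parse import urlparse

# Assumed hostnames for iNat-hosted photos; adjust to the hosts you care about.
INAT_HOST_SUFFIXES = (
    "inaturalist.org",
    "inaturalist-open-data.s3.amazonaws.com",
)


def is_inat_url(url: str) -> bool:
    """Return True if the URL's hostname is, or is a subdomain of, an assumed iNat host."""
    try:
        host = (urlparse(url).hostname or "").lower()
    except ValueError:
        # Malformed URLs (e.g. broken IPv6 brackets) are treated as non-matches.
        return False
    return any(host == s or host.endswith("." + s) for s in INAT_HOST_SUFFIXES)


def count_inat_urls(urls) -> int:
    """Count matching URLs in any iterable, so it can stream over a very large file."""
    return sum(1 for u in urls if is_inat_url(u))


# Hypothetical sample rows standing in for the real metadata.
sample = [
    "https://static.inaturalist.org/photos/123/medium.jpg",
    "https://inaturalist-open-data.s3.amazonaws.com/photos/456/original.jpg",
    "https://example.com/inaturalist.org/duck.jpg",   # iNat only in the path, not the host
    "https://notinaturalist.org/duck.jpg",            # look-alike domain
]
print(count_inat_urls(sample))
```

Matching on the parsed hostname rather than on a substring avoids counting look-alike domains or pages that merely mention iNat in their path. It would still miss iNat photos that were re-hosted elsewhere, so a low count would not prove the photos are absent.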

5 Likes

Thanks for your insight. Our project has concluded, but I’ll revisit this threshold with my colleagues the next time it comes up. I appreciate the context re: the Berne Convention.

3 Likes

I didn’t comment on this earlier because I was just flabbergasted by this assumption. When identifying stuff, I routinely come across images taken by professional photographers who use iNaturalist to figure out what they’ve photographed. Two in particular come to mind who are posting images mainly of particular taxa as they are working on putting together a field guide to be published once they’ve got them all covered. It’s not just all amateurs. Nature photography is a career choice for some and an art that takes a certain amount of patience and skill. That’s why there are so many nature photography contests out there. So to justify fair use, I would definitely look for other types of arguments and certainly pay attention to image licenses.

4 Likes

Ahhhh I think I figured out how the question of ‘individual character’ and ‘documentary in nature’ is relevant to copyright infringement of iNat observations! But not in the way it is described as being applied:

Case 1: I post an observation of a plant with a specific sequence of shots in a specific order (say, shot of plant, wider context shot, close up of flowers, close up of leaves, underside of inflorescence). Another user posts a picture of the exact same kind of plant with the same sequence of shots in the same order. Because my shot sequence is documentary in nature and lacks individual character, and because they didn’t steal my actual photos, I probably can’t sue them for violating copyright on my shot sequence.

Case 2: A professional photographer posts an album which is artistic in nature and has a highly distinctive and unusual sequence of shots, locations, and editing style. A second photographer posts their own album with a virtually identical sequence of shots, locations, and editing style. The first photographer probably can sue them for violating the copyright on their shot sequence, because it does have an individual character, and is not primarily documentary in nature, even though the specific individual photographs are not being stolen.

Whether the ‘idea’ of a photo is distinctive enough to be copyrighted on its own is not relevant in the case of scraping photos, because we are talking about a use of the actual specific photos which for sure are protected by copyright, not about use of the idea of the photos, which is a fuzzy distinction that you’d have to determine on a case-by-case basis.

In terms of copyright, reproducing is very different to “stealing”.
I am not stealing 10% of the actual Mona Lisa, I am reproducing it.

What would you be comfortable with? 1%? 0.1%?
Someone only copying Mona Lisa’s eyes? a single pixel of a digital reproduction?
There has to be a threshold at which it becomes arbitrary.

If I use 1000 images of zebras to generate 1 new one using a neural network, is this so different to me as an artist drawing a zebra from memory using the 1000 images of zebras I’ve seen over my lifetime?

Everything an artist ever produces is to some extent copying a % of the work of the people they have seen before them. We do not exist in a mental vacuum, free from external influence.

So yes, to me it makes sense that reproducing some % of an image is fine by law.

I agree there needs to be better legal protections to cover the current shift with AI though.

I agree. What’s even the point of uploading AI images in the first place? This completely destroys the actual data on the entire platform. I’m hoping someone was just trying to see how accurately AI can generate different species without realizing that the AI images are detrimental to the data collected. Let’s hope this is not something that becomes commonplace. I love studying bumblebees and I would be very disappointed if people started adding endangered ones that are actually just AI generated. It defeats the whole purpose of this great app. It would be great if there was an AI button that could be used whenever someone uploads an AI image, much like the Captive/Cultivated one. (As long as the people using AI were adamant about marking them.) That being said, I still don’t agree with using AI on this platform, but I don’t believe that it will stop anytime soon.

1 Like

There’s no need for an AI button because it is just the existing ‘no evidence of organism’ flag. Unless you mean AI image generators should be required to use a nearly undetectable and difficult to destroy watermark, something like https://www.digimarc.com/products/brand-protection (such watermarks are, for example, how they make Photoshop refuse to let you edit pictures or scans of high-denomination US currency). I could be on board with making something like that either a regulatory requirement or industry standard.

We didn’t assume that all iNat photos were taken by amateurs, we assumed that most were. I know there are exceptions. FWIW, in my area of taxon expertise (snakes), it’s very rare that I come across a photo that has been taken by a professional nature photographer (probably <1/100) and those photos almost universally have ‘All rights reserved’ licenses (so we didn’t use them for our project).

The project, I assume, is fine as long as you’re only using photos that are licensed under Creative Commons (especially if it’s not for profit, which it seems like it is).

But I’m also absolutely flabbergasted by the assertion that ‘individual character’ matters at all when it comes to the individual copyright of photographs. The only thing that matters is what the photo is licensed as, not a group trying to determine if a photo is professional or not.

EDIT: like, this sort of justification is how we’re getting AI art generators just carte blanche stealing people’s art for their algorithms.

hey y’all, i suspect this isn’t really the place (yet) to debate the ethics / appropriateness of using any particular set of photos for generative AI purposes. as far as i can tell, very few people really understand the mechanics of how generative AI works, even in broad strokes, which is somewhat important to even beginning to have a full discussion.

moreover, the tech has advanced well beyond the existing moral understanding and legal codes, and the debate in this space has only really just begun. i’m thinking that joining conversations in other forums (ex. legislatures, courts, etc.) will provide much more satisfying debate and impactful results, should you choose to continue the debate.

4 Likes

Honestly, you’re probably correct.

For a relevant discussion about copyright, see Stable Diffusion having been asked 93,000 times (as of that article’s writing) to generate the work of one specific artist: https://www.technologyreview.com/2022/09/16/1059598/this-artist-is-dominating-ai-generated-art-and-hes-not-happy-about-it/

I’m not sure how a useful conversation can be had if restricted only to people who understand how it works technically. My understanding of how it works is, in detail, quite poor, but I would nevertheless guess that it is in approximately the 95th percentile. Actual policy and legal decisions will be made almost exclusively by people with almost no technical understanding; I doubt most judges have the faintest technical clue how email works.

I re-read my original post and realized that I should have written “we were advised” instead of “we reasoned that” because I was also surprised to learn that this was relevant. I want to emphasize that to the best of my knowledge, this standard is only applied in Swiss copyright law, and it’s only one of several standards that might be applied, the others being the ones I listed in my original post.

1 Like
