Hello! I’m currently developing an AI model for animal identification in Europe. To train this model, I need to collect a large number of animal photos, and I came across a collection of images for each species on iNaturalist. However, iNaturalist’s Terms of Use state that using any iNaturalist content to train commercial AI models is strictly prohibited, even if the images are CC0-licensed. I’d like to clarify whether I can use the CC0 or CC-BY images in the iNaturalist AWS open data collection to train a commercial AI model. How can I be sure that I can use them safely? I’ve emailed the staff many times over the past month, but I haven’t received a response. I hope someone can help, as it’s very important for my project!
I’m not staff, but the Terms of Use state:
Prohibited Use for Commercial AI Training. Users may not use any iNaturalist data for training artificial intelligence, machine learning models, large language models, or similar networks, algorithms, or systems for commercial purposes.
So it seems pretty clear to me that if you don’t want to violate the Terms of Use, you can’t use iNaturalist data to train a commercial AI model.
Thank you for the reply! Yes, I also mentioned that the iNaturalist Terms of Use prohibit this, but I wanted to clarify about the AWS Open Data registry, https://registry.opendata.aws/inaturalist-open-data/, because it seems to be used for AI training.
when i read the terms of use, these only apply when accessing stuff through the website, apps, API, and other things operated by iNaturalist or affiliate networks. it seems to me like the AWS Open Dataset would be excluded, just as data pushed to GBIF would be excluded.
so i would think that the thing that applies if you’re accessing stuff directly through the AWS Open Data Set is just the image license and observation license.
I would expect that the content from most users is licensed as "CC BY-NC", which includes a non-commercial restriction.
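For anyone who wants to check the license mix rather than guess, here is a minimal sketch of tallying license codes from a photo metadata table. It assumes a tab-separated table with a `license` column and codes such as `CC0`, `CC-BY`, and `CC-BY-NC`; those names are assumptions to verify against the dataset's own documentation, and the sample rows are made up. It only counts licenses and says nothing about whether the Terms of Use apply.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the column names and license codes are assumptions.
SAMPLE_TSV = (
    "photo_id\tlicense\n"
    "1\tCC-BY-NC\n"
    "2\tCC0\n"
    "3\tCC-BY-NC\n"
    "4\tCC-BY\n"
    "5\tCC-BY-NC-SA\n"
)


def tally_licenses(tsv_text):
    """Count photos per license code in a tab-separated metadata table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["license"] for row in reader)


def non_commercial_share(counts):
    """Fraction of photos whose license code contains the NC element."""
    total = sum(counts.values())
    nc = sum(n for code, n in counts.items() if "NC" in code.split("-"))
    return nc / total if total else 0.0


counts = tally_licenses(SAMPLE_TSV)
share = non_commercial_share(counts)
print(counts.most_common())
print(f"non-commercial share: {share:.0%}")
```

On the real metadata, the same tally would show how many photos carry a non-commercial license and how many do not.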
I guess @tiwane could provide a more specific answer
Yes, but OP is specifically asking about
please don’t.
I hope the plan is not to have large numbers of drones flying around over critical habitat looking for and imaging threatened populations? That sounds like wildlife harassment, and might well accomplish the opposite of your well-motivated conservation goals.
It’s exactly this sort of commercial theft that makes me limit the use of my observations.
Regardless of whether it’s technically allowed or not I completely disagree with this proposed use and I suspect that the majority of iNat users, especially those on the forum, would also disagree and be very upset if you were to use their data for commercial use like this.
I do not consent to have my observations used to train a commercial AI model. iNat is a safe place where observations are not used to train AI models or anything similar without the permission of the observer (if that is even allowed; I do not know). This is one of the primary reasons why I do not upload my photos to other social media platforms.
if you want answers to legal questions, hire a lawyer.
also, if your response to getting told “no” is to try and find loopholes, that’s a big red flag to me that you don’t respect consent. if you were an active community member or working with a trusted party, iNaturalist and its users might have been inclined to find a compromise.
I didn’t know that iNat data is used by Amazon Web Services. I’m not really pleased to hear that (i.e. even non-CC0 data).
Looking at the licensing settings on iNat, only GBIF and Wikipedia are mentioned, not AWS… I’m curious to know which companies are using that data?
It seems to me that more suitable material for training an AI to recognize wildlife in drone photos would be…drone photos.
iNaturalist photos are taken from the perspective of a person with a camera – i.e., in general, fairly close to the ground, which would presumably not be the case for drones.
iNaturalist observations also do not systematically record the number of specimens present or other species present in the photo. For example, iNaturalist users may pick out one individual in a mixed flock of birds that they are interested in, but the photo is typically not marked with any indication of where the individual of interest is located. Presumably if the goal is to use drone photos to improve population estimates, you also need a way to get a list of all species recognizable in a photo and how many of each. In other words, the existing labels on iNaturalist observations are unlikely to be sufficient for your purposes; the photos will need to be tagged individually with the information relevant for the recognition algorithm.
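To make the labelling gap concrete, here is a rough sketch of the kind of per-image annotation a counting model would need, compared with a single observation-level label. The `Box` class, the field names, and the example values are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One labelled animal: species plus a pixel bounding box (hypothetical schema)."""
    species: str
    x: int
    y: int
    width: int
    height: int


def count_by_species(boxes):
    """Number of labelled individuals per species in one image."""
    return Counter(box.species for box in boxes)


# Hypothetical annotation for one aerial image: every individual is located.
aerial_image = [
    Box("Cervus elaphus", 120, 340, 40, 22),
    Box("Cervus elaphus", 410, 95, 38, 25),
    Box("Capreolus capreolus", 600, 500, 20, 14),
]

# An observation-style label: one taxon for the whole photo, no location or count.
observation_label = "Cervus elaphus"

counts = count_by_species(aerial_image)
print(counts)
```

The observation-style label cannot be turned into the per-individual boxes without someone going through each photo again.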
First of all, I love nature and will never develop an app that in any way threatens or disturbs forest animals. I’ve clearly stated that my goal is to help conserve wildlife, not threaten it. I don’t understand why I’m being hated so much for trying to save animals. I’m not trying to find any loopholes; I’m simply clarifying information about iNat’s AWS open data. If I can’t use it, then that’s it; I just didn’t get any definite answer. I’ll use one drone for large areas, with a minimum altitude of 75-100 meters so as not to disturb animals. Current methods for estimating population sizes are too ineffective, so I’m trying to save some species from extinction.
I am using specific augmentation techniques and need exactly iNaturalist-type data to create my own dataset.
Many animals can only be properly identified by specific features.
Image augmentation techniques, especially AI-driven ones, often add, manufacture, or obscure these specific details, resulting in poor results and misidentification.
This is a poor practice for your proposed use.
I will try. I am doing manual image augmentation, which is not that bad. You are right that using GenAI tools would be bad practice.
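The post does not say which manual techniques are meant. As one possible reading, here is a minimal sketch of purely geometric augmentation (a mirror flip and 90-degree rotations) on a tiny grid of pixel values. These transforms only rearrange existing pixels, so they do not invent or remove identifying detail. The function names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in image]


def rotate_90_clockwise(image):
    """Rotate a row-major pixel grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original plus one flipped and three rotated variants."""
    variants = [image, flip_horizontal(image)]
    rotated = image
    for _ in range(3):
        rotated = rotate_90_clockwise(rotated)
        variants.append(rotated)
    return variants


# A 2x3 stand-in for an image; real images would be much larger.
tiny = [[1, 2, 3],
        [4, 5, 6]]
variants = augment(tiny)
for variant in variants:
    print(variant)
```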
You are not being “hated”, you have people offering their sincere opinions and backing what they say up with reasonable explanations and concerns.
What do image augmentation techniques have to do with the points I mentioned?
If a drone is taking photos from the air, animals will look considerably different than photos taken from close to the ground. A view of a deer from the side is not going to help an algorithm recognize deer photographed from above.
Image augmentation is also irrelevant for how an algorithm will find subjects in a photo or determine how many there are or whether there are multiple species present in the image; this is not information included in iNaturalist observations – in other words, if you have to examine and retag the images anyway, the advantages of using iNaturalist images as opposed to your own data set are surely limited.