Train a neural network using the iNaturalist photo database

Hello everyone,
I’m interested in training a neural network on iNaturalist data and then exporting the network weights, which don’t contain any part of the images, to build an app that generates revenue through AdSense.

Would that be considered fair use? Would the dataset owners be OK with that, and is it legal?

See here: https://www.inaturalist.org/blog/49564-inaturalist-licensed-observation-images-in-the-amazon-open-data-sponsorship-program

You would have to download only images licensed CC BY or CC0, as all the other licenses in that dataset preclude commercial use.
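If you go this route, the license check itself is easy to automate. Below is a minimal sketch in Python, assuming the photo metadata is available as CSV with hypothetical `photo_id`, `observer`, and `license` columns and license strings spelled like `CC-BY` or `CC0`. Check the real metadata for its actual column names and spellings before relying on this.

```python
import csv
import io

# Allow-list following the advice above: only CC0 and CC BY permit commercial use here.
# The exact license spellings in real metadata are an assumption.
COMMERCIAL_OK = {"CC0", "CC-BY"}


def normalize_license(raw: str) -> str:
    """Normalize variants like 'cc by', ' CC-BY ' or 'cc0' to 'CC-BY' / 'CC0'."""
    return raw.strip().upper().replace(" ", "-")


def commercially_usable(rows):
    """Yield only the metadata rows whose license is on the allow-list."""
    for row in rows:
        if normalize_license(row["license"]) in COMMERCIAL_OK:
            yield row


# Hypothetical sample metadata; column names are illustrative only.
SAMPLE = """photo_id,observer,license
1,alice,CC-BY
2,bob,CC-BY-NC
3,carol,cc0
4,dave,CC-BY-NC-ND
5,alice,cc by
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE))
    kept = list(commercially_usable(rows))
    print([r["photo_id"] for r in kept])  # ['1', '3', '5']
```

Normalizing the strings first keeps variants such as `cc by` from being silently dropped, and anything not explicitly on the allow-list (including every NC license) is excluded by default.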

There was also a dataset used for the first iNat computer vision model (as I remember) that other parties use; you can search for it.

I would rather my data wasn’t used for commercial purposes.
But how would one know in this instance? (beyond good faith)

Surely using images to train neural networks isn’t really covered by any CC license (or even by copyright/all rights reserved), since it doesn’t involve a copy of the content itself?

Indeed, it seems many neural networks are trained on copyrighted content:

https://www.reddit.com/r/MachineLearning/comments/4qrgh8/is_it_legal_to_use_copyright_material_as_training/

https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf

I imagine copyright will increasingly become a grey area the more we see AI-generated content produced by models built from other people’s work…

I think it’s reasonable enough to use copyrighted content as training data. No part of the actual content is reproduced in any way. If I had a job as a nature center guide, I could learn to ID birds based on copyrighted images, and there wouldn’t be any copyright issues there.

That said, copyright and general usage/ownership gets weird when we start talking about digital things, so I’m sure there are edge cases and all sorts of funky legal bits to get into.

It depends on the use case. If it’s used for identification, as you mention, then it’s obviously quite different, but if training data is used to generate new content, it may not be so clear-cut.
OpenAI’s GPT-3 text generator regularly reproduces text line for line from its training data, even though that data includes copyrighted material.

The Reddit page you cited does say that copyrighted content isn’t an issue, but licenses are, because they involve reuse of material:
"Licenses are where you enter legal gray area, and any model that would allow a user to recreate input data are right off the table.

Licenses are a really gross place to play in though. This is primarily because they are almost wholly indefensible in court, and the exact interpretations of the most popular licenses (think Creative Commons) are still tbd. If you look at a lot of those licenses in any kind of depth they are contradictory, and nigh unenforceable as legal documents, but this is kind of like playing with a grenade since the legal precedents here are totally lacking." (from the Reddit page)

I think it’s likely only a lawyer could answer the legality of this question well, though I would guess that the likelihood of being sued by an iNat user is pretty dang low regardless.

My greater argument against this would be that users who choose a CC license with a non-commercial clause for their observations are expressing a desire that their work not be used for commercial purposes. So while using those observations might be legal (I don’t know myself!), that usage does seem like it would violate the intent of the users who posted them. I would worry that it might discourage some folks from posting if they knew their observations would be used for commercial purposes, but I don’t know for sure.

Training means using a copy of the image, since the program doesn’t use the original files, so yes, it should be under an open license.

I would definitely consult a lawyer.

I wonder if even a lawyer could answer this. From what I can see googling further, it seems as if the laws are just in flux and ill-defined/ill-equipped to deal with the issues raised by this sort of usage.

Grey areas of the law aside, the idea of generating money from iNat training data makes me feel pretty uncomfortable though. I feel if any money is to be made from training data here it would ideally go back into the platform (or to the users themselves).
At the same time, this sentiment also seems terribly naive if everything online is being scraped and used by OpenAI and the like anyway.

For what it’s worth, the company that makes PictureThis (which is commercial, although I guess it relies on subscriptions more than ads? I can’t bring myself to download it) does pay at least some people for their photos. I don’t want to turn this into a conversation about that app, but it’s something to think about.

But a copy of a digital image and the original are totally indistinguishable.

What is indistinguishable about a cut-down, downsized image with no original EXIF?

Who says it’s cut down? iNat strips all the EXIF from images during the upload process anyway (and moves the relevant portions into the iNat database).

I disagree. If someone is making money off of your work, they should be paying you or at the very least asking permission. If they aren’t making money, then non-commercial licenses like the one I use on my uploads would be sufficient. The original poster will need to use only observations licensed to allow commercial use.

Yes, and it’s clearly not the original: the iNat system doesn’t train on large original files, and this new one probably won’t either, but EXIF alone is enough to distinguish files, so I don’t understand your question at all. Commercial use is commercial use. When uploading to iNat you have to agree to your photos being used in future CV training runs, but only for iNat’s CV, so as an observer you haven’t agreed to this use, and it doesn’t matter whether the file was saved from the website; there’s no consent.

Thanks, everyone, for your kind responses. I know this is a gray area right now. I know IBM is doing something similar with photos from Flickr, and from what I’m reading in your comments, it’s not uncommon to use photos commercially, specifically for classifying objects. I just want to present a useful product, and I don’t know if it would even be possible to pay everyone for their photos, or even to contact everyone, although I wouldn’t have any problem doing that.

I want to help ordinary people identify possibly dangerous specimens when hiking or walking in rural areas. I’m already aware of the limitations of this approach.
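On whether it would even be possible to contact or pay everyone, a useful first step is counting how many distinct people the usable photos come from. Below is a minimal sketch in Python, assuming rows that have already passed a CC0/CC BY license filter and carry hypothetical `photo_id` and `observer` fields. The same grouping also yields one credit line per person, which matters because CC BY requires attribution (CC0 does not); the credit wording here is illustrative, not a license-approved format.

```python
from collections import defaultdict


def photos_by_observer(rows):
    """Map each observer to the list of their photo IDs you would use."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["observer"]].append(row["photo_id"])
    return dict(groups)


def attribution_lines(groups):
    """One credit line per observer, sorted by name for stable output."""
    return [f"Photo(s) {', '.join(ids)} by {name}" for name, ids in sorted(groups.items())]


if __name__ == "__main__":
    # Hypothetical rows that already passed a CC0/CC BY license filter.
    kept = [
        {"photo_id": "1", "observer": "alice"},
        {"photo_id": "3", "observer": "carol"},
        {"photo_id": "5", "observer": "alice"},
    ]
    groups = photos_by_observer(kept)
    print(len(groups), "people to contact")  # 2 people to contact
    for line in attribution_lines(groups):
        print(line)
```

On a real dataset the count from `photos_by_observer` gives a rough sense of how feasible contacting or paying every contributor would be.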

It may help clarify the question to elaborate on your reasons for wanting to create your app, and why you want to use iNat for it rather than other sources. For people unfamiliar with building neural networks, it would also help to explain exactly what you’d be using from iNat and how. For example, are you interested in how the community ID algorithm works in general, and/or as applied to how photos are typically IDed? Or just in how the photos are organized into different taxonomic ranks? Other, more complicated aspects that have been discussed here before include estimating ID accuracy and changes to the Community Taxon. It may end up being an interesting discussion even if you don’t end up using iNat for your app. I have no direct experience designing neural networks.

In 2021 the FTC ordered Everalbum to completely delete any algorithms/AI it trained on photos scraped without proper consent (see, e.g., https://epic.org/u-s-regulators-order-algorithm-and-data-deletion-in-settlement-with-weight-watchers/). It also recently ordered Weight Watchers to delete any AI trained on data improperly collected from children. In both cases, the cited rationale was that simply ordering them to delete the data wouldn’t be effective, because there is no way to un-train the algorithm on just the offending data. So regulators are trying to catch up with these kinds of uses, and the fact that the model can’t reproduce the content may actually make the consequences of an infraction worse, because you can’t delete just one part.

It is generally not a good idea to assume any particular group of users wouldn’t sue for violating their ownership rights. In the US, statutory damages for infringing a single work range from $750 to $30,000, and up to $150,000 if the infringement is willful, so you only need to be wrong about someone being ‘the kind of person to sue’ once for it to hurt.

None of this is in any way legal advice (nor would I suggest trusting any legal advice from strangers on an amateur naturalist forum, such as ideas about the extent to which this might be a white/gray/black area of the law, or whether laws are likely to be enforced). You should really talk to a lawyer before doing anything like this for profit.

even if you don’t consider the legalities of copyright, i think this kind of idea is begging for a lawsuit by someone who gets injured by something your app didn’t identify as dangerous.

if you’re going to train something, i think you should work on something that can be developed with a client that has a lot of resources to help verify that your product actually works as intended. for example, partner with a local health department to build a machine that will trap and identify disease-transmitting mosquitos or other disease vectors. or partner with a farming group to develop machines that will monitor and identify pollinators and their abundance, or pests and their abundance.
