Licensing problem for paper using iNat data

Hello

I am running into some issues with licensing regarding iNaturalist data for an accepted paper and I was hoping there would be someone here who could help. I recently had a paper accepted where I use occurrence data from iNaturalist to develop a series of species distribution models. When I submitted this paper, I stipulated to the journal that I would release all data and code on Dryad upon acceptance. It has now been accepted and I need to redistribute my data; however, when I submitted my files to Dryad they rejected them, stating that “iNaturalist occurrence data is in conflict with the CC0 waiver because many of the observations were uploaded with a CC BY-NC license”.

For some background, I downloaded my occurrence data directly from iNaturalist as opposed to through a third party such as GBIF. Neither I nor my coauthors were aware of this licensing issue while working on the project. I have done some searching around the internet and have been in contact with both Dryad and iNat support, and it seems like I have two options:

  1. Contact each individual user and ask their permission to use the data

    • There are about 30 observations where this would be the case
    • It seems unlikely that all of these users will still be active
  2. Use GBIF as my source for iNat occurrence data

    • There are fewer iNat observations on GBIF for this species than there were in my original dataset from iNat
    • This would likely change my results, which would mean I’d need to rerun all of my analyses, rewrite parts of the paper, and redo figures. This is not the ideal course of action since we are at the final check stage before going into press.

If anybody has any advice on how I should proceed, it would be very much appreciated. As I said, we are at the final stages of getting this paper out so we really want to get on top of this.

Many thanks in advance.

4 Likes

I was under the impression that a CC BY-NC meant you had to give attribution to each user, but you didn’t have to seek permission from them. So it’s possible that listing the contributors in the paper might be sufficient?

https://creativecommons.org/about/cclicenses/

https://creativecommons.org/licenses/by-nc/4.0/

A note about appropriate credit from the CC site:

3 Likes

It can’t be used for gain; some paperwork, for some reason, falls under commercial purposes.

1 Like

OK, but what about whether or not permission needs to be sought from the original data creators?

Link below with more info on commercial usage.

https://creativecommons.org/faq/#does-my-use-violate-the-noncommercial-clause-of-the-licenses

1 Like

It seems that as far as Dryad is concerned, this usage would infringe on the license. I am still a bit confused by a lot of this, but here is the full message they gave me:

“For your dataset, at least the iNaturalist occurrence data is in conflict with the CC0 waiver because many of the observations were uploaded with a CC BY-NC license. As an iNaturalist contributor myself, this is a long-standing issue related, in part, to whether “facts” like occurrence data can be copyrighted at all (this confusion is not exclusive to iNaturalist either). However, Dryad’s stance is that we will not redistribute data that contain conflicting licenses without express indication from the copyright holder that the copyright only applies to an attribute not included in the data (e.g., photos). We would encourage you to reach out to an appropriate contact at iNaturalist for more clarification on this. Possibly, you might even be able to get in contact with the submitter of any copyrighted entries given the small number.”

2 Likes

Do you have the option of finding another data store that is more accommodating?

3 Likes

Yes there are other repositories that I can use but I could very well run into the same issue. ESA has a pretty high standard concerning where data can be released.

https://www.esa.org/publications/data-policy/#panel-tab-content-0-2

Could be worth a shot though

It does seem that the main sticking point is that you’re taking iNat data with multiple licenses and seeking to share it on Dryad, which explicitly requires CC0 licensing. Your contact there makes a good point that occurrence data from an observation likely doesn’t contain enough originality to be copyrightable, but that won’t solve your problem because Dryad won’t accept the data unless it’s licensed under CC0. So you need another repository.

Looking through that ESA page, it seems that Figshare might serve your purpose. I have zero experience with them, but their licensing and copyright pages do seem to offer a lot more options:

https://help.figshare.com/article/how-to-choose-the-most-appropriate-license

https://help.figshare.com/article/copyright-and-license-policy

You’ll still need to bear in mind the distinction between the license that (purportedly) applies to each observation whose data you’re using and the license you apply for your overall dataset.
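To make that distinction concrete, here is a minimal Python sketch that tallies the per-observation licenses in an export and lists the observations that are not CC0. The sample rows are made up, and the column names (`id`, `user_login`, `license`) are assumptions about what an iNaturalist CSV export looks like, so check them against your own file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for an iNaturalist CSV export.
# The column names ("id", "user_login", "license") are assumptions.
SAMPLE_CSV = """id,user_login,license
101,alice,CC0
102,bob,CC-BY-NC
103,carol,CC-BY
104,dave,
105,bob,CC-BY-NC
"""


def tally_licenses(csv_text):
    """Count observations per license; a blank value is treated as all rights reserved."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["license"] or "all rights reserved" for row in reader)


def not_cc0(csv_text):
    """Return the IDs of observations whose license is anything other than CC0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["id"] for row in reader if row["license"] != "CC0"]


if __name__ == "__main__":
    for license_name, count in tally_licenses(SAMPLE_CSV).items():
        print(f"{license_name}: {count}")
    print("Not CC0:", ", ".join(not_cc0(SAMPLE_CSV)))
```

This only tells you which rows carry which license. Whether a given repository will accept the mix is still up to that repository's policy.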

5 Likes

Can you try to re-explain what you’re trying to share on Dryad - i.e. content of relevant (and problematic files), and what the objective of the key file(s) is for users?

i.e. you’re not including copies of the photographs themselves, right? It basically seems to be about the extracted metadata only. I might be totally misunderstanding, but if the objective is to allow readers of your paper to replicate your dataset etc., then why not something stupidly simple, like a list of URLs in a spreadsheet? Those could also have usernames there too, so essentially named credit to those who asked for CC BY-NC.
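A rough Python sketch of that spreadsheet idea, assuming the export has `id`, `user_login`, and `license` columns (those names and the sample rows are assumptions, not something taken from the paper's actual data). It keeps only a link, the observer's username, and the license for each observation, and drops the occurrence details themselves.

```python
import csv
import io

# Hypothetical sample rows; column names are assumptions about the export format.
SAMPLE_CSV = """id,user_login,license,latitude,longitude
101,alice,CC0,35.1,-106.6
102,bob,CC-BY-NC,35.2,-106.7
"""

# Public observation pages on iNaturalist follow this URL pattern.
BASE_URL = "https://www.inaturalist.org/observations/"


def reference_rows(csv_text):
    """Build one reference row (URL, observer, license) per observation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "url": BASE_URL + row["id"],
            "observer": row["user_login"],
            "license": row["license"] or "all rights reserved",
        }
        for row in reader
    ]


def to_csv(rows):
    """Serialize the reference rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["url", "observer", "license"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(reference_rows(SAMPLE_CSV)))
```

The coordinates are deliberately left out of the output, since the point is to share a pointer to each record plus named credit and not the record itself.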

6 Likes

I think you’ve actually outlined your issues quite succinctly and the email you got from Dryad matches my understanding. There isn’t a way to convert or repost the data with a CC-0 license if you didn’t access it under one or can’t access it now under one. This actually isn’t really an issue for iNat staff - it’s up to the end user to use whatever they download appropriately (just like an image from Wikipedia, etc.) Also, while downloading from GBIF is generally the best way to access iNat data, I don’t think it would solve this problem, as I believe that GBIF datasets also include observation data which have more restrictive licenses than CC-0.

I agree with @russellclarke and @rupertclayton that looking into other data repositories that allow you to publish your dataset as is would be the easiest/most efficient solution. I know that figshare has been used for other papers using iNat data.

There may also be a repository associated with your institution which would allow this (I’ve worked at two institutions that have these). This would also be a good question for a librarian with digital expertise at your institution.

Good luck and hope the paper can be finished off soon!

4 Likes

Wonderful question, and something I had not considered in setting my license.

Unless I am misinterpreting the text itself, I believe you’re safe (somewhat contingent on the publications’ compensation policies) for two reasons:

  1. From the legal code itself, bolding mine (https://creativecommons.org/licenses/by-nc/3.0/legalcode)

You may not exercise any of the rights granted to You in Section 3 above in any manner that is primarily intended for or directed toward commercial advantage or private monetary compensation. The exchange of the Work for other copyrighted works by means of digital file-sharing or otherwise shall not be considered to be intended for or directed toward commercial advantage or private monetary compensation, provided there is no payment of any monetary compensation in connection with the exchange of copyrighted works.

So I think this means that if there isn’t money changing hands (to your benefit), and the primary use is non-monetary, this should be considered an acceptable use.

  2. A little shakier of a reason, but I believe the licenses are intended to cover media as opposed to data, meaning that while the image may be subject to the license (under a possibly overly broad interpretation of the CC BY-NC code itself), the data derived from it wouldn’t be covered by the license.

So, if you’re not publishing the images themselves, I think you’d be safe as well, even under what I think is a misinterpretation of the language for scientific publications. This obviously would become grayer in the case that you were publishing for direct monetary benefit.

I may be incorrect on this point, however, if the CC licensing is sufficiently broad in its interpretation of ‘work’.

Outside of the two points I’m making above, journals may of course just decide, to avoid any possible issues, that they won’t accept data under anything other than CC0. That would be their loss, since I believe journals such as PLoS accept and even publish under CC-BY-4.0, which I think circumvents any possible issues with commercial gain.

It’s a pretty dense, and somewhat confusing issue, but very interesting if you’re not on a deadline to publish!

P.S. Creative Commons (CC) themselves take an iNat-like approach (i.e. cultivated/captive vs ‘wild’) in not getting overly specific on how they define a lot of their licenses, by design, which is why they have an FAQ page, including one on your question.

2 Likes

Thanks everyone for all the great advice! Copyright law was never something I expected to run into as a biologist!

I opted to put my files onto figshare. I probably could have found some type of work around for Dryad, but at this point I just want to get this paper off my hands for good! Never used figshare before but it is free and seems pretty flexible in terms of licensing.

9 Likes

One more point, which may be a nice loophole for data (not images): I found this in iNaturalist’s Terms of Use, bolding mine (https://www.inaturalist.org/pages/terms):

By submitting Content to iNaturalist for inclusion on the Platform, You grant iNaturalist a world-wide, royalty-free, and non-exclusive license to reproduce, modify, adapt, and publish the Content solely for the purpose of displaying, distributing, and promoting Your observations and journal via iNaturalist, and for the purpose of displaying or promoting the Content or iNaturalist itself in other venues, such as social media or software distribution platforms. We may repackage publicly available information associated with the Content in a machine-readable format for a handful of partners, including the Global Biodiversity Information Facility (“GBIF”) and the Amazon Web Services (“AWS”) Open Data Sponsorship Program, and others. You represent and warrant that (a) You own and control all of the rights to the Content that You post or You otherwise have the right to post such Content to the Site; (b) the Content is accurate and not misleading; and (c) use and posting of the Content You supply does not violate these Terms of Use and will not violate any rights of or cause injury to any person or entity. If You delete Content, iNaturalist will use reasonable efforts to remove it from the Platform, but You acknowledge that caching or references to the Content may not be made unavailable immediately.

If you source the data from GBIF (they also use CC-BY-4.0), I believe it was already transferred into their licensing requirements by the terms of use above.

Even more reading, Creative Commons (CC) has a wiki on this very subject, with their own disclaimer that it’s not legal advice: https://wiki.creativecommons.org/wiki/NonCommercial_interpretation

Yes, that’s what I was thinking. Instead of sharing the raw data, share a reference to the data.

“All problems can be solved by adding another layer of indirection.”

2 Likes

This is contradicted by the fact that media and observation data on iNat have separate licenses. Most people who understand this change their observation data license to CC0, but most people are not aware of it.

It kind of is, in that there doesn’t seem to be a clear reason why all observation data should not be CC0 by default. iNaturalist should really encourage observers to choose the right license for media (CC-BY, copyright, whatever works for the user) but I would guess that 99% of users don’t realize that there is separate licensing just for the observation data, which as in this case is introducing unnecessary restrictions on sharing of data.

Why is that? Non Research Grade observations? You and your co-authors should be able to fix that by adding correct IDs, and within a short time, those observations will be synced with GBIF.

7 Likes

I agree that there are separate licenses for media and observation data. I also would like to see iNat encourage users to set their observation data to a more permissive license (and I also share doubts about whether the license for the observation data is even enforceable in 99.9% of cases).

But I don’t think that this is the crux of the issue for Dryad. Their email states that:

iNat’s policy and licenses are pretty clear (for those who read them, though I expect most users don’t), as each observation (with media) has both a media and an observation license. So there isn’t anything else for iNat to clarify: there are already separate licenses here, and the license info for each observation is accessible. I’m not sure what else iNat can do to clarify, other than asking users to individually confirm an observation and media license for each observation, which seems like a non-starter.

This strikes me as a choice that Dryad has made about data that they will host that precludes them hosting iNat data. The policy also apparently precludes hosting data from GBIF which is essentially the standard for biodiversity data. Dryad is well within their rights to make that their policy, and there are probably good reasons that they chose to do so. It also means other options (like Figshare) are better for iNat data.

On a side note:

This could also be because users have set their observation data license to “All Rights Reserved” or another license incompatible with export to GBIF. Even if these observations are RG, they don’t get exported.

Anyways, @andrewggaier I’m glad that you got it sorted so quickly and easily!

5 Likes

Unfortunately that doesn’t address the issue of the fungibility (word?) of iNat observations. The owner of an observation can come in at any time after the researcher accessed it and edit or even delete it. To maintain scientific repeatability and falsifiability, researchers really do need to archive the snapshot of the data on which they based their analysis.

7 Likes

Okay, how do I find the separate setting?

1 Like

under edit settings → Content & Display

3 Likes

Thank you. I didn’t understand what the “observation license” is and that it was separate from the photo license.

2 Likes