Licensing problem for paper using iNat data

I’m pondering these issues myself, but for systematics work. I’m not at the stage where I have solutions to offer, but I figured there are a few extra considerations that might be worth mentioning.

This may be neither feasible nor desirable. Not feasible, in the sense that if an observation represents a new taxon or an ambiguity within current taxonomy, it’s likely to divide community opinion. Not desirable, in that, especially in the case of a new taxon, it could be seen as disingenuous (I’m explicitly not bringing up academic ethics or community standards here) to add an ID we know is incorrect just to shunt it to Research Grade (to say nothing of multiple coordinated accounts adding IDs, which approximates the sock/meat puppet problems I recall from 6-7 years ago). Additionally, using circumscriptions of existing taxa that differ from those in iNat’s standard references before a justification appears in print seems to me pointlessly contrarian, except in especially clear-cut cases. Furthermore, once a new taxon is published, it may be slow to filter into iNat’s standard references, or it may be overtly disputed by the custodians of those references.

I think this also raises the question of what qualifies as a derivative work in this instance. I could see a new taxonomic determination or phenology classification qualifying, but what about rounding coordinates or converting dates to month bins or day of year?
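
For concreteness, the transformations in the previous paragraph (rounding coordinates, binning dates by month, converting to day of year) look something like this in Python. The helper name and the one-decimal rounding are hypothetical choices for illustration; whether any of these steps makes a record "derivative" in the copyright sense is exactly the open question, and the code does not settle it.

```python
from datetime import date

def coarsen_record(lat, lon, observed_on, decimals=1):
    """Hypothetical helper: round coordinates and reduce the date to coarser bins."""
    return {
        "lat": round(lat, decimals),                      # rounded latitude
        "lon": round(lon, decimals),                      # rounded longitude
        "month": observed_on.month,                       # month bin (1-12)
        "day_of_year": observed_on.timetuple().tm_yday,   # day of year (1-366)
    }

rec = coarsen_record(43.07312, -89.40123, date(2021, 6, 15))
print(rec)
```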

What about annotations? I don’t see a license for those in Settings. For my own phenology work I’m going through every data point individually and reclassifying phenology, but I could see this being an issue for large scale ecological research relying on scoring that has been crowdsourced on-platform.

I think this is the crux of the issue. iNat isn’t set up to be a scientific archive, a point I have made before at lab meetings etc. Academic norms do not necessarily apply. I don’t know enough, or have the desire, to pontificate on the issue, but it seems to me that there’s a push and pull between the community- and individual-focused model and research utility. Maybe this, or this plus enforced (or default-nudged) CC0 on metadata, is as close as it gets. Certainly there’s an irony in citizen science being less readily citable/accessible for formal work. I’ve seen suggestions and heard talk that iNat can replace herbaria for some/many functions, which scares me both in terms of data quality and in terms of funding streams for natural history collections.

I think my preferred solution is to make data as derivative as possible, and hope that I or someone else will have the time and permits to repeat interesting observations with specimens. Fingers crossed for when it comes time to submit.

I think this is a somewhat unrealistic view of the interplay between iNaturalist data and GBIF. For reasons such as inattentive or inactive observers or subsequent IDers, many usable data points in any given set for a taxon will fail to reach Research Grade, i.e., for reasons other than the correctness of the ID. The number of such would-be-could-be (non-)Research Grade observations can sometimes be considerable, and significant if one’s data set is small. I’ve been dealing with this discord in a number of moth genera for years now. In an ideal world, iNat data would equal GBIF data, but that’s not going to happen, so a researcher must make hard choices about which version of a dataset to use.

A perfect example of why allowing end users to retain control of the license for observation metadata is counterproductive for the iNat community. It is unreasonable to ask researchers to manually check the metadata license of individual observations, let alone seek permission, when using an aggregate dataset of metadata from dozens or hundreds of observations. The fact that one of by far the largest open data repositories for the life sciences (Dryad) makes it impossible to reuse non-copyrightable metadata from iNat is a big red flag that the current licensing model for observation metadata is broken.

NB: I do understand the rationale for enabling license choice on metadata to accommodate comments, location annotations, etc., but there should at least be a stripped-down version of observation metadata that doesn’t include any user-generated content and can be released as pure CC0 for research purposes, as many other platforms do.

I don’t think anyone was making an argument for users to engage in this scenario:

A researcher going through and adding correct IDs to a group of Needs ID records to make them RG (and thus on GBIF) is a perfectly legitimate use of iNat and should be encouraged. It’s actually how a lot of researchers/experts get involved in iNaturalist. They either want to improve the existing data on GBIF or increase the number of records and so start IDing.

I agree of course that intentionally adding incorrect IDs or using sockpuppet accounts would be unethical. Both are explicitly prohibited by iNat’s Terms of Use and grounds for suspension.

However, even these unethical approaches wouldn’t solve the OP’s licensing issue anyway: if an observation didn’t have a license that would allow sharing on iNat when it was “Needs ID”, that license would be the same when the observation reached RG.

As for annotations, I don’t think that there is a license that applies to them because there really isn’t any gray area concerning copyright - to the best of my understanding, there’s no way that they are copyrightable.

If you’re worried, can you just license the figures from the paper that use iNaturalist data with a CC-BY-NC license?

Alternatively, are you sure that the “license” on the “data” is even a problem?

If you strip the observation data to the absolute geographic locations and dates and remove anything creative or overly precise from the contributor, then there seemingly shouldn’t be further copyright issues with the data…right?

As long as you’re not reproducing original data from iNaturalist verbatim, I see no reason that figures or summaries of the iNat data would be a problem in terms of copyright?

Anyone who has ever published a scientific review paper that aggregates data from many previous papers is doing exactly this, even though the data in those publications are generally under fully restricted copyright!

OP is reproducing/posting their dataset from iNaturalist (which is important to do, since the data included in an iNat/GBIF download change all the time due to new IDs, uploads, deletions, etc.).

But the crux of the issue is Dryad not accepting data with licenses more restrictive than CC-0 (regardless of whether copyright on the data is actually enforceable or not).

Side note: on Discourse you can combine replies to multiple posts in one post of your own using the quote functionality (like @ajwright did above) which makes threads a bit easier to read/follow.

I think there’s an easy fix. iNaturalist should have all observation data under CC0.

For what it’s worth, I uploaded a dataset about a year & a half ago to DataDryad that included a pretty stripped down set of iNaturalist data, limited to the following fields:
ID
user_login
scientific
latitude
longitude
URL

There was no objection from DataDryad at the time, but I do not know if this limited data is consistent with their current policies, if their policies have changed, if it was an oversight…

That dataset also had the equivalent set of fields from several herbarium databases accessible online, some of which do not use CC0 licensing. At the time, it didn’t even occur to me that this kind of limited metadata could be a copyright issue, and I’m not at all convinced that there is a copyright issue here.

This seems analogous to citing a publication, to me. You can certainly include the title, publisher, author, date, etc., of cited works, regardless of their copyright status. You can quote small sections of text and reproduce various conceptual content, and so on.
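
A stripped-down export like the one described in this post can be produced by copying only a whitelist of columns from a CSV and discarding everything else (including free-text notes). This is a sketch: the column names are assumptions modeled on an iNaturalist CSV export (the post lists the name field as `scientific`; here it is assumed to be a column called `scientific_name`), and the sample row is a made-up placeholder.

```python
import csv
import io

# Assumed column names, modeled on an iNaturalist CSV export; adjust to match your file.
KEEP = ["id", "user_login", "scientific_name", "latitude", "longitude", "url"]

def strip_fields(in_file, out_file, keep=KEEP):
    """Copy only the `keep` columns to out_file; all other columns
    (e.g. a description/notes column with observer-written text) are dropped."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Made-up placeholder data, for illustration only.
sample = io.StringIO(
    "id,user_login,scientific_name,latitude,longitude,url,description\n"
    "1,example_user,Genus species,10.0,20.0,https://example.org/1,free-text notes\n"
)
out = io.StringIO()
strip_fields(sample, out)
print(out.getvalue())
```

`extrasaction="ignore"` is what lets the writer silently skip columns that are not in the whitelist instead of raising an error.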

And given that, at the ‘high end’ of observations, many in the iNat community have spent lots of time, money, and effort on expensive photo gear and on traveling to remote places, is expecting at least credit too much to ask?

I try to be as transparent as I can about who collected what data and so on, and I think all researchers should do the same—though, humans being humans, not everyone will. However, if you’re hoping that authors will thank you in the acknowledgements in a paper that includes 3000 observations by 1200 different people, you’re probably out of luck. :-)

Personally, I figure data is only worth anything if people use it. So, if a copyright makes it more difficult for people to use my data—and anything above CC0 does, to a greater or lesser extent—I don’t want to devalue my work.

Basic occurrence data itself is not copyrightable. The only thing the CC BY-NC observation license actually applies to is the creative content added by the observer, i.e., the notes. (The photographs are covered under their own licenses.) Unfortunately, it sounds like Dryad takes an overly cautious approach to licensing and won’t accept the data regardless. I think your decision to just host the data elsewhere makes the most sense.

I really wish iNaturalist would change the default license for observations to CC0 (while leaving the default license for photos as CC BY-NC-SA or whatever). That would avoid silly situations like this.

I want my data to be used for science and conservation as much as possible. If someone publishes a paper where my observations make up a large proportion of the data used, I’d certainly expect acknowledgment or even to be invited to contribute as a co-author. But if I am one of hundreds or thousands of observers adding one little piece to the puzzle, I just want researchers to be able to use my data as easily as possible. Disclaimer: I am also a researcher, so this may affect how I see this issue.

I’d support making CC0 at minimum the default observation license, ideally with clarification on the site as to whether the basic metadata (date, location, species) are covered by the license, or whether it applies only to observation notes and other ancillary information.

Can you create an iNaturalist URL that will pull out the data you used, and then share that URL instead of the actual data?

It’s hard to create a URL that will be stable. The best option I’ve found is to use an observation field specific to one research project—it won’t change nearly as much as, say, the community ID. But users might leave iNaturalist, or add that observation field to other observations without really knowing what it’s supposed to mean, and so on.
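
The observation-field approach in this paragraph can be turned into a shareable search URL roughly as follows. This is only a sketch: it assumes the `field:NAME=VALUE` query syntax that iNaturalist’s observation search accepts (check the current API/search documentation before relying on it), and the field name and value are hypothetical. As the paragraph notes, such a URL is a live query, not a snapshot, so what it returns will drift.

```python
from urllib.parse import urlencode

BASE = "https://www.inaturalist.org/observations"

def field_query_url(field_name, value):
    """Build a search URL filtering on one observation field.
    Assumed syntax: field:NAME=VALUE. urlencode percent-encodes the ':'
    and turns spaces into '+'."""
    return f"{BASE}?{urlencode({f'field:{field_name}': value})}"

# Hypothetical project-specific field name and value.
url = field_query_url("My Project Voucher", "yes")
print(url)
```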

Longer term, iNaturalist may overhaul the observation field system entirely, or disappear from the face of the earth, and so on. For comparison, we can look up herbarium specimens from 100 years ago without much trouble. iNaturalist might be around in 100 years… but given the rate of turnover in internet services, frankly it’s not likely. Services like DataDryad are intended to be stable for the long haul. Whether that turns out to be the case is hard to tell at this point, but at least it’s the explicit, core mission, which is not the case for iNaturalist.

That’s a good idea. One could also create a traditional project and add the observations of interest to it, although both of these methods become unworkable for observers who choose not to allow others to add fields or to add their observations to projects.

Creating a project sounded more involved, but admittedly I haven’t really investigated that option. :-)

With observation fields, for what it’s worth, a major limitation is that it either isn’t possible, or I don’t know how, to deal with large numbers of observations efficiently. Perhaps projects handle that aspect better; I should look into it next time it becomes relevant.

Interesting thread… I’m coming at this as a researcher who wants to avoid copyright infringement when publishing in the future, as I’m trying to work more with iNaturalist data. But I haven’t published any GBIF/iNat data yet.

What actually constitutes a copyright infringement for an observation? For media, the restrictions are clearer, but I cannot wrap my head around what this looks like for an observation. The main example here would be redistributing the data in a data repository. Assuming that you don’t need to store the data on a server, and could archive it privately (with it being available upon request), does a species distribution model constitute a derivative work of the observation data? If so, that would still require attribution for all of the attribution licenses (e.g., CC-BY) and prohibit the use of observations under no-derivatives licenses (e.g., CC-BY-ND). Maybe I am getting too in the weeds here, but I had never considered that this would be a copyright issue.

My intended use would be similar to @andrewggaier’s: I would use observation data (>10000 records) and create species distribution models to look at distributional changes for a given species. For the fly species I was just looking at, only ~300 of 10000 records were CC0, so using only CC0 data isn’t really an option. And attribution isn’t really an option either, other than perhaps including a supplementary file with a long list of names as acknowledgements.
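
The two practical steps in this paragraph, checking what fraction of a large export is CC0 and compiling observer names for a supplementary acknowledgements file, can be sketched in Python. The column names `license` and `user_login`, the license strings, and the convention that a blank license means "all rights reserved" are all assumptions modeled on an iNaturalist CSV export; the records below are made-up placeholders.

```python
from collections import Counter

def license_summary(rows):
    """Count records per license value (assumed column: 'license');
    blank values are grouped under 'none'."""
    return Counter((r.get("license") or "none") for r in rows)

def acknowledgement_list(rows):
    """Sorted, de-duplicated observer logins (assumed column: 'user_login')."""
    return sorted({r["user_login"] for r in rows if r.get("user_login")})

rows = [  # made-up placeholder records, for illustration only
    {"user_login": "observer_b", "license": "CC-BY-NC"},
    {"user_login": "observer_a", "license": "CC0"},
    {"user_login": "observer_b", "license": "CC-BY"},
    {"user_login": "observer_c", "license": ""},
]
counts = license_summary(rows)
share_cc0 = counts["CC0"] / len(rows)
print(counts, share_cc0)
print(acknowledgement_list(rows))
```

In practice `rows` would come from `csv.DictReader` over the export file; the de-duplication matters because one observer often contributes many records.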