The NCBI statement sounds pretty similar de facto to iNat’s position in my opinion. iNat allows submitters to claim a copyright to observations/media but does not assess validity or get involved with adjudicating copyright.
But I think the big difference is that NCBI and NHCs are primarily scientific collections. iNat’s purpose is not. It’s to promote people’s connection to nature with scientific data being a happy byproduct. The importance of creating scientifically usable data varies substantially from user to user. Being an online social network/community requires having a way to allow users to license content that being a purely scientific collection may not.
I think it also matters that iNat observations often have notes, descriptions, or comments that actually may be copyrightable (i.e., creative works, not just facts). Genetic data really doesn’t exist in the same way - just facts like gene name, chromosome, etc. and a sequence of four/five letters, though maybe there are some other fields on GenBank I’m unaware of - I’ve only used it to pull sequence data. I personally appreciate that iNat essentially puts the option in the users’ hands.
As far as I understand it, a key part of this is on GBIF’s side as well - they require that data be licensed in certain ways before they will host/publish it. So offering users CC license choices is a convenient way to determine which data meet the requirements for GBIF usage.
I have not seen it stated anywhere that a user must seek permission to use a CC-licensed work. In fact, the whole point of the Creative Commons licenses is to give this permission in a standardized manner in advance, precisely so that each individual potential user does not have to request any additional permission from the original author.
Creative Commons licenses give everyone from individual creators to large institutions a standardized way to grant the public permission to use their creative work under copyright law
This, of course, is only for users who wish to re-use the CC-licensed work with all the restrictions (caveats) included in the CC license (e.g. -BY, -SA, -NC, -ND). Only where a user wishes to use the work in a way the original CC license does not permit must they contact the original author for explicit permission to do so.
As an aside: even though I am not obligated to do so, I always contact the original author to let them know where their images have been used and thank them - even for CC0/public domain works. It is just courteous and most authors are curious to find out.
It is also worth noting that herbarium specimens are physical objects. Physical objects don’t have or need copyright protections because they have ordinary ‘ownership’. However, the design of a physical object, or text on it, could be copyrighted. The owner of a physical object of copyrighted design (e.g., a book) “receives the right to sell, display or otherwise dispose of that particular copy”. What kind of derivative things you can do with it (like scanning it and posting it online) falls broadly under the category of ‘it depends’. Something like digitizing herbarium specimens that a collection owns, including the captions, for research purposes would probably be fair use, even if the text on the record was otherwise copyrightable.
Facts about the specimen are also not copyrightable, as said above. As in the case of iNaturalist observations, the part that would be most legally ambiguous would be derivative works based on verbatim or nearly-verbatim captions, which could be fair use or not. I think semi-standardized captions of herbarium specimens are probably less likely than iNaturalist observations to contain meaningfully copyrightable text.
The legal question is separate from the ethical question, of course. Ethically I think amateurs/citizen scientist contributors in any capacity should be acknowledged. Data produced by professionals should be handled/cited per the norms of the relevant field and the editorial standards of journals (an increasing number of journals in my field are requiring at least some kind of data policy statement with every publication submitted, to avoid possible issues in the future).
Absolutely. What’s in my mind doesn’t always come out when I write. I meant “copyrighted” (no CC license) rather than “CC license”. And the only aspects of the data that I have any concern about are species identification, location (and possibly date). I doubt that I’d be using any creative wording in the notes for any research publications.
Since GBIF doesn’t import photos or other media, just the observation data, why does GBIF not accept all CC licensed observations? What is it that GBIF does with the observation data which would cause them to not accept BY NC SA, BY NC ND, BY ND, and BY SA?
Also, would including the iNat (or GBIF) exported data (as a supplemental file with the publication and as long as it included the username and iNat ID number) be sufficient to cover all the things in all of the CC licenses, such as:
“Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.”
And would that export file constitute any sort of derivation? Since, “If you remix, transform, or build upon the material, you may not distribute the modified material.”
(Actually GBIF imports photos/media: it parses their licenses media item by media item, just like it imports occurrence data, parsing the license occurrence by occurrence.)
GBIF’s stated goal is IIRC to promote free and open scientific data, hence their choice of disseminating further only those datasets and data blobs equipped with appropriate usage rights (CC0, CC-BY, CC-BY-NC). Copying/distributing data [files] (i.e. verbatim, without a personal touch such as filtering or additions) is not the same as building upon (= intellectual creation from) data while acknowledging the respective contributions of the people who authored and processed the data.
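To make the license cut-off concrete, here is a minimal Python sketch that splits a list of observation records into those carrying one of the three licenses named above and those that do not. The accepted set is taken from this thread rather than from GBIF's own documentation, and the `license` key and the sample records are hypothetical, so treat it as an illustration only.

```python
# License codes this thread says GBIF redistributes (an assumption, not an official list).
GBIF_ACCEPTED = {"cc0", "cc-by", "cc-by-nc"}


def normalize_license(code):
    """Lowercase a license code and join its parts with hyphens."""
    if not code:
        return ""  # empty/None is treated as "no CC license" (all rights reserved)
    parts = code.strip().lower().replace("_", " ").replace("-", " ").split()
    return "-".join(parts)


def is_gbif_eligible(code):
    """True if the normalized code is in the assumed accepted set."""
    return normalize_license(code) in GBIF_ACCEPTED


def split_by_eligibility(records, license_key="license"):
    """Return (eligible, excluded) lists; `license_key` is a hypothetical field name."""
    eligible, excluded = [], []
    for rec in records:
        target = eligible if is_gbif_eligible(rec.get(license_key)) else excluded
        target.append(rec)
    return eligible, excluded


# Hypothetical sample records, not real observations.
sample = [
    {"id": 1, "license": "CC-BY"},
    {"id": 2, "license": "CC BY NC SA"},
    {"id": 3, "license": None},
]
eligible, excluded = split_by_eligibility(sample)
```

With these made-up records, only the first one lands in `eligible`; the share-alike record and the one with no license are both excluded.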
I googled GBIF’s terms which explain some of their rationale for licensing choices. They also go into a bit of their thoughts behind non-commercial uses and basically say “there’s a lot of grey area, but GBIF isn’t in the enforcement business.”
I believe that the GBIF/iNat downloads also include a field for the licenses that each observation is licensed under, so you would need to include that in your supplementary file as well. Technically, you should provide a link to the licenses that apply to any observations that you used, though to my mind this is less important as anyone can google the licenses as long as you provide accurate info about which they are.
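For anyone assembling such a supplementary file by script, here is a rough Python sketch that turns exported records into a small attribution table with username, observation ID, license, and a link to the license. The input field names (`id`, `user_login`, `license`) are assumptions about what an export might contain, and the license URLs assume the 4.0 versions of the CC licenses, so check both against your actual download.

```python
import csv
import io

# Assumed mapping from license code to license URL (4.0 versions are an assumption).
LICENSE_URLS = {
    "CC0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY": "https://creativecommons.org/licenses/by/4.0/",
    "CC-BY-NC": "https://creativecommons.org/licenses/by-nc/4.0/",
}

FIELDS = ["observation_id", "username", "license", "license_url"]


def attribution_rows(records):
    """Build one attribution row per record; input keys are hypothetical."""
    rows = []
    for rec in records:
        code = rec["license"]
        rows.append({
            "observation_id": rec["id"],
            "username": rec["user_login"],
            "license": code,
            # Unknown codes get an empty URL rather than a guessed one.
            "license_url": LICENSE_URLS.get(code, ""),
        })
    return rows


def to_csv(rows):
    """Serialize attribution rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical record, not a real observation.
rows = attribution_rows([{"id": 1, "user_login": "alice", "license": "CC-BY"}])
csv_text = to_csv(rows)
```

The resulting text can be saved as a supplementary CSV next to the data file, so that each observation keeps its credit and its license link.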
In terms of changes, I am not an expert in this, but, based on what I’ve read, a paper will describe how the data has been used, and you aren’t really “changing” the data, so you should be ok there. The export file will contain the original data, so I think you would be distributing the original rather than modified material. I think this requirement applies more to situations like photoshopping images, etc. Technically you would also be distributing the dataset from GBIF, which is licensed as CC-BY I think (again, see their terms).
Thanks for everyone’s comments. I did a practice download from GBIF and they actually include a lot of information that, I think, answers most of my concerns. Some information is at these links, but the downloaded zip file itself includes a spreadsheet containing the data with links to licenses, a suggested citation, etc. I understand now why iNat recommends using data downloaded from GBIF rather than directly from iNat.
I don’t think that GBIF does import iNat photos directly (though I could be wrong about this) - you can’t download them from GBIF as far as I know. I think they just link to photos hosted by iNat (or Amazon AWS if the photos are there perhaps).
By downloading a county-level dataset of one species, I discovered that only 7 out of 10 iNat observations of this species in this county were in GBIF. Digging deeper, three of them had the “observation data” copyrighted (thus excluded from GBIF and not really usable by the research community). That’s a much higher number than I was expecting.
Since there’s no explanation of what “observation data” is at the place where users select which license to use for observations, photos, media, etc., I suspect that most people have no idea they are preventing their data from being used by researchers. I certainly had no idea until stumbling across discussions in the forum (I’ve since changed all my observation data licenses to public domain in my account settings).
Someone would have to opt out at sign up, or later in Account Settings on the website, which also has an explainer about licenses and use. I can see people not totally understanding the licensing options (it’s confusing) but I think they would have had to at least know their data would be less useful to researchers when they opted out.
Maybe their API is just calling to the iNat photo record to display (what GBIF seems to describe as “integration” in their documentation) when they show it on the page for that occurrence? Though I guess that GBIF post is from 2018, so maybe things have changed since then?