Missing or "wrong" license for GBIF

Hi everyone,

I am working on a project that compiles all amphibian and reptile observations in Austria from various data sources (mapping projects, literature, etc.), with the ultimate goal of submitting most of this data to GBIF for broader accessibility and use.

Citizen science data, especially from iNaturalist, play a crucial role in this project. Fortunately, iNaturalist observations are automatically transferred to GBIF when they reach Research Grade (RG) and have the appropriate license. However, we have noticed that many observations have “bad” licenses, and users are often unaware of this issue.

As of February, there were 23,595 amphibian and reptile observations for Austria on iNaturalist. Here is a breakdown of the number of observations by license type (for the observation, not the photo):

  • CC0: 811
  • CC-BY: 2,772
  • CC-BY-NC: 13,645
  • CC-BY-NC-ND: 80
  • CC-BY-NC-SA: 340
  • CC-BY-ND: 8
  • CC-BY-SA: 26
  • All rights reserved: 5,913

Only the first three licenses (CC0, CC-BY, CC-BY-NC) are relevant for GBIF. Therefore, 6,367 observations, or about 27% of all observations, are not transferred to GBIF and are consequently excluded from scientific studies that rely solely on GBIF data (which is often the case).
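For anyone who wants to repeat the arithmetic for another taxon or country, here is a minimal Python sketch. The counts are the ones listed above; the variable names are just illustrative, and the set of GBIF-compatible licenses is taken from this post rather than from any official list.

```python
# Observation counts by license, copied from the breakdown above.
counts = {
    "CC0": 811,
    "CC-BY": 2772,
    "CC-BY-NC": 13645,
    "CC-BY-NC-ND": 80,
    "CC-BY-NC-SA": 340,
    "CC-BY-ND": 8,
    "CC-BY-SA": 26,
    "All rights reserved": 5913,
}

# Licenses that allow transfer to GBIF, as stated in this post.
GBIF_COMPATIBLE = {"CC0", "CC-BY", "CC-BY-NC"}

total = sum(counts.values())
not_shared = sum(n for lic, n in counts.items() if lic not in GBIF_COMPATIBLE)
share = not_shared / total

print(f"{not_shared} of {total} observations ({share:.1%}) do not reach GBIF")
```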

We contacted all 1,260 users with observations not transferred to GBIF, explained the issue, and asked them to change the license. Over the last 3.5 months, around 300 users (about 25% of those contacted and 60% of the more active users with more than 10 “incorrectly” licensed observations) adjusted the licenses for approximately 2,800 observations. We received numerous messages from users who were unaware of the issue or assumed the license applied to both observations and photos. Not a single user reported intentionally choosing the “wrong” license for observations. Incidentally, my own observations were part of the problem; over half of my older observations lacked a license, and I wasn’t aware of it.

This issue extends beyond amphibians and reptiles in Austria, and I wonder if others have encountered similar problems. While the system isn’t bugged and the license information is available when you look for it, most users are simply unaware of the issue. Therefore, I propose that iNaturalist sends an official message to all users, asking them to review their license settings.

Best regards,
Christoph

Edit: “without license” → “All rights reserved” in the table.

8 Likes

I agree that educating users about license choice and encouraging them to select licenses compatible with sharing/scientific use is a way that iNat can maximize its impact. I think it’s tough to communicate to users since licenses are so complex for the average person, and there are many different options. There are some existing threads about this (such as this current one: https://forum.inaturalist.org/t/preliminary-findings-from-a-qualitative-study-on-inaturalist-users-license-choices-show-there-is-a-substantial-portion-of-inaturalist-users-who-have-all-rights-reserved-by-mistake/52742).

As a quick note of clarification, by “without license” you mean “All Rights Reserved”, correct?

Also, kudos on contacting so many users and helping increase data-sharing! I would also note that, while the stats you have are for herps specifically, the users almost certainly changed their licenses for all their observations (not just individually for herp ones), so the amount of data newly shared is likely much larger than the numbers you report here.

I think that a message asking users with more restrictive licenses (as opposed to all users) to consider their settings could be useful. It could probably be targeted in some ways to only users who have been active on the site for more than a year (for recent users, it might seem pushy to ask so quickly?) and those with more restrictive licenses. There’s no point messaging a user who already has CC-0 or something like that. A message could have links to the user setting page and a Help/Tutorial on changing a license.

2 Likes

Thank you very much for bringing westboundwarbler’s thread, which deals with the same problem, to my attention!

Yes, “without license” means “all rights reserved”. I corrected it in my first post.

I think most, if not all, users changed the license for all their observations. So in the end, the license of maybe over 100k observations has been changed. Unfortunately, we haven’t tracked this.

And I totally agree that it does not make sense to send a message to really all users; it should go only to active users with restrictive licenses.

1 Like

licenses which don’t allow observations to flow to GBIF aren’t “wrong” or “incorrect” or “bad”. they are merely incompatible with GBIF.

is the issue here a lack of awareness of iNaturalist or a preference for using data only from GBIF even if there is an awareness of iNaturalist?

1 Like

I know of academic papers that were rejected because they used observations that had licenses that didn’t allow them to go to GBIF.

So I do think there is a preference for using GBIF data, even if the pressure to do so is top-down.

2 Likes

I want to know why CC BY-SA is not “Good choice for sharing with scientists” but CC BY-NC is? As far as I know, CC BY-SA is more free than CC BY-NC.

where are you quoting this from?

CC BY-SA and CC BY-NC have different restrictions. A CC BY-SA observation license is incompatible with GBIF, and if some scientists prefer to use data from GBIF, as some of the folks are suggesting above, then CC BY-SA-licensed observations will not make it to those scientists’ eyes because they won’t make it to GBIF.

1 Like

Obviously, iNaturalist.

can you provide more context? link? screenshot?

Find one of your observations and click “edit license”.

1 Like

the “edit license” pop-up does seem to explain:

personally, just for consistency, i would prefer the wording: “compatible for sharing with partner organizations” in the green highlights, unless “good choice for sharing with scientists” is also included in the top paragraph though.

I would also note that most scientists prefer that iNat data come from GBIF, for several reasons. In fact, iNat prefers that data be downloaded from GBIF for scientific applications when possible. It’s more easily traceable. It’s easier to cite, since the dataset has a DOI. It’s easier for iNat to quantify. It’s easier to query together with additional data sources using the same set of criteria (since GBIF aggregates many data sources); many scientists use iNat data in combination with data from other sources. Also, GBIF is more or less the de facto standard for biodiversity data at this point, with a lot of strong metadata and other types of support. So I think having data on GBIF strongly increases the probability that it will be used by scientists.

5 Likes

i’m not a scientist, but i went to https://forum.inaturalist.org/t/published-papers-that-use-inaturalist-data-wiki-3-2022-and-2023/34753 just to see if i could find any that cite a GBIF download rather than iNat directly.

just occasionally opening up some of the papers listed there, i couldn’t actually find any that cited GBIF. (i gave up after looking through 10-15 of these.)

some of the papers cited specific observations. maybe these were undescribed and so couldn’t flow to GBIF. some of the papers cited specific iNat projects, which might be hard to translate exactly to a GBIF download.

i’m not doubting that making data available in GBIF makes it more available to scientists, but i’m seeing plenty of cases of people using data directly from iNaturalist.

I apologise for describing licenses which don’t allow a transfer to GBIF as “wrong”, “incorrect” or “bad”. As I said, everyone has the right to choose the license he/she wants, but without a doubt it is better to choose a more open license.

Scientists are of course aware of iNaturalist and are using it a lot. However, unless the study is about citizen science or iNaturalist itself, the first choice for most scientists is GBIF, as it has observations from all kinds of sources. And as cthawley mentioned, the DOI is important for using the data set in a paper.

Pisum, the list you mentioned (https://forum.inaturalist.org/t/published-papers-that-use-inaturalist-data-wiki-3-2022-and-2023/34753) is a list of publications that are NOT getting the data via GBIF or citing GBIF. When you go to GBIF you can see that the iNaturalist data set from GBIF is used in almost 5,000 publications: https://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7/activity

2 Likes

it’s not just a more open license though. for your purposes, it’s better if it’s a license that is compatible with GBIF.

as @sunjiano noted above, a license like CC BY-SA is a more open license (compared to no license), but it’s not compatible with GBIF.

okay. thanks for the clarification. i see this now. the statement that mentioned this in that other thread was just further down the page than i read through:

1 Like

whilst I strongly strongly support users making their licences (for the observations at very least) one of the most open options, and it’s something I’ve been working on doing for Australian users, I think it’s important to highlight that there are quite a lot of situations where using data directly from iNat instead of using GBIF is not only preferable, but realistically the research aims can only be achieved through iNat. Three example use cases:

this is a key one. Any researcher collecting data via a project, especially traditional projects that may be based on more than a simple taxonomic or place filter, must use iNat directly as the project info (to my knowledge) does not flow into GBIF as a Darwin Core field or other metadata

  1. There are plenty of papers I’ve worked on where I needed to include Needs ID records; obviously GBIF is immediately unsuitable for this (sure, you could do a GBIF download and an iNat one and combine them, but this just creates unnecessary extra work, esp. with column matching etc.)

  2. For an analysis I did on identification accuracy before/after expert input, there was a very specific time frame for the experts to make their IDs. So I downloaded all the data in the target area/taxa one hour before the event started, and then again immediately after it finished, so that IDs added outside the defined period were not captured. This wouldn’t have been possible through GBIF given the one week refresh delay.

4 Likes

What I’m particularly interested in, and maybe this has already been answered in another thread, is why those three copyright licences are the only GBIF-compatible ones? Aside from all rights reserved, what is it about the others that makes them non-compatible?

It’s a confusing one for me because of the Atlas of Living Australia (ALA). The ALA is Australia’s national biodiversity database and is the official Australian node of GBIF. Given the latter, you might expect that it accepts the same licences as GBIF, but that’s not the case. When Australian iNat records go into the ALA, the ALA accepts every copyright license except for all rights reserved. So I’d love to know why GBIF is restricting their data feed to only those three.

3 Likes

In software, there are different philosophical positions (often very strongly held) on the meaning of the word free. You have to keep that in mind when interpreting the table in that Wikipedia article.

CC-BY-SA “imposes” a licensing option restriction upon derivatives of your work but does not restrict commercial uses, whereas CC-BY-NC does not impose licensing restrictions upon derivatives but disallows commercial uses.

Given that most scientific literature is part of academia and thus non-commercial, it’s the licensing imposition that becomes the sticking point, I’m guessing? I’m further guessing that many authors are not at liberty to decide the licensing options of their papers, but rather the publishers are, and those publishers generally are for-profit businesses? In other words, to someone who is already in a non-commercial pursuit, CC-BY-SA is actually less free, that is, it imposes more restrictions upon them.

Or something very roughly along those lines? IDK, I could be totally way off here.

1 Like

here’s GBIF’s original statement about why they adopted the 3 license options that they did: https://www.gbif.org/news/82363/new-approaches-to-data-licensing-and-endorsement. there’s a link to a document that provides more information on the responses from the stakeholders they consulted to make the decision.

3 Likes

I assume you are wondering why GBIF does not accept licenses with ND or SA elements. These elements of the CC model can be a bit tricky, particularly for something like data sets that consist of lots of independent pieces that may come from different sources and be used in different combinations.

My impression is that the CC model was primarily developed for creative works and audio-visual media; this means that there are certain challenges when it is applied to other types of content. (I support the principle of open access, but I have strong reservations about the way CC licensing has been adopted in science and research without – it seems to me – necessarily giving due consideration to the particular needs of scientific publishing.)

“no derivatives” is often interpreted to mean that something like excerpting chapters from a book for use in an anthology/textbook would not be covered by the license. For datasets, it is obviously greatly limiting if the license doesn’t allow for changing the data in any way (cleaning the dataset, using only a subset of the records, or combining it with data from other sources).

“share alike” creates complications if one is publishing something that uses multiple sources with different licenses; this can result in compatibility (“interoperability”) issues and what is known as attribution stacking.

There’s a good (and reasonably unbiased) overview here: http://discovery.ac.uk/files/pdf/Licensing_Open_Data_A_Practical_Guide.pdf
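To make the ND/SA point concrete, here is a rough Python sketch of the rule as described in this thread: a license counts as GBIF-compatible when it is CC0, or when it is a CC license with a BY element and no ND or SA element. The function name `gbif_compatible` is hypothetical, and the rule is only my reading of the discussion above, not GBIF’s or iNaturalist’s actual implementation.

```python
def gbif_compatible(license_code: str) -> bool:
    """Return True if the license passes the rule described in this thread.

    Assumed rule (not an official GBIF check): CC0 is accepted, and a CC
    license is accepted when it has a BY element but no ND or SA element.
    """
    code = license_code.strip().upper().replace(" ", "-")
    if code in {"CC0", "CC-0"}:
        return True
    parts = code.split("-")
    if parts[0] != "CC":
        return False  # e.g. "All rights reserved"
    elements = set(parts[1:])
    # ND (no derivatives) and SA (share alike) are the problematic elements.
    return "BY" in elements and not (elements & {"ND", "SA"})


for lic in ["CC0", "CC-BY", "CC-BY-NC", "CC-BY-NC-ND", "CC-BY-NC-SA",
            "CC-BY-ND", "CC-BY-SA", "All rights reserved"]:
    print(f"{lic}: {gbif_compatible(lic)}")
```

Applied to the eight license types from the first post, this accepts exactly the three that were said to flow to GBIF.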

3 Likes