Multiple Platforms and Database Redundancy?

As I become more involved with citizen science, I come across other projects that I want to contribute my records and data to. Each project offers unique goals and contribution styles that I see value in. Quality expectations also determine which projects I can submit a given record to. (For example, without media of a bird, I can still record the sighting on an eBird checklist, but I would not be able to submit the record to iNat, which requires media evidence for CID.)

I upload records to BugGuide if I have media of a certain quality and/or the specimen fills a location/time occurrence gap within their hosted data. I upload media to eBird if it is of a certain quality, fills a media gap (for a location/time range) within the media library, or is needed for record confirmation. I recently downloaded the Global Raptor Impact Network (GRIN) app and I am reporting all my existing raptor records to their network as ad hoc observations. But these observations, if I have media, are also hosted on iNaturalist, and when they become RG they go to GBIF. I have some concern that another project’s dataset would also be hosted on GBIF and my records would then appear there more than once. I do use the same username across my accounts and try to link to the other hosts of my data in the records.

I have asked another project directly if it was okay to submit data that was hosted on other projects and they stated that it was okay as researchers can trim redundant records from their data. However, I still find myself wondering about how much I am able to share my data between projects like this.

I am interested in other iNaturalist users’ experiences and thoughts on submitting data to multiple projects. Are there methods to reduce trouble for researchers if data is shared between projects? What are community opinions on participating with the same records in multiple projects? How can one help projects without creating data issues?

1 Like

You might want to change the title of your post to “multiple identification platforms” or something. Until I read the whole thing, I thought you were asking about having an observation in multiple iNat projects, which I think is quite common and not problematic at all.

I think the first place to start with your question is whether the platform you are thinking about cross-posting to contributes to GBIF (or other platforms). For instance, as far as I know, BugGuide is totally self-contained and I’ve run into quite a few people who post at least some of their observations in both places. As far as I can tell, their thought process usually seems to be that easy/common observations only get posted on iNat, while tricky/new to them insects are more likely to get posted to both iNat and BugGuide.

5 Likes

It’s a common misperception that having multiple records of the same species, or even same individual organism, on different platforms causes problems for researchers. Researchers are basically not going to use these sorts of data to estimate abundance. At most, they might use the data to identify encounter rates - which species are recorded more often by people than others - but duplicate records are the least of their problems.

The bigger problems for researchers are inaccuracies and imprecision in recording location (e.g., if multiple people photograph the same organism but record it at different coordinates, which is common on both iNaturalist and eBird, for example); misidentifications, whether from computer vision or otherwise; taxonomic mismatches; lack of observer coverage in large areas; and other sources of error and bias. These can cause headaches when analysing citizen science data, but duplicated valid records of a species in a location are not a problem.

9 Likes

I post all of my bumblebees on BumbleBee Watch as well as iNaturalist. BBW does not incorporate iNaturalist observations into their dataset because all identifications are vetted by one of their volunteer experts. I also occasionally post lepidoptera to Butterflies And Moths of North America (BAMONA), dragonflies to Odonata Central, and other invertebrates to BugGuide to get help with identification - and wish I had the time to post ones that fill location gaps, since I’m in an undersurveyed area. To help any researchers who might be pulling data from iNaturalist as well as these datasets, I always list in the record comments the other datasets I’ve posted the same record in, including the other dataset’s record #. It adds a layer of tedium to posting, but it also helps me remember where I’ve submitted the records as well as (hopefully) helping future researchers eliminate duplicates.

4 Likes

I had this concern after partaking in an African bat study and then uploading that data to iNat. Somebody reached out to me about taking the iNat data and using it for a different bat study, but they weren’t sure if they had already taken the data from the initial study. I was a little nervous that I would be mucking up their numbers but they told me that there’s always the possibility of errors, and that anything noticeably significant could always be investigated further. Nothing that I recorded was a wild outlier or somehow groundbreaking so my worries were swiftly eased. They didn’t seem very worried about it at all, so that to me indicated that I probably shouldn’t be too worried either.

2 Likes

I heard in an interview with someone from the US Fish and Wildlife Service that they would prefer that each observation be uploaded to only one platform. But as far as I am concerned, if they don’t have ways to sort the data coming in, that’s their problem and I couldn’t care less. They aren’t the only group that needs the information and I’d rather make it as available as possible.

For taxa that I’m not familiar with, I frequently upload to multiple platforms to get a more certain ID.

2 Likes

I would just change the licence on one of the platforms, so the data wouldn’t go to GBIF from multiple sources.

1 Like

I have heard that researchers do run data checks for duplicates when they use multiple sources, and deduplication seems to be one of the easier data checks.
Anyone who takes data from multiple sources should know to check for duplicate data.
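
To give a rough idea of what such a duplicate check might look like, here is a minimal Python sketch. It is not any platform's or researcher's actual method. The record layout and field names (`source`, `observer`, `taxon`, `date`, `lat`, `lon`) are made up for illustration, and it assumes the observer uses the same username on every platform, as the original poster does. Two records are treated as the same sighting if they share observer, taxon, date, and coordinates rounded to three decimal places.

```python
from collections import defaultdict


def dedup_key(record, decimals=3):
    """Build a matching key for one record (field names are hypothetical)."""
    return (
        record["observer"].strip().lower(),  # same username on every platform
        record["taxon"].strip().lower(),
        record["date"],                      # ISO date string, e.g. "2023-05-14"
        round(record["lat"], decimals),      # coarse rounding absorbs small
        round(record["lon"], decimals),      # coordinate differences
    )


def merge_sources(*sources):
    """Keep one record per key and note every source it appeared in."""
    grouped = defaultdict(list)
    for records in sources:
        for record in records:
            grouped[dedup_key(record)].append(record)

    merged = []
    for matches in grouped.values():
        kept = dict(matches[0])  # copy the first record seen for this key
        kept["sources"] = sorted({m["source"] for m in matches})
        merged.append(kept)
    return merged


# Made-up example records, not real data.
inat = [
    {"source": "iNaturalist", "observer": "birder1", "taxon": "Buteo jamaicensis",
     "date": "2023-05-14", "lat": 40.12345, "lon": -105.54321},
]
grin = [
    {"source": "GRIN", "observer": "Birder1", "taxon": "Buteo jamaicensis",
     "date": "2023-05-14", "lat": 40.1234, "lon": -105.5432},
    {"source": "GRIN", "observer": "Birder1", "taxon": "Falco sparverius",
     "date": "2023-05-15", "lat": 40.2, "lon": -105.6},
]

for row in merge_sources(inat, grin):
    print(row["taxon"], row["sources"])
```

Rounding is a blunt tool: two points on either side of a rounding boundary will not match, and a real check would more likely compare distances or use the cross-referenced record numbers that some posters add to their comments.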
