@jnstuart makes an excellent point here – the types of data we are talking about (extralimital observations of reptiles and amphibians) are potentially useful to scientists and are also included in “traditional” natural history collections (NHCs). The idea that scientists don’t include these records in NHC data (including the NHC data available on GBIF) is incorrect. Let’s look at a case study of Osteopilus septentrionalis, the Cuban treefrog (CTF), since it was mentioned earlier and I’m familiar with it (it was the subject of some of my MS work).
The CTF is native to Cuba and some other islands in the Caribbean, and it is invasive in FL and other areas. It’s a threat to native treefrogs because it is larger and can easily eat them (I’ve seen it happen – very interesting), so there’s strong interest in the species’ potential distribution, with multiple papers modeling it in the past 15 years. Isolated locality records in the US (including likely hitchhikers) have been published in “smaller” scientific journals like Herpetological Review. That alone tells us that some scientists (the authors and the journal editors, anyway) think individual records are worth the time it takes to report them.
I pulled the GBIF dataset for CTFs and split it into iNat records and other records (almost all of which are from NHCs). Excluding iNat, the GBIF data from NHCs contain Cuban treefrog specimens from 16 states and 1 territory in the US outside the native range (leaving out FL, where the species is established): AL, AR, CA, CO, GA, HI, IL, KS, LA, MS, NY, NC, PA, PR, SC, TX, and VA. Some of these were initial records from areas with invasive potential – in some states (e.g., GA) that potential has been realized, but not in all. There’s even a record from Germany! Accessioning a physical specimen and entering its data takes a lot of work (a much bigger investment of labor than making an iNat observation, for instance). The point is clear: NHCs value the data associated with these specimens – otherwise they wouldn’t bother to record and upload it, use materials to preserve the specimen, and give it space on the shelf.
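For anyone who wants to repeat this kind of split, here is a minimal Python sketch. It assumes a GBIF “simple” tab-delimited occurrence download with the columns datasetKey, countryCode, and stateProvince, and it treats any record whose datasetKey matches iNaturalist’s Research-grade Observations dataset as an iNat record and everything else as “other”. The dataset key in the code is my assumption and worth double-checking on GBIF, and the helper names (read_gbif_simple, split_extralimital) are just illustrative. Real stateProvince values are free text, so expect more cleanup (abbreviations, spelling variants) than this shows.

```python
import csv
import io
from typing import Iterable, TextIO

# ASSUMPTION: GBIF dataset key for "iNaturalist Research-grade Observations".
# Verify it on gbif.org before relying on it.
INAT_DATASET_KEY = "50c9509d-22c7-4a22-a47d-8c48425ef4a7"

# FL is left out of the tallies, matching the counts in the post.
EXCLUDED_PLACES = {"Florida"}


def read_gbif_simple(handle: TextIO) -> list[dict]:
    """Read a GBIF 'simple' tab-delimited download into a list of row dicts."""
    # QUOTE_NONE: stray quote characters in free-text fields can otherwise
    # confuse the csv parser.
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def split_extralimital(
    rows: Iterable[dict], inat_key: str = INAT_DATASET_KEY
) -> tuple[set[str], set[str]]:
    """Return (iNat places, other places) for US records, minus excluded places."""
    inat, other = set(), set()
    for row in rows:
        country = row.get("countryCode", "")
        if country == "PR":
            # ASSUMPTION: Puerto Rico records carry their own ISO code, PR.
            place = "Puerto Rico"
        elif country == "US":
            place = (row.get("stateProvince") or "").strip()
        else:
            continue  # non-US records (e.g., the German specimen) are skipped
        if not place or place in EXCLUDED_PLACES:
            continue  # skip blank localities and FL
        (inat if row.get("datasetKey") == inat_key else other).add(place)
    return inat, other


# Tiny made-up sample (NOT real GBIF records), just to show the call pattern.
SAMPLE = (
    "datasetKey\tcountryCode\tstateProvince\n"
    f"{INAT_DATASET_KEY}\tUS\tGeorgia\n"
    "some-museum-dataset\tUS\tGeorgia\n"
    "some-museum-dataset\tUS\tFlorida\n"
    "some-museum-dataset\tPR\t\n"
    "some-museum-dataset\tDE\tBayern\n"
)
inat_places, nhc_places = split_extralimital(read_gbif_simple(io.StringIO(SAMPLE)))
print(sorted(inat_places), sorted(nhc_places))
```

On the made-up sample, the Florida row and the German row drop out, leaving Georgia on the iNat side and Georgia plus Puerto Rico on the NHC side.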
Looking at the iNat data by itself, we see observations from 21 states and 1 territory in the US where CTFs aren’t native (again leaving out FL): AL, AR, CA, CO, GA, IN, KS, LA, MA, MS, MO, NE, NJ, NY, NC, PA, PR, SC, TN, TX, UT, and VA. Kind of strikingly, this set is very similar to the NHC data – iNat has observations from every state in the NHC list except IL and HI. To me, this shows that the geographic spread of these extralimital iNat records is quite similar to that of the NHC records. Anyone using GBIF data will need to “clean it up” to answer their research question(s), whatever they are, and they’ll likely process the data in much the same way whether iNat records are included or not. Almost every scientist I know a) expects to clean data before use and b) would much rather have a more complete dataset to filter themselves. Restricted datasets are biased datasets, and bias is something most scientists want to avoid. While this is just one species, my personal experience suggests the patterns in GBIF data are similar for other herps, at least.
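The overlap between the two lists can be checked with a few set operations. This Python sketch uses the postal codes exactly as listed in the post (FL left out of both). The two ratios are my own choice of simple ways to put a number on “quite similar”, not part of a formal analysis: the share of NHC places that also have iNat observations, and the Jaccard index (shared places divided by all places in either list).

```python
# Postal codes as listed in the post; FL is left out of both lists.
NHC_PLACES = set("AL AR CA CO GA HI IL KS LA MS NY NC PA PR SC TX VA".split())
INAT_PLACES = set(
    "AL AR CA CO GA IN KS LA MA MS MO NE NJ NY NC PA PR SC TN TX UT VA".split()
)
TERRITORIES = {"PR"}  # counted separately from states


def compare(nhc: set[str], inat: set[str]) -> dict:
    """Summarize how two sets of places overlap."""
    shared = nhc & inat
    return {
        "nhc_states": len(nhc - TERRITORIES),
        "inat_states": len(inat - TERRITORIES),
        "nhc_only": sorted(nhc - inat),
        "inat_only": sorted(inat - nhc),
        # Share of NHC places that also have at least one iNat observation.
        "nhc_covered": len(shared) / len(nhc),
        # Jaccard index: shared places / all places in either set.
        "jaccard": len(shared) / len(nhc | inat),
    }


summary = compare(NHC_PLACES, INAT_PLACES)
print(summary)
```

With these lists it reports 16 vs. 21 states, HI and IL as the only NHC-only places, and 15 of the 17 NHC places (about 88%) also showing up in iNat; iNat adds IN, MA, MO, NE, NJ, TN, and UT.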
More broadly, we shouldn’t argue for excluding data based on what we perceive to be useful – why would we assume we know, or can predict, which questions or data will matter to other scientists (or people), now or in the future? This notion honestly seems a bit hubristic to me. The use of NHC specimens is full of unexpected discoveries that the original collectors and observers never dreamed of: extracting genetic material from specimens of extinct species more than 100 years old (some collected before DNA was even known to exist!), measuring stable isotopes in bones and feathers to reconstruct the past diets of organisms, and documenting range shifts driven by climate change and other causes, including the early stages of range expansions – this could be a whole thread, really.
TL;DR: I argue that exporting these types of data (waifs, hitchhikers, and individual organisms that may or may not represent an established population) from iNat to GBIF isn’t a problem and does, in fact, have value. More broadly, as @raymie noted, Research Grade (RG) observations are a secondary goal of iNat. Extralimital observations of herps (or any organism) are just as valid for observers to make as any other observation. We can classify them using the guidelines currently posted, but I think erring on the side of observers, and treating their observations as valid and meaningful, generally fits iNat’s mission and doesn’t damage the scientific use of iNat data as a whole.