As a source of inspiration, here is what eBird provides as best practices for using eBird data: https://ebird.github.io/ebird-best-practices. It goes well beyond “what to care about when using eBird data”, but I think it is a great example of what could be done to inform researchers about the challenges of iNat data. Compared with a paper, it would also have the advantage of being editable.
I think there are probably many observations that are Casual because they show a tree that was planted decades ago, or aren’t identifiable past family level, or whatever, that are just as valuable (or more so) than some Needs ID observations that are extremely blurry or not identifiable past genus level. It’s hard to predict how someone might find value in an observation, and the lines between Casual and not are often subjective or arbitrary. I find it somewhat confusing why the line is drawn where it is for different variables, but in most cases I think it makes sense to at least have a line somewhere.
Not all users show such restraint. Cases where people will use the DQA to make observations casual even though they are legitimate observations according to iNat’s guidelines include:
- escapees and hitchhikers (because some people don’t want non-established species showing up on range maps)
- duplicates (because this supposedly distorts abundance data, wastes IDers’ time etc.)
- field sketches (because not everyone considers this legitimate evidence)
- observations with large accuracy circles (because they are not precise enough to be useful for them, or because they think such locations are inherently untrustworthy)
I see the main issue with such cases not being the (relatively small) number of observations that might be affected, but the fact that they tend to crystallize as a source of conflict, resulting in DQA wars or long heated discussions that don’t seem like a good use of anyone’s time and energy.
So I am in favor of anything that can encourage more tolerant attitudes and acceptance that iNat treats some observations as valid based on idiosyncratic criteria that don’t meet the standards one might personally want to apply, but this doesn’t diminish the value of the data set as a whole and it’s not a reason to insist on imposing one’s own “better” standards.
Please note that having them show up on range maps could have another effect - they set a precedent that observers could use as “evidence” to support subsequent observations. I see this all the time. There’s a breed of observer who attaches inordinate importance to outliers (these are the folks who trumpet finding both geographic and temporal outliers). There are people who compile guides and species lists for geographic regions who cite earliest/latest observations (rather than lopping off the head/tail of seasonal distributions). I’ve discovered cases where decades’ worth of misidentifications turned out to all be based on one or two initial misidentifications that served as precedents to support all the ones that followed.
I’d like to keep these kinds of observations in the casual category, but if I can find a way to keep them out of MY data, then I’ll settle for the workaround. I have to pick my battles. Would be nice if we could keep the junk out of GBIF, but I’m not sure that’s the hill I want to die on (given how much legacy museum specimen data in GBIF is also junk).
Another lost cause, though I have mechanisms for detecting them and weeding them out of the data at my end. Still, it would be nice not to waste my time with these at the front end - which is one of the reasons why I’d like to have an “ignore” button, rather than having to roll my own (which I’ve done).
But why do you say “supposedly”? Are you denying that duplicates waste IDers’ time, system resources, etc.? Note that aside from wasting the time of active IDers, duplicates could also discourage some folks from ever becoming IDers. They don’t even have to recognize that many observations are duplicates - just the size of the Needs ID backlog might be so daunting that they never even try to tackle it.
But again, I reluctantly bow to the overall culture of inclusivity/tolerance on iNat.
Given how often perfectly good photos are misidentified, I don’t see any merit in sketches. I’ve seen (professional) taxon experts misidentify photos which they can examine at their leisure. Why should I trust a sketch that someone might have made based on a brief glimpse of the organism in question? When I first joined iNat, I was a bit concerned about the fact that “sight” records were casual, but after seeing how often experienced observers make ID errors, I now view most sight records with a VERY jaundiced eye (with the exception of common species that have a reasonably distinct appearance).
I think “accuracy” was a poorly chosen name for this number. A better name would be “uncertainty”, because it better reflects the fact that a large number is “bad”. Observations with large values are easy to filter out, so they don’t really bother me very much. I need to do some additional digging to determine whether the “accuracy” really means anything, but if someone puts a massive accuracy number on their observation, it does make me question how much I can trust the actual lat/long. (I see plenty of observations where I know, with a high degree of certainty, that the lat/long is off by several kilometers.) Again, it would be nice to have the cleaner data in iNat/GBIF, but I can’t fix the world.
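For concreteness, that filtering can be done right at the source. Here is a minimal sketch in Python, assuming the public iNat API v1’s `acc_below` parameter takes a radius in meters (the taxon and place IDs below are placeholders; verify all of this against the current API docs):

```python
import requests

# Placeholder query: Lepidoptera in Canada with an uncertainty radius
# under 1000 m. acc_below is assumed to take a value in meters.
params = {
    "taxon_id": 47157,        # Lepidoptera on iNat (placeholder example)
    "place_id": 6712,         # Canada (placeholder example)
    "acc_below": 1000,        # drop large accuracy circles at the source
    "quality_grade": "research",
    "per_page": 200,
}
resp = requests.get("https://api.inaturalist.org/v1/observations", params=params)
resp.raise_for_status()
for obs in resp.json()["results"]:
    taxon = obs.get("taxon") or {}  # taxon can be missing on some records
    print(obs["id"], obs.get("positional_accuracy"), taxon.get("name"))
```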
It’s distressing to see “standards” viewed so negatively.
No. “supposedly” refers to “distorting abundance data” – often given as a justification for making duplicates casual. This would only be a valid argument if iNat data provided any kind of reasonable estimate of abundance.
I don’t like duplicates either, but the official iNat position is that they are not a big deal, and as long as there is no way to flag them they should be treated as any other observation. And yet there are some users who spend a great deal of energy fussing about duplicates.
That’s fine, but again, iNat policy is that they are allowed. I find that IDers are generally reluctant to confirm them, so in practice they are generally a non-issue and the regular efforts of some users to find any excuse to make sketches casual also seem like energy that could be better directed elsewhere.
And yet there are some users who do believe that observations with large uncertainty values should be casual – with “too large” being dependent on what they personally consider to be useful.
(There is a bug with at least one of the apps that sometimes results in observations ending up with a location radius of several thousand km; this is not carelessness on the part of the user or a sign that their data is unreliable, and in many cases they may not even be aware that it has happened. So communication with the user is what is most needed in such cases, not clicking a button to make the observation casual.)
I’m not viewing standards negatively. The purpose of my post – and of this feature request – was to point out that different people have different requirements, and that the well-intentioned practice of some users of enforcing their personal requirements by making non-defective (by iNat definitions) observations casual may not in fact be doing other people a favor.
@rcavasin I recognize that you have some strong opinions on what should and should not be Casual observations. But I fear that you may be over-generalizing certain categories of observations or the impact of such observations. For instance,
Clearly the quality of drawings from nature varies wildly, but I sense that you apply the term “sketch” in a demeaning manner to all hand-drawn evidence of organisms. That is unnecessary and it undervalues a large swath of well-crafted offerings. It has been reiterated many times on this forum that drawings from nature should be judged individually on their own merit. I have offered up any number of my old field sketches that seem worthy of inclusion in the database. Don’t “throw out the baby with the bath water.”
Assuming they are based on suitable observations, these records do provide “evidence of an organism.” For anyone doing formal research on a given taxon or group, the onus is on them to put any such records in context (as I have done in some of my publications). To relegate them to “Casual” status lowers their visibility in searches for relevant occurrence data. In my way of thinking, “Casual” status is for blatantly inaccurate or missing data, not simply unusual occurrences.
There are two types here: (a) literal duplicate observations from one observer - leave a brief note for the OP and move on; (b) multiple observations of the same organism, same time/place, by different observers. Clearly the latter type is quite legitimate for each of those observers, and it is simply a quirk in the database that must be accounted for when one is doing any formal analysis of distributional data (either geographic or chronological).
My own beef is actually with the term “Casual”; it is a gentle, politically-correct term for what generally constitutes “bad” or inappropriate data.
While thinking about this, and other related threads, it occurred to me that maybe I’ve been going about this all wrong. Maybe I shouldn’t be so hung up on ID and other information being correct and of good quality in iNat. To some extent, it’s like swimming against the current (not to say banging my head against a brick wall).
It occurs to me that instead of continuing to struggle with all the shortcomings of the iNat system/culture so that I can download ‘clean’ observation data to my database, a more efficient approach might be to not interact with observations on iNat at all. I could simply download the observations in whatever state they are in and then filter out probable duplicates, observations from nuisance observers, etc etc. Once I have skimmed the cream as it were, I can then look at the observations and where necessary, I can replace the iNat data with the correct IDs, annotations, etc. This would probably be faster than what I’m doing now, and would certainly be less frustrating. It would mean that there will likely be a lot more incorrect IDs on iNat (for my region and taxon of interest), but why should I lose any sleep over that?
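A rough sketch of what that download-then-filter approach might look like, assuming a standard iNat CSV export (the column names, the blocklist, and the duplicate heuristic below are all assumptions to be checked against the actual file):

```python
import pandas as pd

df = pd.read_csv("observations.csv")  # iNat CSV export (assumed column names)

# Skip users whose observations I've decided to ignore (hypothetical list).
blocklist = {"nuisance_user_1", "nuisance_user_2"}
df = df[~df["user_login"].isin(blocklist)]

# Flag probable duplicates: same observer, taxon, and day, with coordinates
# rounded to 3 decimals (roughly 100 m). A crude heuristic, not iNat policy.
df["lat_r"] = df["latitude"].round(3)
df["lon_r"] = df["longitude"].round(3)
dupes = df.duplicated(
    subset=["user_login", "taxon_id", "observed_on", "lat_r", "lon_r"],
    keep="first",
)
clean = df.loc[~dupes].drop(columns=["lat_r", "lon_r"])
clean.to_csv("skimmed.csv", index=False)  # the "cream", ready for review
```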
As is the case with museum records, including those in GBIF, researchers are already aware of the pitfalls of erroneous identifications. Many of these relate to taxonomic changes driven by recent (roughly the last 10 years) molecular research that affects species names. This also applies to records such as bat calls (and in some cases Merlin for birds), where IDs may be incorrect because of the algorithms (effectively AI) used to suggest identifications.
The problem extends much further. There are also incorrect dates, incorrect location data, misattributions, etc. Taxonomic swaps are easy to manage. Splits, not so much (though those that are well defined geographically can probably be sorted out).
During a recent long, contentious forum thread on accuracy circles (uncertainties), we found at least two ways that huge accuracy circles can be added accidentally to observations that actually have reasonably precise lat/longs. I suspect most of these huge uncertainties are accidents. Therefore I tend to believe the lat/longs unless there’s some additional good reason not to. You can choose to include or exclude such observations from your work, of course. The thing is, nobody should be excluding them from my work except me.
I am one of those fascinated by geographic outliers. They’re always a surprise and we don’t know what will happen with them. Consider two congeneric European grasses, Brachypodium distachyon and B. sylvaticum. Brachypodium distachyon turned up a few times in the early 1900s in the Portland, Oregon, area and at least once further south in the state. It might have been poised to become yet another abundant weedy annual grass – but it didn’t. It disappeared. Trivial. Dots not needed on maps. (It has become common in California now and is headed north; we’ll see it again, but it was missing for decades.) In the 1960s, B. sylvaticum was found in the Corvallis, Oregon, area, an escape from cultivation. Interesting. Trivial. Ignored. However, in the mid-1990s this grass was found to form extensive populations in some of our forests. It formed monocultures in some swales and was scattered in the closed-canopy forest, where it seemed trivial – until one realized that one wasn’t seeing the native forest grasses that should have been scattered in those forests. Brachypodium sylvaticum had become a serious weed, and by the time it was rediscovered it was too widespread and too abundant to be controlled.
My point is, we don’t know what will come of the geographic outliers. We don’t know which are “important” and which aren’t. We should be recording them, I think. Of course, that leads to a scattering of dots that have to be dealt with by researchers using the data. If they’re real, though, that’s data researchers should be dealing with. If they’re not, that’s an ID problem, not a reason to banish observations into Casual.
Then iNat should address these problems in their code.
Given your strong feelings, I would suggest submitting a feature request for this or working with someone to develop the code for iNat to use.
I’m not sure this is possible, given that they involve human error. But if the problem can be solved with code, I’m all for it.
That’s the rub. How do you know when they are real and not the result of some kind of error? When I investigate temporal/geographic anomalies in various databases (observations which others have accepted at face value), I’ve found that most of them are the result of some kind of data entry/transcription error. Occasionally, it’s an ID error. The most common error is flipping month and day from a specimen label. That happens ALL THE TIME (because generally, they hire students to do the data entry, and what do they care?). Another error that happens a lot with historical observations (where there is no lat/long - just a place name) is that back in the depths of time, somebody used an obscure/ambiguous place name that is easily misinterpreted today. I run into this a lot, and even with the benefit of the internet (as well as a large database of previously georeferenced observations), it can be tough to track down where some observations were actually made. In other cases, the precise error is unknown. I have contacted institutions asking for clarification about a specimen of species X collected on date Y by collector Z, and they tell me they have no specimen that in any way resembles that combination of criteria. I can only conclude that something must have gone wrong somewhere along the way, and I mark the observation as bogus.
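Those month/day flips are especially insidious because a date is only provably un-flipped when the day is greater than 12. A minimal sketch for flagging the ambiguous ones (in practice you’d then check the flipped reading against known phenology):

```python
from datetime import date

def flipped_reading(d: date) -> date | None:
    """Return the alternate date if month and day could have been swapped
    during transcription; None if the date is unambiguous (day > 12)."""
    if d.day <= 12 and d.day != d.month:
        return date(d.year, d.day, d.month)
    return None

# A label transcribed as 1957-07-03 could really mean 1957-03-07...
print(flipped_reading(date(1957, 7, 3)))   # 1957-03-07
# ...but a day greater than 12 rules the flip out entirely.
print(flipped_reading(date(1957, 7, 20)))  # None
```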
As I said, I rarely do anything that makes an observation go casual, though I’m often tempted. Mostly, I don’t because I don’t want to be bothered with potential blowback, and I have some way - however inconvenient - of keeping the observation in question out of my database. But I can certainly sympathize with those who do risk the blowback (much more than I sympathize with those who would criticize them as busybodies).
If you are genuinely concerned that there are observations that unjustly end up “casual” maybe we need a better way for people who are doing IDs to sort/filter/mark observations. Currently, we have a few tools that work well, augmented by a bunch of inconvenient kludges. Maybe iNat needs to provide better tools.
Ah, when you said “accidentally”, I thought perhaps you meant it was the result of some kind of coding error.
That said, it shouldn’t be that difficult for iNat to add a check on the accuracy value (which they really should rename): if it’s greater than a certain value (20 km? 100 km?), generate some kind of popup that explains what this value signifies (because the current name is easily misinterpreted) and asks the user if they are sure they want to enter it.
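The check itself would be trivial; a sketch of the logic (the 20 km threshold and the wording are purely illustrative - the real thing would live in iNat’s upload code):

```python
WARN_RADIUS_M = 20_000  # illustrative threshold (20 km); could be 100 km

def accuracy_warning(radius_m: float) -> str | None:
    """Return a confirmation prompt if the uncertainty radius looks suspect."""
    if radius_m > WARN_RADIUS_M:
        return (
            f"You entered an uncertainty radius of {radius_m / 1000:.0f} km. "
            "This means the organism could have been anywhere within that "
            "distance of the marked point. Is that really what you intended?"
        )
    return None  # value looks plausible; no prompt needed
```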
For what it’s worth, the location accuracy thread that inspired the current one is here: https://forum.inaturalist.org/t/dont-let-an-observation-attain-research-grade-if-its-location-is-very-imprecise/2072
There is already a filter option for accuracy radius, and obs with a large radius (>~30km) aren’t shown on the Explore map.
Having communicated with a number of OPs who had posted observations with VERY large accuracy circles, it has invariably been a technical issue with the camera/phone, not a coding error or human choice at upload. While the last of those may happen occasionally, it is exceedingly rare in my experience.
Again, the accuracy can be (and is) explicitly included in any iNat data download, so even with a massive download for any type of research, the ability to sort, identify, and delete any records with an unacceptable circle involves just a few mouse clicks in a spreadsheet. I did precisely that recently with a download of some thousands of moth records from Central and South America. But that’s a choice for the user of the data; it shouldn’t be pre-empted by some preset limit in iNat’s code or by the particular perspectives of some small set of iNatters.
Wow, that was a long/exhausting read. For what it’s worth, I would not advocate for making observations casual based on the lat/long accuracy (except maybe in extreme cases - like 100’s of km). I think it might be a good idea to rename this number to make it clearer (“uncertainty” makes more sense than “accuracy”), and it couldn’t hurt to implement some kind of warning when folks are in the process of entering a ridiculously large number. Those things seem like low hanging fruit.
And yes, I know about filtering observations based on the accuracy figure, and I use that function all the time (my default “identify” URL includes filtering on accuracy).
For the record, I don’t agree with some posters to that other thread that observations with accuracy > 100m are useless (paraphrasing). Personally, I don’t record GPS coordinates for individual observations except in cases of encounters with extremely rare species. Mostly, I’m just walking and putting tick marks on a pad with a pen, taking occasional photos (camera does not have integral GPS). I don’t report my own observations on any online platform, but my reports are generally in “checklist” format - a list of species with a count for each species (estimates for those that are particularly abundant). All observations are assigned a set of coordinates at the estimated centroid of the route I walked, with an uncertainty radius that encompasses the entire route. So my uncertainty figures are frequently in the 100’s of meters, if not several kilometers. All that said, I’ve got reasonably good spatial awareness, and if pressed, I can usually find where I made noteworthy observations on satellite view to a fair degree of precision. Yes, I realize that I can sync up my camera to a GPS, or use one of the suggested hacks, but frankly, I can’t be bothered.
So personally, I don’t have a problem with an uncertainty/accuracy figure of up to several kilometers. But over 10 km is starting to get sketchy, so I don’t ID/harvest observations with accuracy > 10km. I do look at those that have no accuracy at all, so I guess I could be accused of being inconsistent, depending on how you interpret observations that lack an accuracy figure vs ones with very large accuracy figures.
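For what it’s worth, the centroid-plus-enclosing-radius assignment I described is easy to script. Here is a minimal sketch, assuming the route is just a list of (lat, lon) points, using the haversine formula for distances:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def centroid_and_radius(route):
    """Mean-coordinate centroid of a route and the smallest radius around it
    that encloses every point (i.e., the uncertainty value to report)."""
    lat_c = sum(lat for lat, _ in route) / len(route)
    lon_c = sum(lon for _, lon in route) / len(route)
    radius = max(haversine_m(lat_c, lon_c, lat, lon) for lat, lon in route)
    return (lat_c, lon_c), radius

# Example: three points along a short walk (coordinates are made up).
route = [(45.4215, -75.6972), (45.4230, -75.7001), (45.4251, -75.6940)]
center, radius = centroid_and_radius(route)
print(center, f"{radius:.0f} m")
```

The simple coordinate mean is adequate for routes of a few kilometers; over larger extents you would want a proper spherical centroid.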
While our field methods differ somewhat, our perspectives are not too dissimilar. In a manuscript I have currently out for review, I had downloaded occurrence data for a set of moths and summarily rejected any observation with an accuracy radius of 5000m (10 km diameter) or greater. Due to local duplicates, etc., for mapping purposes I was also able to dispense with most data which had accuracy radii of >1000m. The resulting range maps were examined to make sure they did not disregard any major geographic segments.
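In code terms, that two-step cut is just a pair of filters. A sketch assuming a GBIF-style download where the uncertainty column is `coordinateUncertaintyInMeters` (adjust to whatever the export actually contains):

```python
import pandas as pd

records = pd.read_csv("moth_records.csv")  # assumed GBIF-style export
unc = records["coordinateUncertaintyInMeters"]

# Reject outright anything with an uncertainty radius of 5000 m or more.
usable = records[unc < 5000]

# For the range maps, tighten further to radii of 1000 m or less.
mappable = records[unc <= 1000]
```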