Thoughts on Improving Transparency of DQA Votes

paul_dennehy · October 28, 2024, 5:33pm

I spent yesterday looking through hundreds of “Casual” observations in my taxa of interest, and now I’ve been thinking a lot about ways to improve the system by which observations become Casual Grade. I noticed that quite a lot of the “Casual” observations that are marked as wild and have a date and location are Casual due to DQA votes which seem to be questionable or downright wrong, and it made me wonder if there’s a way to prevent so many observations from getting “Casual-ed”. The top questionable DQA situations I found (at least for Lepidoptera in North America) were:
-Incorrect use of “evidence related to a single subject”, specifically downvoting of this DQA for observations with only a single photo, because the annotator was confused about which organism in the picture is the intended subject
-What I call the “make it casual by all means possible” voters- users who downvoted all the DQAs (correct location, correct date, evidence of organism, etc.) when in fact there was clearly an organism present and no reason to suspect the date or location was wrong, but the organism was captive. Rather than voting “captive”, these voters downvoted literally everything else besides “captive”, which I don’t understand at all. I suppose not a huge problem since these should be casual, but they should be showing up in “Captive” searches, not searches of date-less observations.
-Abuse of DQAs to make duplicates “Casual”, i.e. observations marked as “incorrect date, incorrect location, no evidence of organism” by a voter who comments “this appears to be a duplicate, delete it” (duplicates may be annoying, but this is an outright violation of the proper use of DQA votes)
-The “anti-back-of-camera-photo police”- lots of observations featuring a “photo of a photo” on the back of a camera automatically marked as “no evidence of organism, incorrect date, incorrect location” by the same handful of voters who appear to just hate this type of observation, despite several forum discussions concluding that they’re allowed. In the most flagrant violation of DQA rules, some photos with the date given for the observation visible on the camera screen were still voted “incorrect date”.
-Downvotes to “location is accurate” due to a large “accuracy bubble”. Accuracy vs. precision issue, I suppose- just because the location is wildly imprecise doesn’t mean it’s inaccurate.
-The “you seem like a cheater, so I’m downvoting all of your observations” instances- quite a few observations that seemed perfectly reasonable at the time and place claimed had seemingly inexplicable downvotes to “correct date”. Further investigation found that these were observations by relatively new users who had posted several observations that clearly were incorrectly dated; the same voter had commented on a few of these and then voted “incorrect date” on all of that user’s observations (usually only 10-15 per user, as these were newer users)
-DQA votes that should have been retracted, i.e. voters saying “please fix the observation”, downvoting the DQA, and then not coming back to change their vote when the observer commented “I fixed it”.
-Downvotes to “evidence of organism” that appear to be made in disagreement with an ID
-Mistaken votes to “Can the Community Taxon still be confirmed or improved?” after adding a disagreeing ID (adding a genus ID that disagrees with the original ID and kicks the CID back to family or higher, then saying Community Taxon cannot be improved because members of that genus are inseparable in photos… but the CID is currently not at the genus level yet, so the observation becomes casual at some high level)
I saw numerous cases of all these, and I’m curious on others’ thoughts on how to make these mistakes less likely (besides just “talk to everyone who makes a mistake individually”). Here are my thoughts:
-Require a comment to be added when voting on any DQA that could make an observation Casual. This would make it clear what the voter’s reasoning was, and make it easier to know whether to counter-vote. I picture a popup appearing when you vote “location is inaccurate”, for example, which requires the voter to type something like “this picture looks to be taken in the summer, not the winter” or “this user uploaded the same photo twice and placed it at a different location each time”. Downvoting a DQA on someone else’s observation in a way that makes it casual should not be possible without some kind of justification.
-Make DQA votes generate notifications. (I know this is already a feature request, so… go vote for it I guess)
-Along this same line, specifically provide an automated message to observers when their observations become Casual that explains to them exactly how to fix the problem. For example, if a new user has someone mark “incorrect location”, auto-generate a message to the user explaining exactly how to change a location and counter the DQA vote.
-Gray out the “evidence related to a single subject” vote option for observations with only one single photo/recording. It’s impossible for an observation with one media item to have different media items related to different organisms, so why is this even a possible vote?
-Make an explanatory popup appear when voting on DQAs that can “Casual” an observation; for example “no evidence of organism” could generate a popup that clarifies “this vote indicates that you see no evidence of any organism at all” and require you to click “agree”.
-Provide some way to mark an observation as a duplicate. It doesn’t even need to change the grade of the observation, just be a way to mark these so that identifiers aren’t tempted to go nuclear in the DQAs to try to hide them.
I’m sure at least some of these have already been brought up as feature requests, so I figured I’d post in General to get some more thoughts on these issues before deciding on a Feature Request or two to formally suggest. Thanks for any input that anyone has on these problems and solutions!

raymie · October 28, 2024, 5:58pm

As someone who also frequently wades through casual observations, I can confirm all of these scenarios seem surprisingly common (and in no taxon more common than they appear in reptiles - yikes!).

schoenitz · October 28, 2024, 6:03pm

As of recently (some time earlier this year) it is no longer possible to make these votes. The ones you saw must have been entered before that change went into effect.

sedgequeen · October 28, 2024, 6:20pm

Wow. I’d heard of a few of those problems but am surprised people are finding so many ways to overenthusiastically make things casual.

Making it impossible to downvote “evidence related to a single subject” when there’s only one photo seems a no-brainer and probably not even that hard to program. The others seem more of a challenge.

tristanmcknight · October 28, 2024, 6:40pm

I think I’ve done this sometimes. But I think the context was important: while something like a honeybee could plausibly be anywhere and anytime, some of the others were clearly out of date / location and the whole batch was submitted within a few minutes, and I don’t think they ran outside to catch that one honeybee in between photographing a bunch of other pinned specimens. I’ll post a comment once or twice, but I figure somebody won’t want their notifications clogged with dozens of identical comments when what they really need to do is vet their whole account from their end (which I’ve explained in my comment).

rupertclayton · October 28, 2024, 6:51pm

Wow! Thanks for taking the time to categorize all those pretty unhelpful uses of the DQA flags. I think I’ve seen most of those types of misuse, but thankfully at a lower level than appears to be the case for Lepidoptera.

For this one, I really feel that iNat should change its policy to make duplicates acceptable. Two correctly identified instances of the same observation with accurate date and location info really do not constitute a problem. Almost always this happens because someone overlooked the fact that they had previously added the observation to iNat. I would prefer that iNat implemented automatic checks during upload (e.g. via matching checksums on the image/sound files) and used that to warn users (“It looks like you added this photo to an observation before. Are you sure you want to add it again?”)
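
To make the checksum idea concrete, here is a minimal sketch in Python. It is only an illustration, not how iNat actually handles uploads: `file_checksum` and `UploadChecker` are hypothetical names, SHA-256 is an assumed choice of hash, and the in-memory dictionary stands in for whatever storage a real site would use.

```python
import hashlib
from typing import Dict, Optional


def file_checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a media file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


class UploadChecker:
    """Hypothetical helper that remembers checksums of one user's uploads."""

    def __init__(self) -> None:
        # Maps checksum -> ID of the observation that first used the file.
        self._seen: Dict[str, int] = {}

    def check(self, data: bytes, observation_id: int) -> Optional[int]:
        """Return the earlier observation ID if this exact file was seen
        before; otherwise record the file and return None."""
        digest = file_checksum(data)
        earlier = self._seen.get(digest)
        if earlier is None:
            self._seen[digest] = observation_id
        return earlier


if __name__ == "__main__":
    checker = UploadChecker()
    checker.check(b"moth-photo-bytes", observation_id=1)
    earlier = checker.check(b"moth-photo-bytes", observation_id=2)
    if earlier is not None:
        # Warn the uploader instead of blocking the upload.
        print(f"It looks like you added this photo to observation {earlier} before.")
```

One limitation: an exact checksum only matches byte-identical files, so a photo that was cropped, resized, or re-exported would not trigger the warning.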

As to other strategies to fix these issues, I’m not enthusiastic about the approach of requiring a comment when adding a DQA vote. I understand the reasoning, but the coding to associate the vote and the comment seems like it would be complex. Adding a pop-up with extra detail on the correct use might be a better way to approach this.

I really like this. In fact, I think there’s a big opportunity for iNat to add functionality across the site that helps new users understand data quality issues, conflicting IDs, etc. and guides them through fixing their observations.

DianaStuder · October 28, 2024, 6:54pm

It is irritating for the string of identifiers - seen that, done that. Duplicate …

When I use a DQA that pushes to Casual, I usually leave a comment too - since that generates a notification that I can respond to.

raymie · October 28, 2024, 6:56pm

“Evidence is of a single subject” is grayed out when there’s only a single photo, but this feature was only added recently and as such a lot of the observations with a single photo remain with “legacy” downvotes.

paul_dennehy · October 28, 2024, 7:01pm

Awesome! I hadn’t noticed this, so I’m glad to hear this change was already implemented.

paul_dennehy · October 28, 2024, 7:09pm

That’s understandable. I think as you say, context is important. If they’ve uploaded 15 different photos with the same date and location that are clearly from many different places and times, downvoting them all seems reasonable. I saw some cases like this, and I left the downvotes alone. There were some where the mass-downvoted observations had dates spanning several weeks though, and I’m not sure that downvoting them all was warranted in those cases.

Having a copy-pasted comment on each one saying “this was downvoted because it’s part of a mass upload of observations from the same user that all used the exact same time and location” would be helpful in these cases. I saw a few with no comments and no other moth observations from the observer, which just had an unexplained downvote for the date by some other user, despite the date given being perfectly plausible. Further investigation found that some of these were part of “mass uploads”, and others were not, but it took some time and searching to figure this out due to the lack of any commentary by the annotator on most of the observations.

arboretum_amy · October 28, 2024, 7:57pm

As far as I know, they are acceptable to iNaturalist the organization, just not to some users that feel strongly about it. I have been chastised by other identifiers for giving an ID on a duplicate.

mftasp · October 28, 2024, 8:28pm

In my experience there are many users that consider iNaturalist to not be a social network, and instead only an important scientific-data gathering tool.

Most of these users don’t seem to tolerate well any kind of reduction in the scientific value of observations. The response may range from rudeness to leaving false directions for the observer (such as “captive observations are not allowed on iNat!”, or “large uncertainty ranges in your location means this observation is useless” or leaving hundreds of terse messages asking a user to immediately unobscure their observations), or spamming any DQA flags to make observations casual.

As a counterpart, it’s worth keeping in mind that there is a large body of duplicated observations made by duress users, with falsified data, and sometimes curators don’t have the time to comment on all of them.

As a typical but made-up example: a teacher directs his class to create accounts and produce 30 research-grade observations each for their assessment. One student takes the trouble to do so, and two or three others use their photos with made-up times and dates. All of them then confirm each other’s (usually wrong) observations and make them research grade so that a single disagreement will be maverick. You now have 90 wrongly ID’d, and 60 falsified, research grade observations that are not easy to deal with. In cases like this I flag one or two and spam wrong ID and location on all but the earliest-uploaded (original) observation of each set of multiples.

sedgequeen · October 28, 2024, 8:40pm

Understandable that you would do this. Frustrating to deal with! However, I think it’s not actually correct. Practical, maybe, but not correct. I wonder if there’s a better method, maybe involving the help desk and contact with the teacher. (Requiring RG observations is beyond stupid, if scientific usefulness of the observations is valued at all.)

oksanaetal · October 28, 2024, 8:44pm

I like this.

I don’t really understand why this was made in the first place.

david99 · October 28, 2024, 9:16pm

I agree that the DQA system needs an overhaul, but I think it needs to go along with an overhaul of what “casual” means. There are a lot of different reasons that observations become casual, ranging from “this isn’t even a valid observation” to “this could be a valid observation, but needs something fixed” to “this is a valid observation, but can’t be identified”, and many things in-between. I think that an in-between designation would better fit some of the things that are made casual simply because they can’t be identified, if everything else is correct.

I especially like the idea of the DQA auto-generating a message (or comment?) on how to fix the issue, since it’s time-consuming and frustrating to comment on new users’ observations, who may never come back and see the comment.

david99 · October 28, 2024, 9:31pm

Also, some of these may look like an abuse of the DQA, but aren’t. I sometimes mark things as date/location not accurate if I notice that a user is submitting things from the exact same time on different continents, and don’t leave a note if it’s a user that hasn’t been active for a while. It might look like a perfectly valid observation, unless you do some digging into the user’s other observations. Likewise with marking things captive, sometimes I mark species captive because I know they were planted (in one case I even planted it!), even though it’s within the natural range of the species and in the appropriate habitat. Ideally I’d leave a note on every case like that, but it’s a tradeoff between doing more IDs and leaving notes for users that 95% (or higher?) of the time will never use iNat after their class ends.

jasonhernandez74 · October 28, 2024, 10:03pm

I’m sure at least some of these bad practices are done to compensate for rejected feature requests. In some threads, people actually admit to doing “off-label” DQA as a workaround for something they perceive as a problem.

Makes you wonder how they treat their grad students.

DianaStuder · October 28, 2024, 10:29pm

Lots of discussion on the forum. You have an obs with 3 pictures - a beetle, a beaver, a buffalo. Each should be in a separate obs. The ‘single subject’ DQA is the solution to that problem.

Also in the earlier discussions - almost every iNat obs includes various taxa - then the observer has to tell us what we are looking at. Or … the identifier decides - beetle or flower ?? Single subject DQA does not apply here. This is a separate problem, and someone else can fight for the solution this time.

tiwane · October 28, 2024, 10:35pm

As @arboretum_amy says, they currently are acceptable.

raymie · October 28, 2024, 10:45pm

Duplicate observations being unacceptable has to be the biggest misconception on all of iNat. I constantly see observations downvoted in every possible way because they are duplicates, and people mass flagging duplicates clogs up the flags page all the time, too. Perhaps something should be done to make it clear that this is not an issue?
