Calculating identification accuracy

Calculating and thinking about your identification accuracy is useful for improving your IDs. There may be methods to calculate ID accuracy starting from the date you joined iNat, but that may be infeasible if it requires manually checking thousands of IDs. The current method calculates accuracy (percent of correct IDs) from a selected start date onward, and so becomes increasingly informative as time passes since the start date. For example, you could start keeping track on 2/20, when your total IDs were 66000, and the next day your total may have increased to 66984. Keep a running tally of all errors among IDs made after the start date (the observation dates are irrelevant), which you can find in your disagreeing-ID notifications. Misidentifications/errors are incorrect IDs, regardless of whether they’re withdrawn, whether you added IDs after them, or whether there are multiple errors on the same observation (each should be counted). Subtract the start-date ID total from the current ID total to get the number of new IDs (66984 - 66000 = 984 in the example). Then divide the error total by the new ID total and multiply by 100 to get the error percent, and subtract the error percent from 100% to get the accuracy percent. Feel free to discuss or test this method or share others. It would be easiest if accuracy stats could be calculated automatically in iNat, if that ever becomes possible.
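For anyone who would rather not do this by hand, here is a minimal Python sketch of the arithmetic described above. The function name and the error count of 12 are made up for illustration. Only the ID totals (66000 and 66984) come from the example in the post, and nothing here talks to iNat itself.

```python
def id_accuracy(start_total: int, current_total: int, errors: int) -> float:
    """Return the accuracy percent for IDs made since the start date.

    start_total   -- total IDs on the chosen start date
    current_total -- total IDs now
    errors        -- hand-kept tally of incorrect IDs made since the start date
    """
    new_ids = current_total - start_total
    if new_ids <= 0:
        raise ValueError("no new IDs since the start date")
    if not 0 <= errors <= new_ids:
        raise ValueError("error tally must be between 0 and the number of new IDs")
    error_percent = errors / new_ids * 100
    return 100 - error_percent


# Totals from the example in the post; the 12 errors are hypothetical.
accuracy = id_accuracy(start_total=66000, current_total=66984, errors=12)
print(f"{accuracy:.2f}% accurate over {66984 - 66000} new IDs")
```

The error tally still has to be kept by hand from your notifications, so this only speeds up the final calculation.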

How are you defining accuracy or an “incorrect ID”? Seems like you could have an honor system for your own reference but I’m not sure how to automate this.

It may be difficult or impossible to fully automate, given that the website doesn’t know which IDs are correct or incorrect, regardless of whether they’re withdrawn. By automate I also partly mean any faster way to calculate using the method I described, like Excel formulas. Has anyone tried to calculate their ID accuracy before in general?

  1. Why did you label this a “tutorial”?
  2. See: Goodhart’s law. (i.e. If one tries to make a measurement out of something, it becomes useless because individuals modify their behaviour to optimize for the measurement, often with negative consequences for other outcomes.)

I labeled it tutorial because it’s a set of instructions, although a very simple one. I’ll move it back to General now if that would seem to fit better. Goodhart’s law is interesting but doesn’t seem to contradict the usefulness of calculating accuracy. It would actually be a good thing if everyone tried to improve their accuracy, which may mean increasing their ID cautiousness, for example by making more coarse-rank IDs. This is something many high ID-volume identifiers think about. I also wasn’t bragging of high accuracy, just in case you were wondering. I’m unsure what mine is; maybe it’s not as high as I’d like. The goal wasn’t to make a comparison like an ID score, just to know your own stats, and only for those who want to. Also, what level of accuracy is considered “good” is open to interpretation, not defined. I’m imagining something between 80% and 100% would be good, and 90% or higher would be very good. I think it will take me a few weeks to get a sense of what mine is anyway, since I just started keeping track yesterday.

My much softer way of learning when I have been too confident with my IDs is to follow my notifications.
Where I used to say that is a slugeater, I have learnt to leave it at snakes - because there are 2 I cannot tell apart.

I am becoming a high-volume identifier (I set a goal of 60 a day for this year) and I’d say I try to strike a balance between two goals: being absolutely 100% sure of an ID before I give it, and trying to give the observer feedback/move the observation to Research Grade/chip away at the Needs ID pile. In other words, I move fast but try not to break too much. I also strongly prefer to ID observations that are not at Research Grade yet.

I also try to keep in mind the relative importance of each observation. If I screw up the ID of something common and widespread in my area, my one mis-identification will not matter substantially among the thousands or tens of thousands of observations of that species. For something rare, I am much more cautious and often withhold an ID in favor of calling in someone I know to be an expert.

Do I make mistakes? Absolutely. I haven’t calculated my rate of accuracy, but my sense is that it’s pretty good, probably well above 80%. My accuracy rate for my own observations is quite definitely lower than my rate for IDing other people’s observations, because I make observations of organisms I’m still trying to learn.

I say all this because I wanted to point out that not all high-volume IDers are cautious and default to coarser level IDs. I do think identifiers should try very hard to be accurate and to be mindful of the gaps in their knowledge.
