Switch upload order from species/date/place to place/date/species

exceptionally small. honestly, i’m not sure why anyone would or should bother to analyze this kind of difference.

you could hypothesize that the difference here comes from bad computer vision suggestions, but looking at the bigger picture, a better hypothesis seems to me to be simply that European identifiers are identifying proportionally more observations than their USA peers (as evidenced by the higher research grade ratio for Europe). if that's the case, it would not be unexpected for them to surface slightly more Mavericks in the Needs ID pile.

@dianastuder talked about “Pre-Mavericks”, and if there were more “Pre-Mavericks” in the European set, you would expect the European “Leading” IDs to be proportionally higher than those in the USA set of Needs ID observations, but the USA “Leading” IDs are actually proportionally higher. this would seem to contradict, or at least not support, your broader hypothesis that European CV suggestions are disproportionately worse, leading to lower data quality in Europe.

i’m not going to try to dig into either of these hypotheses, since i don’t think it’s worth my effort. but you’re welcome to do your own analysis.

here i agree with your first point in that it doesn’t look to me like there’s currently a problem in Europe (which is why i don’t understand why you think a “solution” needs to be implemented for a problem that doesn’t exist).

on your second point, this strikes me as borderline fear mongering that new arrivals will mess things up for the rest of us. from my perspective, new folks making newbie mistakes is to be expected, but given proper onboarding, any negative impact should be minimal and temporary.

but if you dig into this and just think about how it would practically go down, you would realize that most students and new bioblitzers would be using the iNat app on their mobile devices to record observations, not the web uploader, and most people using the iNat app would get locations captured automatically via the app or the device’s camera app.

i don’t know why anyone would waste effort doing an A/B test on this kind of proposed change. just looking at the easily available metrics, you can already see that a different flow for the web uploader to try to improve computer vision suggestions is likely going to have near zero impact on overall data quality.

if someone somehow can rationalize the change some other way, that’s fine, but just implement the change. don’t confuse things by making some folks have one flow while others get another, or worse, by presenting one flow some of the time and another flow the rest of the time. any potential insights from such a test are likely going to be very minimal – not worth the effort.

i understand why folks might think that bad computer vision suggestions can have a negative impact on data quality. if you’ve identified a reasonable number of observations, you’ll have come across observations where it looks like folks just took whatever the computer vision suggested, without really thinking about it. these examples stick out in our memories, but if you really think about all the observations you identify, you’ll realize that the vast majority of identifications that are labeled as computer vision assisted are just fine.

1 Like

Well, the reason to do an A/B test is that it would directly test the benefit of the proposed change, which has a pretty plausible rationale and for which the counterarguments (“I’m a specialist and I prefer to have the species field first”) seem very flimsy. In contrast, the “easily available metrics” that you suggest as an alternative are influenced by a huge variety of factors that likely have much greater effect than the small degradation of ID quality that @vmoser (and I) believe this UI design is causing.

I appreciate that the data comparing observations in Europe and South Africa with those in the United States do not show a particular bias towards poorer quality IDs in those areas (and in fact show the reverse). But there are so many factors that might account for differential ID quality that using those metrics to measure the likely benefit of a UI improvement seems sure to swamp the signal in the noise. The A/B test would be a direct measurement.

Based on his own experience, Valentin provided a great example (using a photo of a European beaver) of how the current design reduces the quality of CV suggestions. Others mentioned their experience with poor suggestions for South African taxa. I think we should be wary of reading too much into those examples, though. Just because many of us may have experienced geographically implausible CV suggestions when CV lacked location input does not imply that these poor suggestions are going to show up in the overall iNat dataset as lower ID quality in any particular region compared to any other. The same CV process suggests European and South African species for North American natives lacking location data.

And there are significant differences between those regions when considering income and education levels, the observer and identifier communities, population density (both for humans and all the stuff they observe), prevalence of iNat in education and much more.

Personally, I’ve had many experiences where the location and/or date of my photo wasn’t captured in image metadata and consequently the CV suggestions were misleading. (For clarity, those were generally using the Android app, which is not the subject of this feature request, but the underlying issue is the same.)

As an identifier, I mostly work with plant observations in western North America and Latin America. In both contexts, I encounter a small but steady flow of observations with missing locations and dates. I also encounter observations that are identified (using CV) as species that don’t generally occur anywhere near the observer’s location (once we’re able to establish where that was). This problem happens everywhere, and I don’t expect it will have an easily detectable signature in iNat data. For that reason, I can’t think of any analysis of current iNat data that would tell us whether adjusting the sequence of these fields would improve data quality.

Unless I have overlooked it, the objections still appear to be limited to:

  1. I’m worried this might be a little inconvenient for my personal workflow as I might have to click in a different part of the screen.
  2. Analyzing data of large numbers of observations created by all sorts of users using various interfaces does not show a clear trend that we could ascribe to this admittedly minor UI issue within the web upload form.

BTW, I absolutely agree that most CV-assisted identifications are fine. But some are not, and given the high volume of iNat observations it’s reasonable to explore UX tweaks that might improve the “hit rate”. In this case the dev change is very small (reordering three fields). And even the test is not complex (use the A or B upload form for a test group for a defined period; measure a few metrics over the following month).
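
For what it’s worth, the assignment mechanics wouldn’t need to be complicated either. Here’s a rough sketch of deterministic bucketing (hypothetical names, just to show the shape of it – not a claim about how iNat would actually implement it):

```python
import hashlib

def upload_form_variant(user_id: int, experiment: str = "upload-field-order") -> str:
    """Assign a user to form A (current order) or B (place/date/species).

    Hashing the user id together with the experiment name keeps each user
    in the same bucket for the whole test period, so per-user metrics
    aren't mixed across variants.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"

print(upload_form_variant(12345))  # stable "A" or "B" for this user
```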

Edited to add: I’m also fine with making this change without an A/B test if this is seen as causing needless confusion. I’m just saying that I don’t think there’s any analysis of existing data that we can usefully substitute for the A/B test.

5 Likes

I think changing the order could improve the quality of CV suggestions - by how much, I don’t know. Small changes can influence which fields are entered first.

My concerns are about consistency (the mobile app shows taxon/date/location, and personally I think the web uploader and mobile apps should match) and, well, aesthetics. I think it doesn’t look great to see the taxon below the other two fields. A minor quibble, but it’s one that I have nonetheless.

I’d prefer doing something like allowing text entry in the species field without a date and location, but asking the user to add a date and location before automatically showing a CV guess, or making it more explicit that date and location are important for CV suggestions, or something along those lines.

6 Likes

i’d prefer this to the original proposal. a simple prompt at the top of the computer vision results saying that results could be refined by inputting a date and location seems like it would address the main concern expressed in the thread.

2 Likes

Tangent.
Is it possible to find how many obs have 1, or 2, or 3 disagreeing IDs?

That more-than-two-thirds rule racks up fast.
1 wrong needs 3 to convince - makes sense.
2 wrong need 5 - why not stay with 3? (Especially if the second ID is simply NotAPelargonium - that should = a single disagreement.)
3 wrong need 7. Such a pointless waste of IDs that could have cleared 2 more!
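
If I’ve got the rule right (community taxon needs strictly more than two-thirds agreement), the pattern is always 2k+1. A quick sketch, assuming that rule:

```python
# assumes the community taxon needs strictly more than 2/3 of IDs in agreement
def ids_needed(wrong: int) -> int:
    """Smallest number of agreeing IDs n with n / (n + wrong) > 2/3."""
    n = 1
    while 3 * n <= 2 * (n + wrong):  # integer form of n/(n+wrong) <= 2/3
        n += 1
    return n  # always works out to 2*wrong + 1

for k in range(1, 4):
    n = ids_needed(k)
    print(f"{k} wrong need {n} = {n}/{n + k} = {100 * n / (n + k):.0f}%")
```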

One super-simple change would be to simply not invoke CV in the species name field if there is no location set. I agree that it would be better to somehow make it clear that the location should be set first, but doing that involves UI design, and doing this simple change would not, which might be advantageous in the short run. Currently I believe the default behavior is to only show nearby species when there is a location, so this would be a logical extension of that. (It’s only showing nearby species, but there are no known nearby species because the location isn’t known.)
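
To show how small that change would be, here’s a rough sketch of the gating logic (hypothetical function names, not iNat’s actual code):

```python
from typing import Optional

Coords = tuple[float, float]  # (latitude, longitude)

def run_cv_model(photo: bytes, nearby: Coords) -> list[str]:
    ...  # stand-in for the real CV call

def species_field_suggestions(photo: bytes,
                              location: Optional[Coords] = None) -> list[str]:
    # With no location there are no "nearby species" to filter against,
    # so skip the CV call entirely and let the UI prompt for a location.
    if location is None:
        return []
    return run_cv_model(photo, nearby=location)
```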

6 Likes

We discussed this proposal as a team and while we recognize the issue (CV suggestions are much better when location is entered, so entering a location before choosing a suggestion should be encouraged), this is not the way we would like to address it, so I’m setting this request to close automatically tomorrow. We’ll talk with our designer about improvements.

4 Likes

Thank you @tiwane and all for the discussion :)

To me, the main issue isn’t whether this change would lead to a greater proportion of RG Community IDs (it might, but IDs will work themselves out in time), but the amount of work by identifiers required for IDs to reach RG. For example, an erroneous initial CV ID may require 3 identifiers to overturn it, plus potentially additional identifier time to write comments, tag users, etc. A correct initial CV ID only requires one agreeing identifier to get it to RG (essentially a third of the work/time). If having location data entered improves the performance of CV suggestions, it has the potential to reduce identifier time and frustration. This could also lead to a better experience across the platform for observers, as time to identification could decrease. So, to me, the main question is:

Does having location data entered increase CV prediction accuracy, and, if so, how and by how much?

I decided I’d take a shot at testing this empirically. I gathered a taxonomically diverse sample of observations from the United States chosen via stratified random sampling. For each observation, I downloaded the first pic and uploaded it to the CV Demo Page twice: first without the location, then with it. I recorded several indicators of prediction quality, including:

  1. The rank of the suggestion (if any) that matched the RG (presumed correct) ID

  2. Whether the CV was confident in the correct genus or not

  3. The number of suggestions provided by the model

  4. Whether the CV was “not confident enough to make a recommendation”
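
(In case anyone wants to replicate this, the roll-up into the summary stats below is simple to script – a rough sketch with hypothetical field names, not my exact code:)

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class Trial:
    """One CV Demo run for one observation photo (with or without location)."""
    correct_rank: Optional[int]  # rank of the suggestion matching the RG ID; None if absent
    confident_genus: bool        # was the CV confident in the correct genus?
    n_suggestions: int           # number of suggestions returned
    not_confident: bool          # "not confident enough to make a recommendation"?

def summarize(trials: list[Trial]) -> dict:
    n = len(trials)
    return {
        "% 1st choice correct": 100 * sum(t.correct_rank == 1 for t in trials) / n,
        "% confident in correct genus": 100 * sum(t.confident_genus for t in trials) / n,
        "average # of suggestions": mean(t.n_suggestions for t in trials),
        "% not confident": 100 * sum(t.not_confident for t in trials) / n,
    }
```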

For further details on methods:

Summary statistics:

| | % 1st Choice Correct | % Confident in Correct Genus | Average # of Suggestions | % Not Confident |
|---|---|---|---|---|
| No Location | 60 | 54 | 9.02 | 40 |
| With Location | 71 | 68 | 6.69 | 28 |

For raw data and additional summary stats:

Results:

For each metric, the CV performed better when location was included. The CV’s first suggestion was correct 11 percentage points more often with location included. To delve a little deeper: for observations where the no-location suggestions contained the correct choice but not as the top choice, the average rank of the correct choice was 3.4 without location but 1.6 with location. For observations where the CV without location did not offer the correct option at all (n=22), the CV with location offered the correct option for 15 of them, with an average rank of 6.5.

The CV was confident in the correct genus 14 percentage points more often when location was included. The CV offered 2.5 fewer suggestions on average when location was included, which represents higher certainty; offering fewer choices should also increase the likelihood that a user will choose the correct one (at least when the option set contains the correct ID). The model was “not confident” 12 percentage points less often when location was included (i.e., it was more confident).

Finally, for no individual observation did including the location reduce the accuracy of the CV output.

Overall, I think these results show that including the location increases the general accuracy/performance of the CV model by 10–15%, depending on which metric one looks at, with essentially no downsides. This is for observations from the United States, where anecdotally we might expect the positive impact of including location to be smaller; in other areas, the benefits may be greater. If you told any engineer that they could improve a process 10–15% by making a reasonably small change, they’d be pretty thrilled. So I think that ideas that encourage users to enter an observation’s location before getting a prediction from the CV are a great target for improving identification efficiency on iNat.

If you made it to the end, thanks for reading. Tagging some folks who were specifically involved in discussion of testing this idea with data earlier in the thread: @tiwane @pisum @rupertclayton @vmoser

10 Likes

Wrong CV suggestions were a problem for data in the Netherlands in 2018, but I think the early adopters have solved most of those issues…

Thanks very much for doing that testing work. I’m encouraged that the results we would intuitively expect are borne out.

@tiwane: Would it be possible to get a response from iNat staff on whether the proposal and @cthawley’s testing are compelling enough to justify a change to the upload form?

3 Likes

ideally, you would want to consider solutions in the context of the problem that you’re trying to solve and alternative solutions, too.

let’s suppose that people are blindly agreeing to the 1st CV suggestion, and that by implementing this particular proposal, you can improve the rate of correct 1st suggestions to 71% from 60%. so if people are still blindly agreeing to the 1st CV suggestion, you’ve decreased bad user IDs to 29% from 40%.

did you solve the right problem? is that the best you can do to solve the problem?

from my perspective, the real problem is not that CV suggestions aren’t accurate, but that users blindly adopt bad CV IDs as their own. so if that’s the real problem, the best solution with minimal technical change here is to educate folks that they should not just take a CV ID as their own without doing some due diligence, that there’s no problem with identifying something only to a high taxonomic level, etc.

if you wanted a more engineered version of this, it could be implemented as something like a setting and/or switch that could turn CV on/off. during user onboarding, a user might be forced to read and confirm an explanation about how to properly use CV before toggling it on (or something like that).

suppose that this kind of education could be 50% effective at preventing folks from blindly agreeing with the CV suggestions. in the case where the CV’s first suggestion is only 60% correct and everyone is just choosing it blindly – meaning the bad user ID rate is 40% – 50% effective education would potentially reduce the bad user ID rate to 20%.

maybe the education includes suggesting that users add a date and location before requesting a CV suggestion, to refine the suggestions. in that case, you could reduce the bad ID rate even more.
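
to make the arithmetic concrete (same assumption as above – everyone blindly accepts the first suggestion; the 50% education effectiveness is just an illustrative guess):

```python
cv_now, cv_better = 0.60, 0.71  # correct-1st-suggestion rates from the test above
education = 0.50                # assumed share of blind agreements prevented

for label, bad_rate in [
    ("reorder fields only", 1 - cv_better),
    ("education only", (1 - cv_now) * (1 - education)),
    ("education + refined CV", (1 - cv_better) * (1 - education)),
]:
    print(f"{label}: {bad_rate:.1%} bad user IDs")
```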

personally, i think if we’re going to go to the trouble of changing the UI to solve for suboptimal CV suggestions (as opposed to trying to solve for bad user IDs specifically), it should be done in a way that allows suggestions to be refined based on not just location and date, but also a user’s suggested ID. i believe that on the observation screen, the CV suggestions are limited to the iconic taxon of the observation taxon. so it would be nice in the upload screen for the user to be able to specify a base iconic taxon, or just a regular taxon, from which to limit CV suggestions.

1 Like

= add your location first.

Staying with identifiers - I battle with scaling up the more-than-2/3 rule. Thanks to @tonyrebelo for the percentages:

1 needs 3 = 3/4 = 75%
2 needs 5 = 5/7 = 71%
3 needs 7 = 7/10 = 70%
4 needs 9 = 9/13 = 69%
5 needs 11 = 11/16 = 68%

Let’s take the second line.
Wrong species plus a “no, it’s not that” needs FIVE identifiers - where 3 would be adequate to overturn the wrong species alone. A “no, it’s not” shouldn’t force us to find another pair of identifiers - or to gently hammer the first 2, who are making life difficult without ever intending to.

When there are 3 who are ignoring their notifications - not sure there is any point in wasting time on them. Especially if there are already informed IDs with kind comments explaining.

The bulk of uploading observers will be new, with good intentions (or passing thru?).
We could have a (slightly hidden) personal setting to arrange our preferred sequence?

1 Like

I guess I would say that, for any given situation/problem, one can almost always propose a range of improvements, many of which would lead to different levels of “solution”. We can almost always come up with ideas for bigger improvements that will lead to bigger solutions, but also take more work. But, when designing, one of the big questions is also: do we have the resources to make these changes? If the answer is no, then making the smaller change (even though it leads to a smaller benefit) that can be implemented with less effort is often better than making no change at all. In short, don’t let great be the enemy of good.

So I agree that better onboarding for iNat is a great goal. It’s not a new idea, and it’s one I have supported on other threads. But it’s an idea that has been discussed for at least several years (as far as I’m aware) and hasn’t been implemented, I’m guessing because it would require a fair amount of time/work that isn’t available.

Given the time/work constraints on implementing changes, any nudges to this leverage point (adding location data before offering CV predictions) seem like they could offer a good return on investment of effort in terms of improved accuracy/experience. I don’t feel super strongly about this particular request (though I personally think it might be worth trying), but I think that, as a general target, finding ways to push users to add location data is probably worth pursuing.

7 Likes

This topic was automatically closed after 17 hours. New replies are no longer allowed.

As I said above, we’re investigating other ways to encourage users to add location data before choosing a CV suggestion; we just didn’t want to go with this specific approach of switching the order of fields.

3 Likes