Switch upload order from species/date/place to place/date/species

pisum · March 4, 2023, 10:17pm

generally, i think this request may solve for one particular workflow, but i doubt that that workflow is necessarily the one users want to use in most cases where observations are uploaded via the web.

in the grand scheme of things, if the real underlying intent is to help guide inexperienced users away from just picking any random suggestion provided by the computer vision, are there better ways to achieve that goal? (for example, a bit of training would seem to much more directly address what looks to me like a training issue.)

i think this kind of thing would only make sense if you take it to the max, which means that you would allow a user to save different layouts for the “cards” on the upload screen. you could even provide a few different predefined layouts.

sullivanribbit · March 4, 2023, 11:15pm

This surprises me, assuming your phone has built-in GPS, which it does if it supports maps. I wonder if there’s a way to make it embed the location onto photos but it’s turned off?

sullivanribbit · March 4, 2023, 11:17pm

The nice thing about this suggestion is that it seems like it would improve some incoming observations, without hurting anything (other than muscle memory for some existing users, which would hopefully/probably be quite temporary). So even if it doesn’t solve a more general problem, it still seems like a simple change with real benefit.

janetwright · March 5, 2023, 1:46am

That’s interesting and I didn’t know about the computer vision demo, but I was really referring to people who take a photo with their phone, do the lookup on the spot, and then cancel out of iNaturalist once they see the ID.

JaneBP · March 5, 2023, 1:53am

That shows you how little entering a location bothers me - I haven’t even tried to find that out. LOL!

gcwarbler · March 5, 2023, 3:02am

I’m afraid I don’t understand this statement. For any observation to be uploaded or a taxon to be searched, the user inputs some identification, whether it is self-IDed or a concurrence with an iNat CV suggestion. It is quite true that general or tentative initial IDs will be refined by subsequent viewer IDs, and that is an outcome of the process, but to say that deciding on the taxon is the “output of that identification process” seems to shortchange the whole interface. I go back to my earlier comment: My uploads and/or research is almost always taxon-based; while I do add a location to observations at upload or refine a search to a specific location sometimes, those are qualifiers to my primary search, not the focus of my efforts.

DianaStuder · March 5, 2023, 12:35pm

That will apply to scientists who are narrowly taxon-focused. I think a minority on iNat.

Many identifiers use CV to quickly get to the ID they know.

Observers frequently upload obs as Unknown or ‘Life’ or broad IDs which are equivalent.

pisum · March 5, 2023, 2:15pm

it’s simply the wrong semantic flow, as others have noted. if you’re changing the flow for a minor use case, you’re forcing folks to change their flow for the majority of the cases in the hopes that users will make better taxon choices in a minority of cases. that seems wrong to me.

i could just as easily say that changing the order of these fields would make effectively no difference in the overall quality of the data within the system. (in some places with few observations, switching those fields could actually even make the computer vision suggestions worse.)

as i noted earlier, if the real goal is to help people get better taxon suggestions into the system, there are better ways of doing this that more directly address that particular problem.

sullivanribbit · March 5, 2023, 3:33pm

“Wrong semantic flow” is pretty abstract and seems subjective to me. I personally would not benefit from this feature, as I always have locations embedded in my photos before I upload them. But this change doesn’t seem like it would make anything worse for me.

I think the use case that this would improve is any entry of an observation that manually adds a location and uses computer vision in any way (including people who know what they have seen but find it easier to pick from a list than to type into an empty field). I’m not sure what percentage of website uploads this includes, but it seems like it could be significant.

It sounds like you are arguing that computer vision without location information might do better overall than computer vision with location information. If so, that’s a serious problem that really needs to be addressed. But that doesn’t sound right to me. Am I missing your point?

pisum · March 5, 2023, 4:12pm

in most cases, it probably would make no difference. in the rest of the cases, it seems like it could be better or worse. it just depends on the particular situation – the rest of the organisms in the training set, the organisms seen nearby, etc.

seems insignificant from my perspective because of my notes above.

bottom line is it’s not clear to me that:

there’s a problem that needs to be solved
even if there is a problem, this is the right way to address the problem

vmoser · March 5, 2023, 5:10pm

Based on my experience, mostly in Europe, it makes a difference in a significant number of cases. The algorithm is (like the observations overall) skewed toward North America, so the effect might be more obvious here in Europe than in North America. So yes, I think we can improve data quality and user experience with this.

2nd point, not discussed much yet: For me, place / species feels much more natural than species / place. Of course this is subjective, but an option to change the layout would improve the user experience for people like me. Place/Species is also how I sort my pictures or label specimens.

cthawley · March 5, 2023, 6:10pm

I have to confess, I don’t understand the arguments that this proposed change would make uploading observations more difficult or less efficient for many users. Learning to click slightly lower to enter a species ID doesn’t seem too difficult to me, and it doesn’t prevent users from focusing on a specific taxon. Users could still enter a species ID first for each observation if they wanted, and the species would still be easily visible on each card/tile. I’m not sure what I’m missing, so I’d be interested to hear any detailed explanations of how this change would be an impediment to use for others.

I don’t personally think the proposed change would have a strong effect on my own uploading process either way. I suppose I do tend to enter batches of observations that are all from one location, so setting the place at the beginning might be slightly more efficient for me.

The proposed benefit to the iNat community of nudging other users towards entry of place/location information does seem plausible to me for several reasons:

Location and date are two fields that cannot be corrected by identifiers. Identifiers can, however, essentially correct an erroneous initial ID by an observer. For observers, editing location/place data after initial upload is more complex than changing an ID, so in this sense, getting the location and date entered correctly is quite important.
Location and date also provide key information to facilitate and improve identification, whether those are IDs from users or the CV. In the case of CV based IDs, I don’t find the arguments that entering location before using the CV will not improve accuracy to be compelling. This is based on the fact that adding location information to the CV model process was something that was widely requested by the community (see this feature request thread, though there are many others) and had broad-based support several years ago. There were many known cases where bad CV IDs were broadly suggested due to not taking location into account. Based on forum posts, the implementation of including location data in response to these requests has been quite successful and reduced the number of poor IDs. As such, I think it follows that encouraging a larger proportion of initial IDs to be made with location data could improve initial ID accuracy.
There are a reasonable number of observations uploaded without any location information at all. We don’t see these very often because of the default filters, but they do exist. I think it’s reasonable to hypothesize that the placement of the location field first might reduce these, increasing the proportion of verifiable observations.

DianaStuder · March 5, 2023, 6:27pm

Which is why identifiers who are working on obs out of USA have the extra work of convincing iNat, and newbies, no matter how pretty sure iNat is that it’s USA species
No, it’s not. Then adding location, in so far as CV updates now include our species - makes a useful difference to the suggestions.

I can see that change happening. Hammer thru Unknowns and Needs ID. CV update. Ta da. Now, CV suggests Local species

pisum · March 5, 2023, 7:03pm

i’m not opposed to having an option to change the layout (in the way that i’ve noted above), but your original request is to change the layout in a specific way for everyone. that’s the wrong way to go.

how do you measure this exactly? if i assume that research grade is the least bad measure of this for looking at the entire dataset, it looks to me like both European and South African observations have better research grade metrics than USA observations.

if you’re saying that bad initial computer vision suggestions have a significant effect on data quality, and that non-USA observations suffer from disproportionately bad CV suggestions, i don’t see it here. if you think there’s a better way to show a link (or lack of one) between computer vision suggestions and overall data quality, please enlighten me.

Europe:

South Africa:

USA:

vmoser · March 5, 2023, 9:29pm

I think this is a discussion, and things can adapt to find a consensus in the community. I am open to constructive feedback and ideas from everyone.

that is indeed not so simple, but a great idea to look at it with data. What does it mean to improve the data quality? Maybe I should say “effort for data quality”. If one has to override a suggestion from another continent it takes more identifiers than if the suggestion is more precise. Or worse, the ID process throws the observation back on order level, where it is left because the appropriate experts don’t check there.

Maybe it’s only 1 % now, but 1 % of “more useful” effort might already make a difference.

I am also thinking about the future here and what happens if we scale up in Europe and other regions: In Switzerland so far, most users are naturalists. My impression is that these people mostly do fine and apply common sense to the auto-suggestions. But more and more, iNat is used in schools and universities, BioBlitzes and similar settings where people seem less experienced.

pisum · March 5, 2023, 9:58pm

here’s what i’m looking at, and i don’t see any overall difference between European identification stats vs. USA identification stats:

if you have a better interpretation of the data or have ideas for other ways to look at the data, please share. based on what i see, my original take still stands. i don’t see a problem that needs to be solved here.

not sure what you’re suggesting here.

vmoser · March 6, 2023, 8:37am

I am suggesting the difference might

I am saying the effect might be very small (like 1 %), but even to be more efficient in this small area is a win with the huge amount of data processed in iNaturalist.

A better indication for this question is probably understanding the origin of these 0.5/0.6 Maverick. Seems small, but each maverick took a disproportionate effort to overturn. Why is it maverick in the first place?

and this is where that comes in. I think Maverick in Europe = proportionately more wrong suggestions by auto-ID and fewer by “inexperienced, don’t care users (like school kids)”

DianaStuder · March 6, 2023, 12:49pm

This is just one anecdotal example. I wonder: what if we had a project for species which are Pending for CV, and interested observers were encouraged to ‘add your obs of this for CV’? Then you could evaluate Pending vs Included with each CV update. (Maybe iNat already does, and I’ve missed or forgotten?)

https://www.inaturalist.org/taxa/589558-Mairia-coriacea
For CNC or GSB I ID Unknowns for the Western Cape. At first CV and I were mystified by this one. Neither of us have seen it in real life. But so many obs flowing in, and after some informed IDs, I learnt to recognise it.

Now when that comes up CV no longer says Dunno, it is Included. Geographic place matters. Especially with an influx of newbies for CNC and GSB.

Better onboarding that politely insists on a location first please, might work. (And always with the - Don’t show me this again - option for ‘qualified oldies’)

PS Maverick at 0.5%. But Maverick status on iNat is too little too late (purely cosmetic and not functional): the observation has already achieved Research Grade despite the Maverick ID. If iNat would make the Pre-Mavericks visible, I wonder what percentage that is? Those obs are the ones where tidying up the single wrong one does make a difference!

rupertclayton · March 6, 2023, 7:15pm

Let me try again. To be clear, I’m not talking about the process of determining a community ID. I’m talking about the process the observer goes through to decide what ID suggestion to select when they’re about to upload an observation.

When iNat users upload photos, they start with varying levels of knowledge about what they’ve seen. Some may be fairly confident about the species or genus, but most know very little. In any case, the point at which a user adds a suggested ID is after the point at which iNat offers them computer vision suggestions. You say…

That’s actually not true when the user is presented with CV suggestions in the upload interface. At this point they generally have not entered or selected any identification.

In both the web and app interfaces, iNat will use its computer vision model to suggest possible identifications for an observation being uploaded. The input for the CV process is an image, date/time and location. The date and location are used to fine tune the list of candidate identifications returned by the raw CV lookup.
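To make the "fine tune" step concrete, here is a toy sketch of how location could change the ranking of suggestions. This is not iNat's actual implementation: the `rerank` function, the `boost` factor, the `seen_nearby` set and the taxon names are all hypothetical, chosen only to illustrate the idea of re-weighting raw image scores by what has been observed near the given location.

```python
def rerank(cv_scores, seen_nearby, boost=3.0):
    """Re-weight raw image scores using a (hypothetical) set of taxa seen nearby."""
    weighted = {
        # Taxa already observed near the location get a multiplicative boost.
        taxon: score * (boost if taxon in seen_nearby else 1.0)
        for taxon, score in cv_scores.items()
    }
    total = sum(weighted.values()) or 1.0  # avoid dividing by zero on empty input
    ranked = sorted(weighted.items(), key=lambda item: item[1], reverse=True)
    # Normalize so the re-weighted scores sum to 1 again.
    return [(taxon, weight / total) for taxon, weight in ranked]


# Made-up scores for one photo; the names are placeholders, not real taxa.
raw_scores = {"lookalike_from_elsewhere": 0.5, "local_species": 0.3, "other_taxon": 0.2}

no_location = rerank(raw_scores, seen_nearby=set())
with_location = rerank(raw_scores, seen_nearby={"local_species"})

print(no_location[0][0])    # top suggestion when the location field is blank
print(with_location[0][0])  # top suggestion once a location is known
```

In this toy case the top suggestion flips from the lookalike to the local taxon once a location is supplied, which is the effect the field-order change is hoping to trigger more often.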

Specific to the web interface, once an iNat user has uploaded a photo, they are currently presented with the fields “Species name”, “Date” and “Location”. If iNat has extracted useful metadata, “Date” and “Location” may be pre-filled, but otherwise they’re blank. The natural UX flow is for a user to click into the “Species name” field, which appears first, but if the date or location is blank they likely will receive CV suggestions that are a poor match for their photo.

If iNat moved the “Species name” field below the “Location” and “Date” fields, we could expect that some portion of users would add this data before clicking into “Species name”. This would result in more accurate ID choices by the observer, and also would likely reduce the number of observations uploaded with missing date and location info.

Good for you, and while you’re probably describing the process of more experienced observers, that’s evidently not the process for the large number of less-experienced observers who upload a photo, fail to notice whether date/location were extracted from the metadata, and proceed to choose something vaguely similar from the list of choices displayed.

What I fail to understand is why changing the field order would cause any major problems for more experienced observers such as you and (maybe) me.

rupertclayton · March 6, 2023, 7:36pm

I think the least bad way to measure the benefit of the proposed change would be the A/B test proposed by @ianmanning:

1. Create a version of the upload form that switches the field order.
2. For the period of the test, present the new and old upload forms to ~50% of users each. It’s probably best not to assign the form randomly each time as the ever-changing field order might be confusing. Instead, the choice could be based on odd vs. even user ID numbers, with each user consistently receiving the same version for the duration of the test.
3. Save the observations along with a system-generated field to record which version of the form was used.
4. Measure the results at a few points (submission, 7 days, 30 days?). Measurements could include:
   a. Percentage of observations with initial IDs matching or a parent of the subsequent community taxon (probably limited to observations that received at least one further ID).
   b. Percentage of observations with missing date/time info.
   c. Percentage of observations with missing location info.
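The test design above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Obs` record, its field names, and the `assign_variant` and `summarize` helpers are hypothetical and do not correspond to any real iNat schema or API. The odd/even split and the three measurements follow the steps listed above.

```python
from dataclasses import dataclass
from typing import Optional


def assign_variant(user_id: int) -> str:
    """Odd user IDs get the proposed field order, even IDs keep the current one."""
    return "place_first" if user_id % 2 else "species_first"


@dataclass
class Obs:
    # Hypothetical record; form_version is the system-generated field from step 3.
    form_version: str
    has_date: bool
    has_location: bool
    # True/False once a further ID exists; None if no one else has added an ID yet.
    initial_id_consistent: Optional[bool] = None


def summarize(observations):
    """Compute measurements (a), (b) and (c) per form version, as percentages."""
    totals = {}
    for obs in observations:
        t = totals.setdefault(
            obs.form_version,
            {"n": 0, "no_date": 0, "no_location": 0, "judged": 0, "consistent": 0},
        )
        t["n"] += 1
        t["no_date"] += not obs.has_date
        t["no_location"] += not obs.has_location
        if obs.initial_id_consistent is not None:
            # Measurement (a) only counts observations with at least one further ID.
            t["judged"] += 1
            t["consistent"] += obs.initial_id_consistent
    return {
        version: {
            "pct_initial_id_consistent": (
                100 * t["consistent"] / t["judged"] if t["judged"] else None
            ),
            "pct_missing_date": 100 * t["no_date"] / t["n"],
            "pct_missing_location": 100 * t["no_location"] / t["n"],
        }
        for version, t in totals.items()
    }


# Made-up observations, purely to show the shape of the report.
sample = [
    Obs(assign_variant(1), has_date=True, has_location=True, initial_id_consistent=True),
    Obs(assign_variant(3), has_date=True, has_location=False),
    Obs(assign_variant(2), has_date=False, has_location=False, initial_id_consistent=False),
    Obs(assign_variant(4), has_date=True, has_location=True, initial_id_consistent=True),
]
report = summarize(sample)
print(report)
```

Running the same summary at submission, 7 days and 30 days would give the snapshots suggested above. A real test would also need enough observations per group for the difference to be meaningful.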
