What should represent a 'range' for a species in iNaturalist?

Having read the 2019 roadmap put out by Tony, and seeing geographic intelligence for the computer vision on it, I wanted to throw this question out for discussion.

It is not stated (and may not be decided) how the geographic intelligence will determine if a species is found somewhere (other iNat sightings, checklists, atlases, range maps, etc. are all possibilities), but to me it has never been clear what iNat considers to be a ‘range’ and how ranges should be tracked.

It may be more an issue for birds than anything else, but as an example here is a series of birds I have personally observed in Ontario. Most have been seen only a handful of times in the province, some only once ever: Calliope Hummingbird, Vermilion Flycatcher, Black-bellied Whistling-Duck, Swallow-tailed Kite, Reddish Egret, Common Ringed Plover, Crested Caracara, Slaty-backed Gull.

No serious ornithologist or naturalist would say any of these species have a ‘range’ that includes Ontario. Then you have species that are very uncommon but not mega rarities, for instance Black-throated Grey Warbler in Ontario.

The distribution map of sightings, and the Ontario checklist will show all these species. But what about the range map or an atlas?

Is the range of a species:

  • where it is regular and expected
  • where it is at least not a mega rarity
  • any place it has ever been seen
  • its natural range or its human impacted range for invasive species
  • something else

How this is defined and tracked can have serious implications for how any geography validation takes place in the proposed changes to the computer vision.

yeah… there are grey areas and such, but right now the algorithm is identifying things wayyyy outside where they could ever occur. I think it will have to be based on proximity, not ‘range’ (unless something else about how iNat works changes a lot), and it won’t be perfect but it will help a LOT. Most of the algorithm issues come down to people without lots of knowledge choosing things they should not. This is much more of an issue than the occasional vagrant bird a power user finds… how often would you be using the algorithm to identify something like that? I am thinking not much at all.

Looks like I have the ability to restore your post. Would you like me to?

@ahospers I restored it for you. ^^^

Charlie, I don’t think the risk is in having to use it to identify the rare species, it is in having the rare species shown as an option to choose from.

I will use the example of the species I listed as really rare, but not a mega rarity, the Black-throated Grey Warbler. This is a species seen maybe once a year in Ontario. The odds of an inexperienced iNat user independently finding one are very low.

However, the odds of that same inexperienced user finding a Blackpoll Warbler or a Black-and-white Warbler, two common birds in the province that look virtually identical to it, are quite good.

The computer vision visual match for the 3 species will be very high.

The root of the question was this: depending on how you define the range of Black-throated Grey Warbler (is a bird seen less than once a year in range or not? should the atlas include the area or not?), or which tool is used to determine proximity for the computer vision (checklist, nearby records, etc.), will this species appear in the options presented by the vision tool?

I’m not sure my grasp of the question is 100% accurate, but I’ve pondered this or a related topic a fair amount. My thoughts are in relation to how the “computer vision” (AI, auto identifier, not sure what the official name is) takes geographic location into consideration for its suggested ID and how the results of the analysis are displayed. And, perhaps this is all explained somewhere (?). My thought is that when the AI analyzes a photo (does it do the whole set of photos in an observation or only one photo?), it should have independent values for (1) how well a photo matches a particular species, and (2) a measure of commonness or likelihood. I am not versed in all the metrics used in an AI/computer vision similarity assessment, how they are weighted, etc., so can’t comment on that. When it comes to a measure of commonness or likelihood, this could be based on one or both of these things: (a) distance to a human-defined range (think range map in a field guide), and/or (b) a measure of reporting frequency within radius bands. My ideal would be that each suggested species includes–perhaps right after it–a measure for the computer vision match AND a measure for the range likelihood, perhaps 0-100 for each. So for example, in Chris’s example of a Black-throated Gray Warbler in Ontario for a particular date, the computer suggestion might be something like this:
Black-throated Gray Warbler 95/5 (meaning high score of visual match and low score of frequency of reports of that species for this location and/or time).
Black-and-White Warbler 85/76 (meaning moderately high scores for both visual and time/location)

I would like to see something like that when the computer vision gives suggestions. Should they be ordered by the highest visual analysis score or the geography/time likelihood? I’m thinking sort by highest visual match, but then the likelihood or frequency measure gives some caution to the observer.
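
To make the idea concrete, here is a rough Python sketch of that display: two independent 0-100 scores per suggestion, sorted by visual match, with a caution note when the likelihood score is low. The `Suggestion` class, the `format_suggestions` function, and the `caution_below` threshold are made up for illustration and are not how iNaturalist actually does it; the scores are the example numbers from above.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    name: str
    visual: int      # 0-100: how well the photo matches (hypothetical scale)
    likelihood: int  # 0-100: how often the species is reported here (hypothetical scale)


def format_suggestions(suggestions, caution_below=20):
    """Sort by visual match (highest first) and flag unlikely species.

    caution_below is an assumed threshold, chosen only for this example.
    """
    ranked = sorted(suggestions, key=lambda s: s.visual, reverse=True)
    lines = []
    for s in ranked:
        flag = "  (rarely reported here)" if s.likelihood < caution_below else ""
        lines.append(f"{s.name} {s.visual}/{s.likelihood}{flag}")
    return lines


suggestions = [
    Suggestion("Black-and-White Warbler", 85, 76),
    Suggestion("Black-throated Gray Warbler", 95, 5),
]
lines = format_suggestions(suggestions)
for line in lines:
    print(line)
```

The rare warbler still comes first because the sort is purely visual, but the second number and the note give the observer a reason to pause before picking it.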

If someone has a link to some of the details on how this currently works, I’d be interested.

That’s effectively what it does now, right down to the scoring. The geo match is handled by the “seen nearby” indicator.

Right now the rank display is solely visual match driven though, the geo match is solely for info, so things with a 0 score on geography still get suggested even as the first option.

EDIT - to answer your other question, it only reviews the 1st photo in an observation.

Chris, I’m thinking to be most effective for use in informing Computer Vision rankings, it should be proximity to a combination of your #1 and #4 – where it is regular and expected including invaded areas.

The question then becomes, how do we define that range for the vast number of taxa in iNaturalist that have enough high-grade observations for Computer Vision to include (or that soon will)? I think the only practical solution to that is going to have to be driven by the geographic data already in iNaturalist for those same high-grade observations – even though that leads to some circularity. That doesn’t bother me too much because it is only a tool to suggest IDs, and should still deal with things way beyond previously reported ranges pretty well.

In cases where atlases or ranges already exist for taxa in iNaturalist, those should certainly be considered as part of the “geographic data already in iNaturalist” and probably be given higher weight in determining “range proximity” to help with the circularity issue above – although in the case of atlases in particular, they can potentially be either incomplete or overly broad in defining a taxon range (or both!).

My final thought has to do with all the other potential taxa that don’t yet have enough high-grade observations for Computer Vision to grab. I would like to see any implementation of geography-informed Computer Vision include, via a link next to each listed suggestion, any other taxa in the same genus (family?) with too few observations to be included in Computer Vision, but with range proximities at least as close as the suggestion being shown. This would (1) remind the user that Computer Vision is not an infallible or complete list of possibilities, and (2) give the user additional tools to actually find the right ID.

Sorry if this last is veering too far into the weeds of implementation, but I think how we want to use taxon ranges needs to guide how we want to define taxon ranges in iNaturalist.

Housekeeping FYI for all, related discussion going on here.

A post was merged into an existing topic: Species Suggestions for the Wrong Continent

It may be unlikely that an inexperienced user will find rare animals, but it will happen. I saw an Ocola Skipper in rare my second year of doing butterfly checklists. Rick Cavasin says on his web site that he has never seen one in Ontario. I saw a Black Dash my third year and Max Larrivee emailed me to congratulate me as he has never seen one. I certainly wasn’t looking for either of them since I had never heard of them until I saw them. I think most new users make range mistakes - I know I did, but if I had only been using iNat when I saw these species, and the range had been set to eliminate them (especially the Ocola Skipper), it would have been frustrating to ID them and perhaps off-putting for me to continue using the platform. I don’t have an answer to the range problem other than the very nice and patient people who helped me understand how the site worked and pointed out my mistakes kindly.

I’ve had similar experiences with a couple insects, though I had ID’d them just to genus or order and more knowledgeable users later revealed them to be exciting finds. But with the question here mainly being in regard to refining the computer vision suggestions, I’m not sure the changes proposed here would have changed your experience. You IDed those butterflies, right, not the iNat suggester? I don’t think anyone would suggest that users should be prevented from manually IDing an observation outside of a given set range. The solutions suggested here wouldn’t eliminate those species from iNat suggester consideration outside of their designated range, just consider proximity of known range more heavily in ranking species suggestions.
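
As a rough illustration of "weight proximity more heavily without eliminating anything", here is a small Python sketch. Everything in it is an assumption made for the example: the exponential decay, the `scale_km` and `floor` values, the function names, and the coordinates of the "nearest known record" are all invented, and none of it is iNaturalist's actual method.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def proximity_weight(distance_km, scale_km=500.0, floor=0.05):
    """Decay from 1.0 at distance 0 toward `floor`; never reaches zero,
    so far-away species are demoted rather than removed (assumed formula)."""
    return floor + (1.0 - floor) * math.exp(-distance_km / scale_km)


def rank(candidates, obs_lat, obs_lon):
    """candidates: (name, visual_score 0-1, lat, lon of nearest known record)."""
    scored = []
    for name, visual, lat, lon in candidates:
        d = haversine_km(obs_lat, obs_lon, lat, lon)
        scored.append((name, visual * proximity_weight(d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Observation near Toronto; the record coordinates below are invented.
candidates = [
    ("Black-throated Gray Warbler", 0.95, 40.0, -111.0),
    ("Black-and-white Warbler", 0.85, 43.7, -79.4),
]
ranked = rank(candidates, 43.65, -79.38)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With these made-up numbers the common warbler moves to the top, while the rare one stays on the list with a small but non-zero score, so a genuine vagrant could still be picked.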

Actually, they were identified on e-Butterfly. If I wasn’t using that platform and trying to ID them only on iNaturalist, it would have been difficult for me if they were shown as out of range, especially when I was just starting to use iNaturalist. I made range mistakes a lot at the beginning so wouldn’t want to make that mistake again. Likely I would have ignored any species shown as out of range.

The range shouldn’t include IDs that are not community-based.

I just spent quite a bit of time correcting the reports of Scotophaeus blackwalli (aka “Mouse Spider”) reported on the map given for the species.

A few frustrated me by remaining on the map despite my having corrected them. A friend explained that some of these have opted out of community ID.

If they’ve opted out of community ID, they can’t be trusted, unless I specifically know the identifier and trust that person’s IDs. So I don’t want to see them on the map.

I also don’t see a way to exclude these opt-out observations from the filtered search. Am I missing something? (But it wouldn’t suffice just to provide a checkbox for getting rid of them – they really shouldn’t distract me on the main species page map.)

if they don’t meet the criteria for research grade, they will display as not research grade, so you can filter them that way, but you may also lose evidence-less observations you might want.

Oh right. But my purpose is to correct people’s IDs and improve the map. I want those that are not research grade, but not those I can’t help correct.

Two things - the percentage of people who opt out is really quite low, so they are a relatively infrequent find. Secondly, just because someone has opted out does not mean you can’t help correct an ID of theirs. If the user is active, and sees your ID, they may choose to update their identifications. It is really only the cases where the user has opted out AND is inactive that are beyond help.

Also, if/when their incorrect identification is overwhelmingly outvoted by the community (becomes “maverick”), their observation then disappears from the range map. It’s not an ideal situation, but it’s how it currently works in the system as it’s set up, with “observation IDs” (what’s at the top of the page and what the search queries find) and community ID (the ID on the right side of each observation page).

I see. They could still change their ID, and if they don’t it eventually goes off the map.

But how do I keep myself from repeatedly revisiting recalcitrant observations in my efforts to clean up the IDs?

Mind you, I’m already frustrated with a few observations that ARE community based but no one bothers to come back to revisit for my correction. I already deal with putting effort into observations that never get fixed. I’m not keen on having any more of that.

I think iNaturalist is great for posting observations and spreading love for biodiversity, but I have a love/hate relationship with it trying to provide IDs. I go through spurts of “I’ll get these cleaned up” and “Why bother, as there’s a limit to how clean they can get.”

I hear where you are coming from; that has been a learning curve for me too. The best I can do in a community-driven site is to do my best at making my case on each observation where I disagree, adding a comment that is informative (without being argumentative) about why my ID differs. It’s up to me to be persuasive enough to the rest of the users involved with the observation.

If successful in enough cases, then over time it will help bend the knowledge curve of other observers and identifiers for that particular taxon.

Failing that, I can still find and export observation data of interest for my own uses. It’s just less convenient than having it already clean and tidy within iNaturalist.

I assume you are finding things to look at from the taxon page map right now?

There you can’t limit sightings to things you have not reviewed, but you can do that on the explore page.

Go to the Explore page, enter the taxa you want to review, click the Filters button, then expand the More Filters section and set Reviewed to No. That way anything you have entered an ID on, or manually marked as reviewed, will be excluded. Then look at the results in map view.

For example, this query, which is built that way, shows all records of Common Loon in Canada that I have not reviewed:
https://www.inaturalist.org/observations?place_id=6712&reviewed=false&taxon_id=4626
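
If you want to build links like that for other taxa or places, a small Python sketch along these lines should do it. The function name `unreviewed_url` is made up; the query parameters (`place_id`, `reviewed`, `taxon_id`) and the ID values are simply copied from the URL above, and you would need to look up the IDs for your own taxon and place.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.inaturalist.org/observations"


def unreviewed_url(taxon_id, place_id):
    """Build an Explore URL for observations of a taxon in a place
    that the logged-in user has not yet reviewed."""
    params = {
        "place_id": place_id,
        "reviewed": "false",  # same value as in the example URL above
        "taxon_id": taxon_id,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# IDs taken from the example URL above (Common Loon in Canada).
url = unreviewed_url(taxon_id=4626, place_id=6712)
print(url)
```

Because `reviewed` is tied to whoever is logged in, the same link will show a different set of observations for each person who opens it.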
