Species Suggestions for the Wrong Continent

Would atlases help with any of these? The main atlas page obviously isn’t really engineered to cope with 129 pages of atlases with out-of-range observations, but it is nice to be able to pull up a list of Probably Wrong observations at a click.

3 Likes

I am also a relatively new user, though I have been using it a great deal since I started. I have also run into this problem, and I will admit I have made this error myself, choosing a species far out of range because it was suggested. I can only imagine how frustrating it must be for experienced users/experts.

I think a warning, if nothing else, is a very good idea. To be honest, from the standpoint of user experience design, I would really expect suggestions to be controlled for this factor by default. Indeed, it was a few months in before I fully realized that the suggestions were NOT. Also, from the standpoint of wanting to encourage citizen scientists and support an enthusiastic user base, it doesn’t feel great when you realize you were so off base in something you suggested. Certainly educating the user base is also one of the goals of such an endeavour, and hopefully this can be done in a way that helps educate ahead of time rather than embarrass after the fact.

iNat has been an amazing find for me, and I believe it has the power to do much good and contribute, but especially given that many users are well-meaning but not experts, the AI system itself needs to be modulated in a way that helps prevent mis IDs.

12 Likes

Not only the wrong continents: I have always been bemused to see suggestions spanning four or five or even more families of plants. The suggestion must make very clear that it is a suggestion only, to be critically checked against nearby observations. Possibly add the “compare button” here?

3 Likes

I still wonder if the algorithm is intended to take geography into account—perhaps over time as we continue to make corrections, or is that component just not there? Maybe one of the staff or someone else who knows can chime in on that question.

From @tiwane in 2018 on the Google Group:

When you submit an observation to computer vision, it takes the first photo and runs it through the model and spits out visually similar results. If something is “Seen nearby” that will affect the ranking of the results. Improvements to ranking and displaying results is definitely something we are looking into. That, along with a way to train the model on not just species but higher-level taxa will be really helpful, I think. But that will take time, it’s not easy.

and

“Seen Nearby” searches for Research Grade observations - excepting the observation in question - within 100km of the requested coordinates and 45 days before and after the date specified. [not just that year, but any year]
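For anyone trying to picture that rule, here is a minimal sketch of a “Seen Nearby” check as described in the quote above. This is not iNaturalist's actual code: the `Observation` class, the function names, and the sample coordinates are all made up for illustration, and the day-of-year comparison is only approximate around leap years.

```python
import math
from dataclasses import dataclass
from datetime import date

EARTH_RADIUS_KM = 6371.0


@dataclass
class Observation:
    # Hypothetical record type, not an iNaturalist data structure.
    taxon: str
    lat: float
    lon: float
    observed_on: date
    research_grade: bool


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def day_of_year_gap(a, b):
    """Approximate gap in days between two dates, ignoring the year."""
    diff = abs(a.timetuple().tm_yday - b.timetuple().tm_yday)
    return min(diff, max(365 - diff, 0))  # wrap around New Year


def seen_nearby(taxon, lat, lon, on, observations, exclude=None,
                radius_km=100.0, window_days=45):
    """True if a Research Grade record of `taxon` exists within the radius
    and within the seasonal window (any year), excluding `exclude`."""
    for obs in observations:
        if obs is exclude or obs.taxon != taxon or not obs.research_grade:
            continue
        if haversine_km(lat, lon, obs.lat, obs.lon) > radius_km:
            continue
        if day_of_year_gap(on, obs.observed_on) > window_days:
            continue
        return True
    return False


# Made-up sample records: one near Austin, Texas, one in Los Angeles.
records = [
    Observation("Quercus fusiformis", 30.51, -97.68, date(2017, 5, 1), True),
    Observation("Quercus agrifolia", 34.05, -118.24, date(2018, 4, 20), True),
]
print(seen_nearby("Quercus fusiformis", 30.27, -97.74, date(2019, 4, 15), records))
print(seen_nearby("Quercus agrifolia", 30.27, -97.74, date(2019, 4, 15), records))
```

Note that a check like this only changes the ranking or adds a label; by itself it does not remove far-away species from the list.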

2 Likes

In Central Texas, and probably across the South, we see this consistently with the live oaks, which are probably our most common native oaks. Invariably the first suggestion is Quercus agrifolia, which occurs only west of the Sierra Nevada. The common name given, coastal live oak, is no clue to newbies as our live oaks are found all along the Gulf to mid-Atlantic coasts. I would like to see these features added:

  • Build a geographic factor into the suggestion algorithm so species outside their known distribution are excluded from the initial list.

  • Also check against regional alert lists. For example, in much of the U.S., kudzu, zebra mussels, or emerald ash borer would be on such a list. If there is a close match on that list, add it to the initial list and mark it prominently as an invasive species of concern.

  • At the bottom of the initial list, add a link that reads something like “Other possibilities.” When the user clicks that, widen the search to include other species that would be outside their known range.

  • Add the ability to filter for species that have been identified where they are on an alert list and, separately, other instances of species having been identified outside their known range.

Perhaps there are other ways to accomplish the same goals. These are the key features in any solution:

  • First, present a short list of the most likely possibilities in that locale plus “alert list” lookalikes, if any.

  • Distinguish clearly between natives and introduced species.

  • Let the user choose whether to widen the search.

  • Make it easy to find recent IDs of species on alert lists.

  • Make it easy to find IDs in a locale of interest to me that are worth reviewing because you wouldn’t expect to find them there.
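To make the first few items concrete, here is a rough sketch of how a short initial list plus an “Other possibilities” list could be put together. Everything in it is an assumption for illustration: the `Suggestion` class, the `build_initial_list` function, the 0.5 alert threshold, and the sample scores are invented, and the local checklist and alert list would have to come from real range and invasive-species data.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    # Hypothetical structure for one suggestion shown to the user.
    taxon: str
    score: float      # visual-similarity score from the model, 0..1
    in_range: bool    # taxon is on the local checklist
    alert: bool       # taxon is on the regional alert list


def build_initial_list(candidates, local_taxa, alert_taxa,
                       alert_threshold=0.5, limit=5):
    """Split visually similar candidates into a short initial list and
    a list of 'other possibilities' outside the known range."""
    initial, others = [], []
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    for taxon, score in ranked:
        s = Suggestion(taxon, score, taxon in local_taxa, taxon in alert_taxa)
        # Keep in-range taxa, plus close matches from the alert list.
        if s.in_range or (s.alert and score >= alert_threshold):
            initial.append(s)
        else:
            others.append(s)
    # Anything beyond the limit is still reachable via 'other possibilities'.
    overflow = initial[limit:]
    others = sorted(others + overflow, key=lambda s: s.score, reverse=True)
    return initial[:limit], others


# Made-up scores for a single photo taken in Central Texas.
candidates = {
    "Quercus agrifolia": 0.9,
    "Quercus fusiformis": 0.8,
    "Quercus virginiana": 0.7,
    "Pueraria montana": 0.6,
}
local = {"Quercus fusiformis", "Quercus virginiana"}
alerts = {"Pueraria montana"}

initial, others = build_initial_list(candidates, local, alerts)
for s in initial:
    flag = " (ALERT: invasive species of concern)" if s.alert else ""
    print(f"{s.taxon}{flag}")
print("Other possibilities:", [s.taxon for s in others])
```

The out-of-range species is not thrown away, only moved behind the extra click, so the user still chooses whether to widen the search.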

7 Likes

Hi @baldeagle! I hope you don’t mind, I moved this post into this other thread since there’s been lots more discussion about the very same topics here. Though if you decide to branch off some of your other points as separate threads or feature requests, that is fine too!

1 Like

@treegrow and @bouteloua have caught a bunch of my errors with this…nice people helping me learn are the main reason I started to understand the function and limitations of AI/ CV (never sure which is the appropriate term). So, thank you. There is a wealth of dialogue on this and similar topic(s) that occurred on the google group but I’m hesitant to pull in my own or others’ relevant comments and make things too wordy here.

3 Likes

I agree with the concerns raised here, and have submitted a feature request for this issue. I linked to some of the relevant discussions on the google group. Please do vote for it as a feature request and move it up the list of priorities.

A good approach could be to brainstorm ideas for how to fix it here on this thread, and then people can add those to the feature request thread when there seems to be some agreement on a way forward?

5 Likes

For me this is the key thing that’s needed. A wider net than 100 km would be appropriate in many parts of the world. A radius of 500 or 1000 km to filter suggestions should mostly fix the problem of Californian plants being suggested in South Africa, or Pacific molluscs suggested in the Atlantic. The “Seen Nearby” text could still be included for taxa seen within 100 km (I think within 45 days is too restrictive in many places: many taxa can be found year-round in the tropics but have few records).
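A sketch of the two-tier idea, assuming the distance from the observation to the nearest existing record of each candidate taxon is already known (that lookup is not shown). The function name, the tuple layout, and the sample taxa are hypothetical.

```python
def filter_by_distance(candidates, filter_km=1000.0, nearby_km=100.0):
    """candidates maps taxon -> (score, nearest_record_km or None).

    Drops taxa with no record within `filter_km`, and flags the rest as
    'seen nearby' when a record lies within `nearby_km`.
    Returns (taxon, score, seen_nearby) tuples, best score first.
    """
    results = []
    for taxon, (score, nearest_km) in candidates.items():
        if nearest_km is None or nearest_km > filter_km:
            continue  # too far away (or never recorded): not suggested
        results.append((taxon, score, nearest_km <= nearby_km))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Made-up candidates for an observation in South Africa.
candidates = {
    "Californian lookalike": (0.9, 16000.0),
    "Regional species": (0.8, 600.0),
    "Local species": (0.7, 40.0),
    "Unrecorded species": (0.5, None),
}
for taxon, score, nearby in filter_by_distance(candidates):
    print(taxon, score, "Seen Nearby" if nearby else "")
```

Dropping the date window entirely, as done here, is one way to address the tropics problem; a wider seasonal window would be another.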

I also like this suggestion. At least for taxa with complete taxonomies, the AI should “know” what it doesn’t know, i.e. it might be aware that it has a sufficient training set of photos for only 5 out of 12 members of a genus, and downgrade its certainty accordingly if it matches a photo to a member of that genus. Exactly how to program this seems like it could be challenging though…
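One very simple way to express the “5 out of 12” idea is to scale the model's score by the fraction of the genus it was trained on. This is only a sketch: the function name is made up, and linear scaling is an arbitrary choice, not something the model is known to do.

```python
def adjusted_confidence(raw_score, trained_species, total_species):
    """Scale a species-level score by how much of the genus the model has seen.

    raw_score       -- the model's original confidence, 0..1
    trained_species -- members of the genus with enough training photos
    total_species   -- members of the genus in a complete taxonomy
    """
    if total_species <= 0:
        raise ValueError("total_species must be positive")
    coverage = min(trained_species, total_species) / total_species
    return raw_score * coverage


# A 0.9 match in a genus where only 5 of 12 species were trainable.
print(round(adjusted_confidence(0.9, 5, 12), 3))
```

A lowered score like this could then be used to fall back to a genus-level suggestion instead of a species.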

5 Likes

Speaking from the perspective of the Hong Kong uploads, I have noticed a good portion of the Lepidoptera uploads (mostly excluding Papilionoidea) contain ID’s for species not found in Hong Kong. Usually they would be some moth found in the USA. This issue typically seems to arise from the iNat AI in the visually similar part in “suggest an identification”. Typical red flags for me would be just one location pin in Hong Kong and a bunch of other pins elsewhere (but then again one would have to take into account how cosmopolitan the species is). Usually I would send a comment in the lines of “when making an ID please consider the distribution of each species”, but likely to no avail.

For something like moths, where many species look so similar in terms of their morphology, I wonder how useful species suggestions are if the underlying algorithm is not optimised enough. In that case this issue may stem from the (because I can’t think of the right word) coding side, in that the code has not factored in species distribution to make a more informed ID suggestion. Then again, coding this will be extremely challenging.

On the other hand, this may be an issue of how diligent users are in terms of ID’ing their species. Sure, if we want to prevent this from happening we could make the user experience more restrictive (eg. make location mandatory so that the algorithm can then restrict itself into providing ID suggestion for that region), but I wonder how effective that may be in the long run.

Edit: just found an example of what I meant.


3 Likes

Uploading files from a developing country, where internet is at times slow and patchy, is sometimes a really slow process … I think this might be largely due to functions like the picture-based species suggestion process. Example: I tried to upload pictures of Consolea (Cactaceae) lately … typing Consolea in the name field … didn’t stick … again … again … after 15 minutes I get the message from the website … we are sure this is genus Opuntia … and yes they look very similar, but a Consolea is (currently) not an Opuntia. So the computer-based guess cost me 15 minutes for a single observation upload and it was worse than my own guess. I have more examples like this … but it’s maybe boring to read. I would love to have an upload site where this data-eating nonsense is minimized and where I can just upload stuff!!

4 Likes

Turn off auto sync. The auto sync feature of the app breaks the app for anything other than the most casual use. If you go to settings and turn it off, you can then just upload when you have a wifi connection: tell the app to upload all and then go do something else.

1 Like

I think @mreith is referring to the website. I started a feature request here:
https://forum.inaturalist.org/t/toggle-computer-vision-suggestions-on-off-on-website/1117

3 Likes

Ah, I never really noticed it being slow on the website, I think because I usually add things via the app. Things have been slow lately though… that makes sense.

thanks a lot! will try this!

OK, to add on to my previous post here: I think it may be beneficial, for the moment, to suppress the “visually similar” feature in the “Suggest an Identification” section. I feel it is much better (please voice your opinions) to keep the ID simply to family level, or to class etc., rather than to take advantage of that feature and key in a wrong species ID. And especially with the CNC coming up in a month’s time, I feel it really hinders the IDing workflow and one might end up with fallacious results.

EDIT: Well, wrong ID is still a wrong ID…

1 Like

I don’t think taking away the feature is a good idea, as many are attracted to it, though I don’t have any reasonable alternative for fixing this problem. However, I agree that this can become quite a big problem during events such as CNC (both the international one and a local one we just had in HK). There are so many mindless IDs, such as reptiles being IDed as mammals, moths as beetles etc. Many of these observations with bad IDs are lost in the enormous surge of observations, where they will never be corrected, and many “unknowns” are also lost. I don’t know if this problem will ever be solved, but I would also like to raise awareness of this issue, as we all know iNat can be an invaluable tool in providing scientific data.

1 Like

At this point I don’t think we are going to get rid of the algorithm, nor should we. It’s fully integrated in the site and it’s getting better over time. Deleting it will just piss people off. Before the algorithm, people would just add random nonsense IDs on their own anyway.

2 Likes

I have lost track of whether this was already brought up, but one possibility is to tweak the algorithm/scoring system such that IDs chosen from the computer vision “visually similar” suggestions require a higher standard of confirmation.
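As a rough illustration of what a “higher standard” could look like, the sketch below counts an ID picked from the computer vision suggestions as half a vote. It is not how iNaturalist actually scores observations: the `Identification` class, the 0.5 weight, and the threshold of 2.0 are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Identification:
    # Hypothetical record of one ID on an observation.
    taxon: str
    from_cv_suggestion: bool  # True if picked from "visually similar"


def is_confirmed(ids, taxon, cv_weight=0.5, required_weight=2.0):
    """True if the weighted support for `taxon` reaches the threshold.

    IDs typed in by the identifier count as 1.0; IDs chosen from the
    computer vision suggestions count as `cv_weight`.
    """
    support = sum(
        cv_weight if i.from_cv_suggestion else 1.0
        for i in ids
        if i.taxon == taxon
    )
    return support >= required_weight


# Two independent IDs confirm; two computer vision picks do not.
manual = [Identification("Danaus plexippus", False)] * 2
cv_only = [Identification("Danaus plexippus", True)] * 2
print(is_confirmed(manual, "Danaus plexippus"))
print(is_confirmed(cv_only, "Danaus plexippus"))
```

With these numbers, one typed-in ID plus two suggestion-based IDs would still be enough, so the suggestions remain useful without being able to confirm themselves.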

2 Likes