Exploring New Ways to Learn from iNaturalist’s Community Expertise

No shade on the webinar folks for not answering everything; I understand that likely only a small fraction of the questions could be addressed. Some questions I was hoping would get answered but weren’t (paraphrased, as I didn’t copy them before closing the window, sorry):

  • has this been tested with frequently misidentified taxa that almost always can’t be identified to genus/species from photographs alone?
  • as the feature “scales”, will having comments included in summaries continue to be opt-in?
  • will taxon curators be consulted about taxa added to the feature?
  • and lastly, will this be a permanent feature, or is it possible it will be scrapped if it is not effective?

I am strongly leaning towards deleting my account, as I frankly don’t have the patience for months of asinine pro-genAI crap regardless of whether it’s here for the long term or not. I would much rather contribute to platforms that already have strongly anti-LLM policies, such as Wikimedia/Wikipedia. I don’t intend to participate in iNaturalist until they abandon the use of genAI, but if they are committed to going all in on this, I will just save time and delete my account now.

4 Likes

asinine crap is a vicious way of describing for example comments on … by …

3 Likes

I’ll repeat my comment here regarding how I approach identifying a novel taxon.

I began by identifying species with which I had familiarity using existing knowledge gained from experience. Once I identified those, there were still many unfamiliar species available to identify. If there were quite a few observations, the technique I developed was to sort research grade observations by ascending date to see the oldest ones first. I noted comments from identifiers scattered throughout those observations. Once I thought I had internalized the differences, I would then go through at least 20-30 observations just to convince myself that that tip actually worked, but without actually adding an identification. Only then would I start identifying observations that I thought were obviously that species. I would save anything that looked problematic or difficult until later. I would repeat this process for other species that I knew were lookalikes until I had a sense of how to ID each species.

However, eventually I realized there were too many tips to remember, so I started writing them down (including making comments on observations). But even then, finding those again was difficult, so I developed a series of journal posts laying out these tips for each of the 300+ species present. The difficulty with that was that some species didn’t have representatives on iNat for comparison, so I started going back to original taxonomic descriptions and to clearinghouse databases such as the Reptile Database. I tried to make note of traits that were identifiable AND going to be visible from photos on iNat. Thus, if a trait was something not visible from a photo, I ignored it. Over time I developed a set of notes to myself that I revisit regularly. This has allowed me to identify several observations that were firsts of a species on iNat.

I will also echo the comment on the webinar that this LLM approach will almost certainly cause me to add more and better comments regarding how to recognize a particular taxon. Given that I saw my own comments repeated back to me in the demo, comments that I’ve only seen made by myself, helping build the LLM by providing additional detailed comments will likely be something identifiers start taking into account.

12 Likes

Hi everyone, I just added an update to the first post in this thread about the ID Summaries Demo. In summary, you can now see and explore the first-ever version of the ID Summaries Demo for yourself. Thanks again for all your ideas and feedback. We’ll continue exploring how to make it easier to find and access identification info here on iNat.

9 Likes

Responding to “We’d love to hear your thoughts on the iNaturalist Forum”:

I haven’t had time to watch the webinar yet so this is a rather pedantic comment on a particular id tip. I looked at Calystegia sepium and came across “And yes, C. sepium is the native species, while C. arvensis is non-native.” This may be true for the original observation but a) it isn’t an identification tip and b) it is only true where C. sepium is native and C. arvensis isn’t. They are both native in Britain.

Replying to myself after further consideration of the demo: As the part I quoted does not appear in the id summary, only in the source material, my criticism isn’t valid.

6 Likes

@DianaStuder:

asinine crap is a vicious way of describing for example comments on Euphorbia by Nathan Taylor.

To be clear, I did not have any particular person or comment in mind, and I have the highest confidence in the skills of all the staff who do IDs. I am simply fed up with LLMs being shoehorned into everything on rather flimsy grounds, with little to no discussion from people in charge about the shortcomings of the technology, how (and even whether) it will be expanded, and Google’s involvement.

LLMs are fundamentally not designed to be accurate. They never have been. They are designed to output text that merely looks like an answer or a summary, and they will never be as reliable as a knowledgeable human. They can certainly not be held accountable for errors the way a human can. And while individual queries have negligible environmental impact and the set of iNaturalist comments Gemini is using is ethically obtained, the same does not apply to Gemini and the LLM industry as a whole. It is important to acknowledge these things.

Without honest discussion of the drawbacks, what people do say can come off as unserious, insubstantial, or uninformed, although that is certainly not anyone’s intention.

4 Likes

Well, I don’t think any Chironomids will ever get in then. While it sounds good on paper, and even in practice it is good to have multiple sources (different people contributing information), some taxonomic groups on iNaturalist have fewer than 3 highly active identifiers.

This also raises the question: who wants to write something 30-50ish times? Even if you use copy and paste, how is this ID summary going to handle 30 identical comments? Write it almost word for word? I still advocate for the direct addition of identification information to taxa from knowledgeable users, whether it is in the form of a wiki, a guide, a new tab on the species page, or some other method.
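To make the duplicate-comment question concrete, here is a minimal sketch of one way pasted copies could be collapsed before any summarizing happens. This is purely an illustration and not how the demo works, as far as anyone here has said: the function names `normalize` and `collapse_duplicates` and the sample comments are made up for the example.

```python
import re
from collections import Counter


def normalize(comment: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so pasted copies match."""
    text = re.sub(r"[^\w\s]", "", comment.lower())
    return " ".join(text.split())


def collapse_duplicates(comments):
    """Return (first-seen original comment, count) pairs, one per distinct normalized text.

    Hypothetical helper: assumes comments arrive as plain strings.
    """
    counts = Counter()
    first_seen = {}
    for comment in comments:
        key = normalize(comment)
        if not key:
            continue  # skip comments that are empty after normalization
        counts[key] += 1
        first_seen.setdefault(key, comment)
    return [(first_seen[key], counts[key]) for key in first_seen]


# Made-up sample: one tip pasted 30 times, one slightly different paste, one other tip.
pasted = ["Leaves are rugose-veiny."] * 30 + ["leaves are  rugose-veiny", "Check the rattle."]
result = collapse_duplicates(pasted)
for text, count in result:
    print(f"{count:>3} x {text}")
```

With something like this, the 30 pasted copies would count as one tip seen many times rather than 30 separate sources. It only catches copies that match after normalization, so reworded versions of the same tip would still come through as distinct comments.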

Another thought: how will this handle taxa not in the CV? If they aren’t included due to lack of training data, that limits what knowledge users can provide. If somebody is knowledgeable, or has the sources, they could write identification information for species with no observations on iNaturalist, ready for the day they are observed.

Absent a user wiki, guides, or any other implementation of identification information from knowledgeable identifiers, I do view this as better than nothing, as it could potentially help stem the tide of blindly picking the CV’s top suggestion, and it can spread education, assuming the comments are good.

8 Likes

Fair enough. Nuclear power and water use for AI are huge issues. So too is AI instead of (human) artists, writers and musicians. But a better way to surface useful comments, while treading lightly on the environment, is worth considering, to my eyes.

As I plod on thru the Great Southern Bioblitz residue, I’m more interested in how the ID-a-thon unfolds!

PS your quote displays my comment with fangs and claws - but I did edit them out as requested. That archived display is an interesting way to run the forum. Disconcerting. But I have stumbled over that glitch before - iNat does not forget :grin:

3 Likes

I know our team is definitely considering potential drawbacks and shortcomings. The purpose of the demo is to get a better understanding of LLMs being used in this specific context (i.e., used to organize and summarize iNaturalist comments) - both its benefits and downsides. That’s why the demo is separate from the core iNaturalist platform, and why it has feedback mechanisms for the summaries and the sources. If you’re willing to check out the demo and share your thoughts, that would be really helpful. We’re here to kick the tires of it, so to speak.

For example, here’s a screenshot of a summary which I downvoted, and which was based mostly on my comments. Because I identify Thamnophis sirtalis mostly in California, my comments about red heads are accurate for most of the populations there. But they’re not accurate for the species as a whole, since this trait varies by region. So we can learn about how this approach works and doesn’t work for species with regional variation, and think about potential changes to address the problem, like perhaps adding location filters or just making sure that the sample is broad enough.

These are early days and we’re still assessing what potential expansion might look like.

This was the method we used for this particular demo. The decisions made for this demo wouldn’t necessarily apply to future explorations. Right now I think it would be best to evaluate this first attempt for what it is (and isn’t); that would be the most helpful feedback for now.

11 Likes

Were the “Photo Tips” also generated by AI?

4 Likes

Yes, using the same source comments!

4 Likes

is there any reason the taxa are not labelled or searchable and only listed via tiny thumbnail? i already find it quite hard to browse in its current form, and i’d imagine it’d be quite a bit harder if my vision was any worse

8 Likes

Location filters would be an improvement. Working thru the summer rainfall side of South Africa - and I am floundering. Can manage very broad IDs, and use CV, but need more. More identifiers would be wonderful!

1 Like

Small bug in the ID Summaries Feedback dropdown

2 Likes

I can answer this question to some degree. Euphorbia davidii and E. dentata are two species that mostly fit your description (frequently confused, and photos usually lack the information necessary to ID them). The AI gets the correct characters but comes off a bit overconfident in them and tends to miss the importance of the hesitation included in the comments it uses. Some work definitely needs to be done to help communicate how difficult some of the species pairs are. That said, the fact that it picked up on the correct characters in the right order of importance was pretty exciting to me. I think I mentioned this in the webinar, but can’t remember if I gave the species.

15 Likes

What would be an appropriate feedback vote (or votes?) for an ID tip that is essentially true and helpful for IDing in a very specific circumstance, but is concerningly over-general without more context?

For example, this description of how to tell monarch butterfly eggs from beads of plant sap:

The egg of the Monarch (Danaus plexippus) is identifiable by its distinct shape and texture. It is oblong or conical, tapering to a slight point at the tip, and features fine vertical ridges, distinguishing it from rounder drops of plant sap.

This is a true statement, and it is helpful if you’re trying to figure out if something is a butterfly egg or plant sap. But it’s an equally accurate description for both the eggs of other Danaus species that might be on milkweed and the eggs of many other butterfly species on other plants. If someone took this piece of information and started identifying all the pointy finely-ridged eggs they found as monarchs, the odds would not be in their favor.

12 Likes

I want to start by saying I very much like the idea of easy summaries of key features, and love the photo tips! I do have some feedback and questions. In looking over some of the plant summaries, I’m wondering what the process will be like for determining:

  1. who’s a ‘prolific’ or experienced enough identifier to have their comments included in the AI summaries? will it be based on ‘improving’ or ‘leading’ IDs in some way? will there be a ranking process based on overall accuracy or “overturned” IDs for identifiers that might be prolific but maybe not so accurate?

  2. will comment mining be restricted to older comments, or will it have a stratified bias for newer comments? there have been improvements in the quality and depth of comments, as well as new findings for various taxa that just don’t exist in early observations

I ask because I know this is a demo that’s limited by a small number of identifiers, but what will the model do when 2 or more identifiers fundamentally disagree on a characteristic? Does it ignore that characteristic or provide context that the characteristic isn’t generally agreed-upon?


Looking at Quercus agrifolia, I see that the comments aren’t necessarily backed by references, but this might be an artifact of the comments that were mined being older identifications, and a limited set of identifier comments being used.

It’s also factually incorrect in that these leaves may be flat, though they do tend to be most commonly convex. I’m also wondering, in terms of the ID summaries feedback at right, how we are to evaluate the statement “Distinct from other summaries”? Is this mainly meant to be a place to call out the summary as being overly general or vague compared to related species in the same genus, or something else?

Looking forward to seeing more here, I think the summaries are very promising overall, but I’m hopeful that there will be some quality-checking process embedded so that an ID summary like the example below doesn’t end up misleading more inexperienced identifiers/observers than it helps.

3 Likes

Exploring the new Demo, I find the concatenation of IDer comments truly useful! The AI summary of those comments adds nothing but length and uncertainty. Please do roll out a tool that lets us easily find IDing advice from top IDers. The AI summary is never going to be as informative, trustworthy, or well written as the comments it draws on.

12 Likes

This was my concern when I went through the demo. For the rattlesnake example it states that the rattle can be used to identify the species, but really all the rattle does is get you to one of two genera of rattlesnakes. So clearly there is some information that is useful for higher taxonomic ranks, but not really useful for identifying to species in all cases. In the webinar they made some comment about being aware that this is an issue, but not yet providing rank-specific feedback. It’s also the case that where a higher-rank trait can identify to, say, genus, but there’s only one representative locally, the genus-level trait is sufficient to ID to species. So, I think the answer is that this process will require identifiers to start noting more specifically when something is useful for recognizing a higher rank, but not for identifying species.

4 Likes

Just an FYI @tiwane that the demo Ilex verticillata photo (“Ilex verticillata (winterberry holly) in November 2018 by Susan J. Hewitt · iNaturalist”) is likely not I. verticillata. I. verticillata leaves are rugose-veiny, and the leaves in this photo are not. Itea virginica is my best guess, but without a broader photo I cannot be confident.

4 Likes