Exploring New Ways to Learn from iNaturalist’s Community Expertise

Hi everyone,

We just shared an update on our blog about our ongoing work to make it easier for people to learn identification skills.

Please find the full blog post with more information linked here.

A very brief summary of the update:

  • We are exploring how to summarize existing expert knowledge to make it easier for people to learn identification skills without adding more burden for identifiers.
    • We’re developing a standalone demo that uses an existing LLM (not training a new one) to turn a volunteered set of identification remarks and comments into brief summaries.
    • This experiment began with only our staff’s identification remarks and comments. This yielded promising results, so we reached out to a small set of experienced identifiers to ask if we could conduct the same process with their content, as well. We’ll be working with them throughout the process to assess the results.
  • All of this is exploratory work and will continue to evolve with community feedback.
  • Once the demo is closer to launching, we’ll host a live preview and Q&A. You can indicate your interest in that live session here if you haven’t already.
  • In case there is any confusion: because iNaturalist is an evidence-based platform, AI-generated images are not acceptable on it. We recently implemented more robust guidelines and tools to make it easier to flag and remove AI-generated images from the platform.

We welcome thoughtful feedback from the community; please abide by the Forum Guidelines.

37 Likes

I put this thread on slow mode to encourage a broad base of discussion.

9 Likes

I think this is a fantastic application of AI and will add value to all of the comments we make that are relevant to IDs, because those comments can actually be found and put into this broader context. I feel that this will make it even more likely that I will add insightful comments on observations in the future. Given the amount of information the AI model is based on, I can’t imagine that it will require an unacceptable amount of energy. And it links directly to the comments upon which the summary was based.

[Note added later: Even if the AI summaries prove to be of no value, having AI identify comments that appear most useful in providing diagnostic criteria seems of great potential value.]

23 Likes

Thank you for closing comments on the blog post and keeping the conversation here in one place. (That’s better than the ongoing grumbles last time, when comments were closed halfway through.)

https://www.inaturalist.org/projects/observations-with-id-tips You may be able to incorporate some of this at a later stage.

10 Likes

It shows promise, but it’s hard to say much beyond that without trying it out. It’s somewhat like trying to guess how well a car will drive just by looking at it.

I will say it could very much turn into a disaster without proper illustrations or definitions of anatomy. If the CV starts talking about the RM or R2+3 wing veins, that information will mean nothing to people who do not know the names of the structures for that organism. If I use words like gonostylus or superior volsella, most people are going to say “Gono what? I have no idea what that is.”
Also, what is the alternative? Some things, yes, can be worded better, but how can I explain IDs without using the proper names for anatomy when there are no alternative names?

Aesthetically, the UI looks nice and I do look forward to the development of this. I think it has the potential to work and be a good help.

27 Likes

We tried to manually collate this using a project on iNat: https://www.inaturalist.org/projects/keys-s-afr - basically trying to accelerate access to diagnostics, notes, distributions and other information posted on observations. In theory one filters by a taxon, and the project should turn up relevant information.
(e.g. velvet worms in the Cape: https://www.inaturalist.org/observations?project_id=18643&subview=table&taxon_id=90935&verifiable=any, which interestingly illustrates how information from a decade ago is inaccurate today: the number of new species has changed the distributions.)
In practice, it is impossible to curate. The beauty of AI is that it should be able to do this rapidly, comprehensively and extensively, as well as summarize and integrate.

Look forward to seeing how this evolves.

9 Likes

Can I add my own expertise for some species (and cite dichotomous keys as evidence)?

7 Likes

I think the example Monarch ID summary is kind of weak? The summary repeats the same thing in two different ways, probably because it didn’t recognize that the two comments were saying the same thing.

A human-made summary or guide would probably show a Monarch and a Viceroy side by side, with an arrow pointing to the band on the Viceroy’s hindwing, instead of relying on jargon that most beginners don’t know (e.g. “subterminal” and “hindwing”; even if the latter seems obvious, it isn’t to someone who has never paid attention to butterflies before: many haven’t yet noticed that butterflies have more than two wings).

A good summary would also note the locations where Monarchs can be found, where both Monarchs and Viceroys can be found, and compare and contrast with other species which are easily mixed up with Monarch (e.g. Queen, Danaus gilippus).

About the only good thing I can say is that the summary didn’t distort the comments overmuch - only inserting “adult” and “notably”.

41 Likes

Yeah, given things like what you pointed out, I’d definitely say any computer-aided summary should be a starting point, and may need to be heavily revised by a person who actually knows how to ID that taxon for the guidance to be succinct and accurate enough to be of use as a casual reference. I’d say a human review is straight-up necessary before releasing any computer-aided summary for public consumption.

A test case like a monarch also shows the need for life-stage-specific categorization of ID guidance. Location-specific delineations will often be useful, as well.

Just putting AI-aided summaries on pages with the inevitably ignored “oh btw this is AI so it might be wrong” disclaimer is not responsible.

25 Likes

One of the many challenges with making this honestly useful is going to be the fact that the best identification tips for the same species often vary geographically. Tips from one county are often misleading in the next county over. The same can apply across seasons, life stages, etc. AIs are still quite bad at handling circumstances like this, where information is true only in an unstated region of the variable space.

I strongly agree with:

And given the fact that it needs human revision anyway, I don’t see any advantage of the AI summary over directly giving the human reviser the comments the AI is summarizing.

37 Likes

It’s definitely something we found when using my comments for Thamnophis sirtalis, which occurs across much of North America and varies wildly in coloration from place to place. I mostly ID it along the coast of California, where it pretty much always has a red head. Those on the east coast of North America are colored quite differently. So some of my comments were not pertinent to those populations, but others (such as scalation) were. Definitely something to consider.

But it did a good job summarizing ID tips for distinguishing Taricha torosa and Taricha granulosa, which are two often-confused species in California.

It could also summarize comments about what’s needed for IDs, like “Photos of the genitalia are necessary for identification” or “Photos of the mushroom gills are needed to distinguish this species,” which I think would be useful.

As the blog post says, this is kind of kicking the tires on this approach and learning from it.

It’s certainly made me think of how I write my comments and how I could improve them.

20 Likes

I really appreciate that the model directly includes the comments that it is referencing to make it easier to look into the context and discussion that the information came from. Its usefulness and so on are hard to gauge without getting my hands on the model directly and seeing how it talks about similar species (what would the corresponding viceroy page look like? how useful would it be for comparing those two species?)

I understand that the point of the grant is to include an LLM, so it was necessary to include one in the demo. But I also wonder if this would be better (and cheaper, and slimmer) as a model that helps look for ID information and discussion on iNat and collates it like on this page, without the need for an LLM to write a summary at all. To me, surfacing this kind of info seems to be the most useful thing it’s doing, not writing the summaries.

35 Likes

I was thinking that a “limitations to ID” section at the very top might be a good use. “Only identifiable when blooming”, “adults are indistinguishable unless X is visible”, etc.

It also seems like being able to link to not just the comment but the associated observation might be necessary to make sense of comments that include context like “in your area” or “in our region”.

16 Likes

Even this varies geographically. Here in Sonoma County the caddisfly Nerophilus californicus is clearly identifiable by coloration. In other parts of its range, one needs to look at the finer anatomy.

10 Likes

I don’t understand how AI works, so I can only tell you how this makes me feel. Every time I read that AI is better, faster, etc., it lowers my motivation to identify. It makes me feel that what I do when identifying (mostly unknowns) is not going to be useful any longer. I don’t know how AI works, but I have read a great deal about how it doesn’t work, so I am wondering if I will be endlessly correcting a program instead of interacting with the people on iNat. I will say again, this is how I feel, not what I know. I have learned so much on iNat from so many people. I hope I have passed on that learning to others. How I feel is that AI will take that away.

30 Likes

That’s especially true of the Computer Vision, correct? Perhaps some day you won’t have to mindlessly pore over millions of unknowns, and you can spend your time on more valuable activities. And that would have been made possible by your identifications.

6 Likes

I think there’s a lot of potential here, especially if allowances can be made for geography, life stage, stress, etc. I really look forward to trying it out. I’m actually kind of curious whether it works better for species where I copy and paste some standard text across a lot of observations, or for those where I offer information pertaining to a specific observation. If the AI proves to be responsive to new comments, I think it will motivate me to add more contextual information to my comments.

14 Likes

Humans identifying observations (including unknowns; I do a lot of the same!) will always be useful and important! 🙂 The comments and IDs from the community are essential to iNaturalist, and it’ll stay that way.

With this demo, we’re also hopeful/curious to see if displaying people’s comments and linking directly to them will help encourage more direct discussions and interactions.

11 Likes

I’m not sure a large language model is the right way to do this, as opposed to a different kind of language model. If we’re starting with a corpus of iNat staff ID comments, wouldn’t it be easier to tag those with the relevant features and just search them, instead of doing the LLM-style “generate the most likely next word”?

A rudimentary example:

[category: taxon] Bees can be differentiated from [category: taxon] hover flies by the number of [category: body-part] wings.

And then if the CV model suggests bees, it can be accompanied by volunteer-written text tagged with [category: taxon: bees]. Not sure if the ID summary demo used any tagging? If it does, I’ll remove the over-explaining lol

In my opinion, if computer-generated IDs behaved more like a search engine (over ID tips pre-written by volunteers), they would be more helpful than a large language model. Both complicate iNat’s user interface, though.

Unfortunately, LLMs are all the rage and iNat probably doesn’t have the manpower to do its language processing in-house. Oh well.

14 Likes

just give us a community wiki at that point, it will be easier to implement and edit. please just stop with the ai… most of us are tired of having ai and llms shoved down our throats at every opportunity. please listen.

43 Likes