Artificial Intelligence-powered search tool for iNaturalist / GBIF

Hey fellow naturalists, I feel like there is a lot of room for AI-based search tools that go through iNaturalist and GBIF data for efficient data analysis. For example, you could ask a question like “what’s the most common fungi species in each US state?” and the search assistant would find the answer through a simple search of the data (filter:fungi, location:state, species_observation_count:index[0]), then output a table.
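As a rough sketch of what that filter would boil down to: group observations by place, count them per species, and keep the top entry for each place. The function name, the (place, species) record shape, and the toy records below are all assumptions for illustration, not an iNaturalist or GBIF API. Note that it counts observations, so what it returns is the most observed species in the data.

```python
from collections import Counter, defaultdict


def most_observed_per_place(records):
    """Return {place: (species, observation_count)} for the top species per place.

    `records` is an iterable of (place, species) pairs, one per observation.
    This shape is hypothetical; real exports have many more fields.
    """
    counts = defaultdict(Counter)
    for place, species in records:
        counts[place][species] += 1
    # most_common(1) is the "index[0]" of the per-place species ranking.
    return {place: c.most_common(1)[0] for place, c in counts.items()}


# Made-up toy records, only to show the shape of the input and output.
records = [
    ("State A", "species x"),
    ("State A", "species y"),
    ("State A", "species x"),
    ("State B", "species y"),
]

# Print a small tab-separated table: place, top species, observation count.
for place, (species, n) in sorted(most_observed_per_place(records).items()):
    print(f"{place}\t{species}\t{n}")
```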

Using the same prompt as above, I manually made this map below a couple weeks ago, but I feel like it could have been automated if there was a better interface:

I know that this is possible given the existing AI tools out there, and for certain queries it would be efficient enough that people could spend their time on the more brain-required elements of a project. Other questions that a properly set-up AI would have no problem answering would be like:
What are the common bird species between Brazil and Canada?
(Already, using ChatGPT and copy-pasting the observation page from iNaturalist, the answer is Common Loon (Gavia immer) and House Sparrow (Passer domesticus). There is no reason that the AI can’t do the full workflow.)
What country has the most observations of Oak trees in the world?
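If "common between" is read as "observed in both places", the first question above is a set intersection of the two species lists. A minimal sketch, assuming the lists have already been retrieved; the function name, the dictionary shape, and the placeholder species are hypothetical.

```python
def shared_species(species_by_place, place_a, place_b):
    """Return the sorted list of species recorded in both places."""
    return sorted(set(species_by_place[place_a]) & set(species_by_place[place_b]))


# Placeholder lists, not real checklists for any country.
species_by_place = {
    "Place 1": ["species a", "species b", "species c"],
    "Place 2": ["species b", "species c", "species d"],
}

print(shared_species(species_by_place, "Place 1", "Place 2"))
```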

All the data is out there anyway; this tool would just help navigate it all and do simple data analysis for those who may not be power users of iNaturalist or data people. Would this be of interest to anyone else, or am I just spouting something that isn’t needed? I didn’t add it as a feature request, as I am not sure exactly how it would look (stand-alone program, or part of iNaturalist). Maybe something as simple as this:

10 Likes

So, a sort of “Plain Language to Search String” translation tool? Sounds great.
Extra points if it properly displays the search string, instead of just returning a list of obs. (I hate the ‘dumbing down’ that AI allows…)

4 Likes

Those look like very demanding queries.

1 Like

Just a technical note, that would not return a correct answer for, “the most common X in X region,” it would return a result of, “most commonly observed X in X region.”

That distinction is important and not making it results in extremely bad research.

With things like fungi in particular, it’s extremely unlikely that what’s observed is actually the most common. Even with things like birds and plants, it’s often the ones that catch the attention or are easy to ID or see that are observed at higher rates than what’s actually the most common.

It’s important to be clear on what the question asked is actually asking, as well as the limitations and biases in the dataset being searched.

18 Likes

Oh, I tend to be clearer on that than my browser’s search engine is. Unless I’m searching for something really popular, it’s unlikely I’ll get any results that are germane to the whole search string.

1 Like

https://eefalsebay.blogspot.com/2021/11/great-southern-bioblitz-october-2021-cape-town.html

Most observed yes. Tall proud and blazing colour.
But actually most common, by biomass? That is a challenging question.
Perhaps the invasive Australian wattles?

1 Like

It’s not even necessarily the most commonly observed fungi, just the most commonly identified

For one, I guarantee Amanita muscaria isn’t the most common fungus in ANY of those states; it’s just big and flashy and iconic, so everyone, even people who aren’t into fungi, notices it.

…Podaxis pistillaris might actually be close to accurate for Arizona, but that’s mostly because not much else fungi-wise grows in the desert - though I’d be willing to bet the actual most common fungi there is probably some sort of unobtrusive, hard-to-ID lichen.

(This has been mushroom tangents with Lothlin, tune in next time)

10 Likes

It’s not even necessarily the most commonly observed fungi, just the most commonly identified

Yep.

I’d be willing to bet the actual most common fungi there is probably some sort of unobtrusive, hard-to-ID lichen

It’s probably some microscopic soil fungus, perhaps one that’s not even identified yet. That’s likely to be the case in most areas.

6 Likes

Yes, exactly that, a plain language search tool.

As a reply to other users as well: yep, it is just a wording change, “most observed” rather than “most common”.

1 Like

Very true, but then is there any benefit to knowing the most observed/identified? Surely it is an indicator.

1 Like

people who are not data people who want to do data analysis. i wonder what the size of this set of people is?

the set of things that can be done within the context of the Explore page search is limited compared to all the things that could be discovered from iNat data. just for example, you could ask which country has the most observations of Oak trees in the world, but a single view of the Explore page is not going to tell you this because it doesn’t aggregate observations by place, let alone countries.

to answer the question definitively (using currently available tools to get current data), you need a completely different interface and approach to get the data. effectively, you would need a list of countries, and then you would need to query the Explore page 200+ times (once for each country) to answer this question: https://jumear.github.io/stirfry/iNat_countries?taxon_id=47851&stats=observations. then you would need to get the max count from this result set to find the winning country.
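A minimal sketch of that loop: one count lookup per country, then take the max. The function name and the `count_fn` callback are hypothetical, and the counts are made up. In practice `count_fn` might wrap a request to the iNaturalist API's observations endpoint filtered by `taxon_id` and `place_id`, reading the total from the response, but that is an assumption to check against the API documentation.

```python
def place_with_most_observations(place_ids, count_fn):
    """Query a count for every place and return (winning_place, count).

    `count_fn(place_id)` must return the observation count for one place.
    It is injected so the lookup can be an API call, a cache, or a test stub.
    """
    counts = {pid: count_fn(pid) for pid in place_ids}  # one lookup per place
    if not counts:
        raise ValueError("no places given")
    winner = max(counts, key=counts.get)
    return winner, counts[winner]


# Made-up counts standing in for the 200+ per-country queries.
fake_counts = {"country 1": 120, "country 2": 950, "country 3": 40}

print(place_with_most_observations(fake_counts, fake_counts.get))
```

Injecting the lookup keeps the "query each country, then take the max" logic separate from whichever data source ends up supplying the counts.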

so my point is that if you want to be able to ask any question to some sort of chat bot, you need to implement it in an interface that is not limited to one context (like the Explore page is). if i were to try to do this, the path i would head down first is to see if there’s a way to extend the Forum to allow a space to – instead of just searching for previous posts – ask a question to a chatbot that will not only find related discussions but also try to answer your question directly. and then if it didn’t answer your question satisfactorily or find previous discussions on the subject, then folks in the forum could try to answer the question.

3 Likes

Plant icons

Somewhere - WAY back in the forum - talking about the lonely only ‘iconic’ taxon for all that green stuff.
Someone meticulously did artwork, creating icons for THE Iconic Plant Groups - moss for example. There were 8 or 10 icons. They were both beautiful and effective!
Damned if I have ever been able to find that again. We are still stuck with a token leaf for all Plantae. And that icon is disconcertingly similar to the icon for Unknown. I guess the leafy question is how iNat sees planty stuff.

3 Likes

https://forum.inaturalist.org/t/split-plantae-into-several-iconic-taxa/24698

1 Like

Close.

But the one I remember was earlier / older with more icons.
It was before AKR

1 Like

This is a great idea, but it may be quite a challenge. That does not mean it shouldn’t be attempted. In fact, it would be a great idea to try it.

That is an important point. Natural language, human nature, and the natural and cultural world in general all have their many nuances. A consequence of this is that many questions and the methodology behind how they should best be interpreted and answered are often much more interesting than the one who asked the question may have intended.

There’s one of those interesting questions, beginning with the issue of how to interpret it. In addition to determining a literal meaning of the question, an AI perhaps should also attempt to determine what the one who asked it may have intended by it, so that the answer might satisfy the intent of the request. After the question is posted, a clarifying dialog would be necessary.

Does between include Brazil and Canada, or was the exclusive meaning of that word intended? Considering the complexities of geography, does the range include Cuba, or how about the Galapagos Islands? Answering the question, once a meaning is settled upon, also might entail the counting of individual birds. Should only official counts of observations from some organized databases be used, or should the AI also comb through discussions, looking for accounts of observations, such as “I saw thousands of snow geese on the shore of Lake Champlain today at Point au Roche State Park”?

In my opinion, this project is worth trying, and in fact may ultimately become a necessity, since a machine can perform a well-defined methodology much more quickly than a human.

The intention of the question - could be - common to both countries?

1 Like

That is a possible and even likely intent, among others. Since between has multiple meanings, an AI would hopefully pick up on the ambiguity of the question, and therefore ask for clarification before issuing results.
