I don’t believe that peer review by AI is something to be taken as a serious scientific option.
If it is to become a “Facebook of living things” in the future, then perhaps that’s OK. For some applications (maybe most?), this makes a real difference. That’s what all the buzz is about.
One of my more constructive thoughts so far:
For me, the current process for initially assigning an ID is pretty much OK.
The trouble starts with the review and validation of what was found, which relates to the integrity of the identification content.
Different people have very different habits (I’ll stay positive) and there are a few enthusiasts who try to maintain data integrity by literally shoveling tons of … every day.
Why not help them and provide better tools for data quality administration as a community task?
#Idea:
-We take the average taxon proposed by the reporter and co-reviewers.
-We calculate an average CV-proposed taxon from the already existing CV.
-We calculate the geometric distance between the CV-proposed and the community-proposed taxon as a divergence vector.
-We use gen AI to review the highest-rank (most specific) taxon from either the reporter or the CV for criticality (number of reviewers, number of highest-rank taxa considered, % of disagreements, comment content, occurrence of the taxon, direct or indirect observation evidence marked in comments) and assign it a commented criticality number.
-We scale the divergence vector (basically by some sort of multiplication) with the criticality number to generate a dynamic, observation-based criticality indicator.
-We normalize that indicator to a range from +10 (likely much overdetermined) to -10 (likely much underdetermined); see the sketch after this list.
-We add this indicator to the observation and to the filter function as a dynamic number (dynamic because the average community rating will change over time).
-We keep the gen AI comments on criticality hidden by default, so they do not pollute the human part of the process and human thinking.
-We make them visible only case by case, on demand, if somebody wants to understand why dog poo is considered critical evidence for identifying a dog.
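A minimal sketch of how this divergence-and-criticality scoring could look, assuming a simple rank-depth reading of “distance”; the rank list, function names and the example criticality number are hypothetical illustrations, not an existing iNaturalist API or data model:

```python
# Hypothetical illustration of the criticality-indicator idea above;
# none of these names correspond to an existing iNaturalist data model.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def rank_depth(rank: str) -> int:
    """Depth of a taxonomic rank; deeper means more specific."""
    return RANKS.index(rank)

def divergence(cv_rank: str, community_rank: str) -> int:
    """Signed distance in rank levels between the CV-proposed and the
    community-proposed taxon; positive = community is more specific."""
    return rank_depth(community_rank) - rank_depth(cv_rank)

def criticality_indicator(cv_rank: str, community_rank: str,
                          criticality: float) -> float:
    """Scale the divergence by the gen-AI-derived criticality number and
    clamp it to the -10 (underdetermined) .. +10 (overdetermined) range."""
    raw = divergence(cv_rank, community_rank) * criticality
    return max(-10.0, min(10.0, raw))

# Example: the CV only supports family level, the community agrees on a
# species, and the gen AI review rates the case criticality at 2.5.
print(criticality_indicator("family", "species", 2.5))  # -> 5.0
```

Because the community average changes as new IDs arrive, the indicator would be recomputed over time, which is what makes it dynamic.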
Expected result:
PRO arguments:
-The non-expert will be able to understand their impact and can stay away from taxa that are too difficult, while caring for data quality in the most abundant and easy taxa.
-The expert can focus on cases where expertise is required.
-Both together can do their share to improve data integrity.
-The criticality label of a find will decrease dynamically over time as the community adds more common sense to the original proposal.
-Those seeking details to start more in-depth research may get them from the AI-generated on-demand comments and, based on those, may contact other users or look into referenced material.
-People will engage more in using the annotations to distinguish direct from indirect evidence and down-rank criticality, which makes a real difference (I don’t know how many hundreds of beavers are reported in Luxembourg just because somebody found a stick a beaver gnawed on).
-If many people agree to abuse the system and call potential dog poo a scientific dog observation, that is still possible. But it will also be visible, and they will discredit themselves visibly and automatically.
CON arguments: ?
Maybe I was too technical, and for sure it is not thought through in every detail.
The rough idea from a community standpoint:
We have a very large and diverse network of HI (human intelligence) in the review process with all its capabilities and challenges.
I do not want an automated super-influencer of uncertain skill level (gen AI) that introduces coordinated bias and interference into this HI network.
I want to see resources (manpower, energy) spent the right way.
Proposal:
-Create a new gen AI supported metric to measure the quality of our activities in the process of identification based on the goals of this platform.
-This with no direct influence on human thinking; it becomes visible only on explicit demand.
-Bring this quality metric to the level of “reviews accomplished” or “finds uploaded”, so that people start competing for data and review quality rather than for sheer number of entries.
Expected result:
-Less reward for uploading minced meat finds and reviews
-Thereby reduce the number of such entries, free up workforce, storage and money, and reduce the energy footprint.
-By visualizing the quality metric in an intelligent way (the idea mentioned above) as a criticality index, give users a better chance to focus their review efforts according to their skills and knowledge level.
A friend of mine who is much more into this topic than I am said that this may not play to the strengths of gen AI, nor follow the interests or intent of Google.
But I believe this is what we may need and want, because it addresses many of the concerns mentioned earlier in this forum.
I just came across this forum topic:
https://forum.inaturalist.org/t/observations-by-suspended-users-should-indicate-the-user-is-suspended/67360/8
As an amendment to the above, the gen AI output related to a specific observation could also provide a red-flag notice if a user was suspended. If it can digest user comments, that should be easy.
I haven’t read the whole thread, so maybe this has already been said. For me the main thing would be that it stays within iNat.
- The AI only draws from sources within iNat (that is, comments, journal posts and also linked articles) and not random junk from the internet.
- It also doesn’t give that content away. I suppose if Google steals text, there is nothing we could do.
- It gives its sources, so one can check for themselves.
- It writes the text in the language the page is set to; comments are not necessarily always in English.
Fair enough. There are AI applications which do this – for instance, in medical diagnostics, when an AI is trained to evaluate diagnostic imaging. This is, of course, predictive AI, more like the CV, rather than generative AI. Are there examples of generative AIs that remain within their specialized use-case platforms?
In some ways, one could question the relevance of even asking the question. AI ain’t going away, whether we find it “tolerable” or not. What technology that has found uses in the world has ever gone away because of people’s objections? When robotics first entered the workforce, a lot of jobs were displaced; a lot of people understandably didn’t like that. But robotics are still here. A more extreme case: nuclear weapons. Yes, there are nonproliferation treaties, but nuclear weapons are still around. I don’t see any likelihood of AI going quietly into the night; especially as its energy efficiency improves and the energy transition proceeds, one of the main constraints on its application will abate.
Questions like this can be useful for ensuring that AI is deployed in the most appropriate way, minimizing destruction and maximizing benefit. But let’s not kid ourselves that a technology-based entity like iNaturalist will remain forever free of it.
If gen AI is influencing the decision-making process of X human users, it must be X times more reliable and responsible than the average human user (however the community or Google will rate that).
Nobody wants to teach (not educate, because that’s impossible) a robotic idiot.
With several thousand users from all continents, with a very broad range of cultural and educational backgrounds and skill levels, I think gen AI must be VERY good right from the start.
Otherwise:
Hope dies last, doesn’t it?
Here are my thoughts.
- If it makes the existing CV more accurate by using heuristics to guide CV identification and performing a “sanity check” for diagnostic features, I am strongly in support of that.
- AI should provide links to good sources on identification/taxonomy, but it shouldn’t paraphrase and regurgitate material from those sources. The first option helps people engage with primary sources, the second option teaches bad habits and risks producing accidental misinformation.
- AI should not write an explanation to go along with an ID suggestion. It will make it harder to work through bad IDs if most of them are accompanied by AI-generated explanations that appear to be written by humans. Those of us who actually take time to organize and share our thoughts, knowledge, and questions will then be drowned out by the sea of AI-text. It would also make it harder to quickly recognize who actually has knowledge and experience in identification. The better the AI is, the worse this problem would be (because good AI is less obvious).
- AI shouldn’t be used to write ID guides. If/when it makes mistakes, humans will then propagate those mistakes, causing bad ID information to become widespread. I see no advantage for AI-written guides. Out of any activity one could do on iNat, writing an ID guide requires the most knowledge and expertise, and that of all things is not something that should be automated. Humans should be teaching AI, not the other way around.
- If the site needs more information on how to ID (it does), make it easier for the many experienced IDers who use this site to contribute their guides to a centralized location. This will do FAR more to improve the iNat experience than anything AI can do. It will make IDs more accurate, will make it easier for observers to transition to identifying, and will engage people in the process of identification rather than increasing their reliance on AI tools that automate the process.
- Make it so AI can only be used for one’s own observations, not for IDs of others. Observations should not reach research grade simply because two different people accepted the AI identification (this is already a problem). Alternatively, restrict its use to experienced IDers who use it for efficiency.
- Give prominent warnings/disclaimers: AI should never be used as the sole basis for an ID. Always check other sources.
- It should be against the rules to blindly add the AI suggestions to other people’s observations. I’m not sure exactly how it would be enforced, but it needs to be made clear that this is at the very least against the “spirit of the law” on the site. Show an example of what happens when users blindly accept AI suggestions (feedback loops that amplify errors).
- If it does give a written explanation for how/why the CV reached an ID suggestion, this explanation should reflect the actual reasoning or heuristics used in the AI ID process, not a plausible explanation for how it might have arrived at the answer generated from human-written comments under IDs for the same taxon.
I like it! You try to distinguish between AI providing support and AI providing newly composed information!
I always hear that this is not how it works and that the way gen AI comes to its “conclusion” is opaque by default.
Gen AI has no self-awareness, even if it verbally may mimic it.
That’s why I doubt it can do more than provide support.
I also consider it problematic to just rephrase other human users’ input in a linguistically convincing way. That’s “artificial information” without control, and all of its interaction with human users is therefore to be seen as critical.
As long as I do not see how AI-generated content is qualified, I must question it and treat it as potential fake information. If there is neither program code nor an algorithm to verify it, nor a self-conscious AI being that can explain itself, then AI in the sense of “Artificial Information” is not only nuts.
It is in every sense dangerous.
Okay. So one of the organisms I encountered today was a type of fly that waves its wings around while walking. I started with a simple web search, “What kind of fly rows its wings.” Gemini produced a summary of the normal rowing motion that many insects use while flying. But after a few more rounds of conversation with it, we got down to the suggestion that I likely saw a kind of picture-winged fly. From there, I went back to web search mode and looked for Ulidiidae. That led me to images, which in turn led me to a tentative, genus-level ID for the flies I saw.
This is a use of AI that I’m okay with: brainstorming which leads to suggestions that I can follow up on.
The issue I see is that you have no insight into where genAI got its wisdom (the origin and quality of the source), nor how it translated your question into constraints for its search in order to exclude possible alternatives. The source could well be a comic or a fairy tale, and the true species may have been excluded because it had so far only been seen in April, not in March, in your area. AI can’t extrapolate.
If you mark such an identification as an AI-assisted preliminary first ID, and not at the same quality level as a human ID assignment, I’d consent.
But today, there is no provision in iNat to qualify identifications at different levels, and no differentiation between the first and second ID assignment, which, as I read, I’m not the only one to consider problematic. I clearly see the benefit and beauty in today’s workflow design, which tries to keep it simple. I see its challenges, mostly originating from human nature.
What I am missing, and what probably now needs to be investigated, is how to at least let the workflow deliver a comparable level of data quality while somehow integrating gen AI.
My idea to let the fox guard the henhouse and have gen AI work on data quality review, which today is not systematically included in the workflow at all, has so far not been commented on.
Which is no different from initial IDs made without any comments. Which is most of them. All we know about first IDs now is whether or not they were CV suggestions - and even then, someone can easily bypass that by looking at the CV suggestions and then typing in the same name.
If you believe that my ID is wrong, you have the same opportunity to disagree whether or not I brainstormed ideas with an AI. My worry is that the anti-AI sentiment here might motivate some people to disagree with correct IDs, even if they know them to be correct, because an AI was involved. Disagreeing IDs are not much more likely to have explanatory comments than initial IDs, so there’s no way to know.
To put it another way: all this talk about transparency, but human IDs aren’t necessarily any more transparent.
Nothing to disagree with in what you write.
But HOW could a centralized AI service make what you just describe any better? That is what my earlier question and idea were about.
If genAI were to influence users with unreviewed artificial information and content, nothing would get better, only worse. That’s why I say: segregate genAI from the actual content sharing and ID process, and use it for content flagging and prescreening. For example: “AI sees that species XY can never be separated from species AB from a picture alone, without dissection. Therefore, if the ID is set to XY => red flag. Strong recommendation for further review.” A small sketch of such a rule follows below.
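To illustrate the “flagging, not influencing” idea, here is a minimal sketch; the taxon pair, field names and evidence label are made up for the example and do not reflect any existing iNaturalist data structure:

```python
# Hypothetical prescreening rule: flag for human review, never auto-correct.
# The pair list and field values below are illustrative only.
INSEPARABLE_FROM_PHOTOS = {
    ("Species XY", "Species AB"),  # cannot be told apart without dissection
}

def needs_review_flag(proposed_taxon: str, evidence: str) -> bool:
    """Red-flag an ID that claims photo-only evidence for a taxon pair
    known to require dissection or other direct examination."""
    if evidence != "photo":
        return False
    return any(proposed_taxon in pair for pair in INSEPARABLE_FROM_PHOTOS)

# Example: a photo-only observation identified as "Species XY" gets a
# red flag and a recommendation for further human review.
print(needs_review_flag("Species XY", "photo"))  # -> True
```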
If genAI were to interfere directly with the ID process and data sharing, I still see no argument that can invalidate what you call sentiment and I call a problem by mere logic.
Hurray, another consequence of direct AI involvement! The evergreen clash of cultures! Do we need a user group of “Anonymous AI users”? Or shall we keep AI use anonymous by default? Shall we ban AI completely?
I can only say that looking away, if direct involvement does come, will not be satisfying.
The only apparent resolution then seems pretty obvious: people who won’t spend their time on artificial entertainment will simply quit. And the others will follow their individual beliefs, entertainments and hopes. One day AI could maybe develop consciousness and explain its doings. For the good of the planet, I really hope it will be interested in biology!
Until that day: Goodbye science? Hello belief!
From a structural aspect, the tragedy almost appears like a planned algorithm to me.
First destroy. Then re-build under new paradigms.
Pretty sure that an AI service for sale could develop such an idea. It’s nothing new, just an old evergreen idea (Nero burning Rome?) set in the new context of a non-profit user community. That would be a strange social experiment, wouldn’t it?
Maybe those nerds will find an unexpected new solution in the human-machine interface, not thought of before? Not that there was too much thought beforehand! It is now time to let the thinking be done for money! A new way would be worth more than the grant given, because… it would require true creativity and could again feed the engine!
If so, it would indeed be very efficient and minimalistic while leaving almost no footprint!
For a project of that size and relevance, it would also be pretty cheap, because they manage it all themselves!
And not without a certain sickly aesthetic!
Hey @nathantaylor, would you be able to update your top post? Or maybe someone could post their own up-to-date summary.
(I think you’d want to start around post 40)
I’ve been meaning to do that for a while, but haven’t found the time yet (I’m out of town for a conference and have had to spend most of my extra time preparing for it). I’ll do so when I get back into town.
AI could generate search URLs for less common filters; a rough illustration follows below.
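As a sketch of what that could mean, here is a small example; the filter parameters shown (quality_grade, taxon_id, place_id, d1/d2) are commonly used in iNaturalist Explore URLs, but the IDs and the text-to-filter mapping here are purely illustrative:

```python
# Illustrative only: compose an iNaturalist Explore URL from filters an
# AI assistant might have extracted from a plain-language request.
from urllib.parse import urlencode

BASE = "https://www.inaturalist.org/observations"

def build_search_url(filters: dict) -> str:
    """Turn a dictionary of filter parameters into a search URL."""
    return f"{BASE}?{urlencode(filters)}"

# Example: "Needs ID" beetle observations from 2024 in some place.
print(build_search_url({
    "quality_grade": "needs_id",
    "taxon_id": 47208,       # beetles (illustrative taxon ID)
    "place_id": 12345,       # hypothetical place ID
    "d1": "2024-01-01",
    "d2": "2024-12-31",
}))
```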
I’m tending more negative on the use of LLMs to summarise identification tips as I see more about the limitations of these models.
A cautionary example of a scientific institution adding AI-generated summaries to their pages:
AI slop and the destruction of knowledge
There are two relevant points here. First is that even when labelled as AI-generated, an outwardly reasonable-sounding summary can be worse than useless. We can expect similar issues to arise when trying to communicate the finer points of identification, where different synonyms for scientific terms may not be equally valid, where features that may be diagnostic within the context of a dichotomous key couplet may not be diagnostic when applied outside of that context, and so on.
Second is the reputational risk that institutions face in being associated with “AI slop” and unreliable outputs. iNaturalist has already experienced some of that kind of blowback. LLMs might seem to offer a convenient shortcut to doing tasks that would be difficult to accomplish by individual humans. But given that the ability of these models is primarily in generating text that sounds seductively plausible, rather than being able to understand and identify when they are wrong, that shortcut may come at an unacceptable cost to reliability and reputation.
Building on the example given in the link, what would be acceptable to me is for AI-generated content to be made public only after it has been reviewed and edited by a suitably knowledgeable human, and not before. If that greatly throttles the amount of identification tips available for obscure species, which it would, so be it. In these times of information overload, better to have less, accurate information than more information that cannot be relied upon.
Once the content is reviewed and edited, it should be marked final, only editable by actual human beings.
No one has time to review a newly generated version every month.
Thank you!
I will keep your link in mind for my next safety message at the company I work for. Science is one thing. But imagine people starting to consult AI on chemical safety topics, medication, or other safety-relevant things?
“What would make AI on iNaturalist tolerable to me?”
- No harm to people
- No harm to knowledge
- No harm to the platform’s reputation and credibility
- A plausible cost-value proposition reflecting the general targets of the platform
That should be a good start for me!
Comment from an IT-interested friend: “That’s Idiocracy as a service” (for those who don’t know the movie, you must check it out!)
Accordingly, I forgot another requirement for me to tolerate gen AI:
Don’t accept it proposing to rescue frogs with a lemonade treatment.
I flagged and hid two posts above, they were not related to iNaturalist and AI and what would make it tolerable to members of the community.
My post was about the current stance Google is taking on environmental issues. By agreeing with the current Secretary of Agriculture that there is a so-called “climate extremist agenda”, Google president Ruth Porat is in my opinion showing less regard for environmental concerns than previously. This worries me as Google plans to be a repository for iNaturalist data. Is there another Forum topic that might cover the issue?