What is this - iNaturalist and generative AI?

seems like you generally understand what i wrote. it would bypass specific safeguards, not any safeguards.

is it not accurate to label what i described as an opt-in option? what is a better label for it?

It is an opt-in option, since it’s opting into allowing your data to be used in a certain way. What I mostly meant was that, because for a lot of the discussion here people have wanted the feature as a whole to be opt-in (in other words, not having their data used at all unless they choose for it to be used that way), some people might see that term and assume that’s what’s being talked about. It admittedly took a few rereads before my brain really pieced together what was being opted into, hence wanting to clarify that I wasn’t also misunderstanding.

I can see what you mean about this specific kind of opt-in feature potentially causing issues (for example, with particular identifiers making up the majority of the training data). I think once again, though, if you end up with a taxon where there are only a handful of dedicated identifiers who would end up being almost the entire training dataset, it might just be easier to compile their comments and present them as is, or give them an option to write something that can be put on the taxon page. A lot of experts who study very niche subjects are delighted to share their knowledge and will gladly explain things to people who ask.

This is exactly why I don’t need to be credited for any comments I make, which are a synthesis of all my reference materials, not my own intellectual property. Of course I am prepared to provide sources, should an observer ever ask where I got my info (which I believe has only happened once).

Rather than any concern about not being given credit, I am far more concerned about false info being wrongly attributed to me, a concern others have already expressed.

Here’s a question which reveals the depth of my lack of knowledge about AI:
Is it more likely to hallucinate in the absence of actual data to scrape? For taxa that don’t have many observations, or many observations with comments, is it more likely to invent stuff? If it is designed to produce output, will it just do so by any means? (Like a student who is told they have to write three pages on how they spent their summer vacation, when all they did was play video games all day, so they invent things the teacher might like to hear, like books read and trips taken.)

This would be a big problem, especially if good observers and identifiers delete their accounts, taking their collective wisdom with them when they leave.
I respect every user’s right to do that, but I hope that however this all plays out, we don’t see a loss of knowledge that outweighs any potential gain.


i hope this is not too far off the point of the thread - it relates to the importance of sourcing and how GenAI makes this challenging.

are field guides and other scientific literature not also syntheses of their reference materials? they themselves have reference/citation lists. and we cite these syntheses all the time. the IDs you produce are your intellectual property as they are a product of your mind and efforts as an individual.

i do not regard myself as making a wholly original output: we stand on the shoulders of giants. my qualm is not at all that i feel entitled to credit. but citing workers is what we do in formal discussion and academia. synthesis is a process in which we consciously or unconsciously leave impressions of our own experiences/biases/oversights/etc. i would say we have a responsibility to others and to our sources to interpret/synthesize/represent the sources accurately. by listing what those sources are (and with it: who, what, when) anybody can independently verify the fidelity of your interpretation/synthesis. crediting the sources of our information (and those sources citing theirs) has been the basis of what we collectively know. and you are also a source!

as to whether every ID needs to come with a reference list: of course not. many ID characteristics are generalized. we do not cite statements that the sky is blue as this is regarded as a readily observable fact. your name is attached to your ID and people can approach you directly for the reference list if they want to look further into those characters (perhaps they have a different understanding which conflicts, they are skeptical, or they simply want to learn more).


I don’t see anything in your comment as to why this is more challenging with genAI (?)
You are just talking about best practice for comments and IDs - whether derived from human or AI.

A well-designed user interface with an LLM could potentially do a better job than most humans of referencing every ID and comment it makes. The majority of people do not cite references for every single ID or comment they make (e.g., in the 5-10 of the observations used by pisum above that I looked at, no references were given). Humans don’t always even remember the source of information for every ID they make or snippet of knowledge they retain… or it might be an amalgam of sources. A genAI with inbuilt sources would be easier to cross-check. Sources should be a prerequisite for any implementation of genAI.

Also, fwiw, the existing CV has no references at all for its IDs, and nobody has ever challenged this before.

So, I don’t see that genAI would make this specific problem more challenging than the existing system. It would more likely aid it than not. Hallucinated sources may be an issue, but again… this can happen with humans, and at least one can easily cross-check and flag these when they occur. I have told others X because of Y, and had others tell me the same, but then on further questioning realised I was/they were mistaken. Every ID / comment by AI can be supplemented with a word of caution about taking AI suggestions at face value. Human IDs and comments aren’t currently supplemented with any words of caution, even when they come from new users with zero experience.

Seeing an ID as IP is a stretch. It’s a statement of fact, not an invention.
The comment is different as there is a unique creative element to how you write and interpret the facts.


Please - what is IP?

I have seen comments like “not my ID, blame it on CV LOL.” Then someone points out that it is: your name is on the ID.

The legal concept of intellectual property.


Google’s AI - Google is not paying for nothing; your images are probably going to show up intact in someone’s Google Doc making a flier to sell some scam product.

?

With due respect, this seems like a fundamental misunderstanding of the CV.

Do you know how the CV works and its development history? Who runs it, and where?

Google has nothing to do with the CV.


Hi everyone. A reminder that differences of opinion are welcome, but violations of the guidelines should be flagged and NOT responded to. Thanks. If you’re unsure, please reach out to a moderator.

i’m not sure this solves more problems than it creates. earlier, i brought up the problem of how a human realistically makes use of potentially a flood of identification notes. (even a single identifier for a taxon can generate a flood of notes to sift through).

it also seems that not only are folks sensitive to totally false information being attributed to them, but they are also sensitive to out-of-context information that could be misinterpreted being attributed to them (ex. “cardinals are red” is true, but only for adult males). so i think just presenting a bunch of notes as is (out of context) only exacerbates that latter problem.

i’m fine with this generally, but since the devil is in the details, and since i’m not an academic, please let me know how these situations would be handled by an academic:

  1. suppose an academic A learns a particular fact F from another human H. suppose H subsequently tells A that F was actually discovered by a researcher R who first wrote about F in book B. now A writes a paper that mentions F. i think A would verify F in B and then cite R+B as the source for F because that represents the first time F was noted in literature, right? but would A also cite H as a conduit for that knowledge? in other words, is the academic citation intended to credit only the discoverer (or first documenter) of a fact? or is it intended to credit the efforts of everyone involved in distributing that knowledge?

  2. now suppose R is no longer alive and no copies of B can be found. so A never actually reads B. does A still cite R+B? or does A cite the existence of R+B as recounted by H? or does A simply cite H?

  3. now suppose A moves on to study some event E that happened long ago. the only human records of E occur in an oral legend L that is recounted to A in more or less the same way by each of 1000 inhabitants of village V. A makes recordings and transcripts of each of these versions of L. there is a particular inhabitant S who is noted by the other inhabitants of V as being the de facto storyteller and historian of the village, but L existed long before S (nobody knows exactly who first recounted L), and not every inhabitant of V would have originally learned L from S. now A writes a paper that includes mention of E as recounted via L. does A cite S (and the encounter with S) because S is the closest thing to an authority on the subject? or does A cite the first inhabitant of V that A talked to? or does A cite each inhabitant of V individually? or does A cite the collective inhabitants of V? or something else?

  4. also, in the paper mentioning E, does A include transcripts of all 1000 versions of L? or does A provide only a handful of transcripts of L?