Yeah, I know some will just reference what’s in the prompt without adding that text to the training dataset while others will actually train off it, but I’m not sure which models do which. Copilot apparently doesn’t use user data for training, but I can imagine it’s still pretty jarring to see someone use dozens of your comments as a prompt to get a summary that uses your wording and explanations without crediting you at all.
From what iisips said previously, part of the issue was the lack of credit, so compiling those comments and presenting them with their username attached wouldn’t present the same issues. The comments would also likely have more context than what’s provided in the AI summary, and if anyone was confused and needed clarification, having their username there means they can be contacted by anyone wanting to learn more.
I can’t speak for iisips, but personally, if someone wanted to screenshot my comments describing how to identify a species and then compile them somewhere, either for themselves or for other people to use as a reference, I’d be fine with that so long as my username was visible. That way I get credit for what I’ve written, and readers have a source they can contact for clarification and to find out what sources I used, so they can more easily fact-check what I’ve said.
Putting my comments into an LLM and getting it to spit out an amalgamation of what I’ve said, without any of the context and without crediting me, is both plagiarising my work and removing other people’s ability to see where that information came from, meaning they can’t ask for clarification or ask what sources I’ve used. It’s an all-round worse experience for everyone involved.
My main point is that a language model can reference the parts of the input it uses if it is asked to do that. So, for example, it could credit the authors of the comments if asked to, and especially so if specifically trained to do that.
Training or not training from the input is a separate independent issue.
Ah I see, I think I misinterpreted what you’d said initially as being about how a model referencing something doesn’t mean it’s been trained on it.
I think while referencing information from a prompt (such as who made the comment) may be an option, at that point it seems like it’d be simpler to just present the comment as it was originally written, with credit. That way you’re not risking the LLM adding anything that would change the meaning or make the information incorrect.
I suppose I just don’t see the point of generating something new if you’ve already got a set of perfectly good comments that can be compiled, perhaps with the ability to click on them and go to the observation they were on for extra context.
An even more trustworthy way a “bot” could help: present a list of relevant comments to visit, nothing more. No error-prone attempts at summarizing/synthesizing, no risk of dramatic context or source omission. Just “if you need tips to identify this species, visit the comments here, here, and here, but please keep in mind that they may not be relevant to your observation or in all locations”.
in the exact same way, yes i would still be upset to a lesser degree. plagiarism is academically scorned for good reason: presenting others’ work as your own, inability to trace information sources (thus impeding others from verifying that information is accurately interpreted and represented), disassociating work from the worker. how my data were used here represents to me a greater degree of violation in that they were provided to an LLM/GenAI that may potentially use these data to train the model.
i will preempt any suggestion that, by listing the observations, the output properly cited me or any other users sourced for input. and even if it did cite me, i did not want my voice chewed up and spat out: being impersonated without my knowledge is upsetting. upsetting to not know whether the model could be incorporating my words to better mimic my work. upsetting to know it garbles what i intend to say.
and why? what value was derived in this example? i am here and available to talk to anybody interested. i do this because i love these taxa, and i love to talk about them. my agency to communicate about them in my own words was taken from me. my agency to help others was taken from me. it is not convenient for me to wrangle with a variably flawed interpretation of my work being released on the site.
ultimately, i consider my voice all that i have. it’s important to me that it’s my own and that i am responsible for it. this example represented to me a very fundamental disrespect of me and my voice.
people genuinely interfacing with me and my comments and learning is not an issue at all. there is a person to communicate with. i learned from studying taxa in the literature and ID discussions. i can cite these sources where i learned. also, i’m fallible. that is one reason why it is so important that workers remain associated with their work. should i make a mistake in my reasoning or interpretation of data, i am responsible for owning and acknowledging it. i can retract my statements and offer clarification/correction.
i would appreciate if you elaborated on your usage of quotes around the term “artificially.” what do you mean to imply?
No such option at this point
I was ambivalent at first about the importance of opt-in vs opt-out, but between @iisips’s comments here and my own concerns about trolls, SEO bots, and misguided novices, any such algorithm should be:
- Opt-IN
- have a minor barrier to entry before your comments are eligible for inclusion, like there is for Projects
- default consent should only cover new comments going forward, not old ones; separate, additional consent should be required for comments made before the AI and new TOS come into play.
Let’s say one day a non-expert iNat user decides to look through all red-shouldered hawk observations with comments and make a list, in his/her own words, of what he/she considers to be characteristics useful for an ID. They then decide to post that list of characteristics, together with an indication of the source observation, on the forum, thinking it could be useful to other users. That list would probably look very much like the list extracted and compiled “artificially” (in other words, with the help of AI) and, as the user is not an expert in IDing birds, it might very well contain “mistakes” very similar to (or maybe worse than) the ones made by the AI.
Neither the human intelligence (iNat user) nor the artificial intelligence is claiming the comments as their own, so I personally don’t believe it could be considered plagiarism. Both cases could also include a disclaimer stating the list might contain errors of interpretation, and both provide access to the original text via the observation, enabling all the information to be checked directly with the original source.
Let’s say for the sake of argument, that both lists contain the same information and the same non-expert errors of interpretation… what I’m genuinely trying to understand is whether the non-expert human generated list and the artificial intelligence generated list would both be equally upsetting? Or does the use of AI make the results intrinsically unacceptable, even if the outcome is the same?
I suspect that the heart of the issue is about consent. Consent is built on trust.
A computer program can “ask for consent”, in a legal-technical fashion, but it’s difficult to trust a corporate entity. They don’t take “no” for an answer, or they pretend they did and lie about it. Yes, there are exceptions, but the rule still stands: a computer program is not a person with emotions, ethics, etc. A computer cannot be held to account, and its programming can be changed at any time by nefarious outside forces. From a legal standpoint we have to grant our consent dozens of times a day merely to exist online – but is it real consent? Or consent under duress?
I trust iNat to be better than that, of course, but not because I put any trust into an algorithm – rather because I hope that while the team is creating, managing, and defending that program, they will respect moral principles and will respect users’ consent.
Consenting to a human doing something is an entire world of difference. I’m willing to trust that people, in general - and especially scientists or students - are inherently curious and well-meaning. They’re people I can talk to, rather than a mysterious box of circuitry (or worse, a mysterious box of circuitry that puts on a human face). And if it goes poorly, humans can be held accountable. They feel shame when they’re called out, or at least suffer social consequences when they cross boundaries and break consent. In extreme cases, the law can get involved.
To use an analogy, in my opinion it’s a bit like this: say I created a painting. A kid takes a picture of my painting because they love it, and draws or traces an exact copy to put on their wall. Simultaneously, a large corporation makes an unauthorized scan of the painting, prints industrial quantities of it, and hands them out to anyone and everyone with no credit or indication that they did not originate the work. There is a difference between these two scenarios, wouldn’t you agree? Scale, intent, credit: these all play a role in how the creator would feel about this. And the law agrees with me; one of these is fair use and the other is not.
And even if you ask it to credit specific individuals when citing them, we’ve seen just how that would turn out in this very thread. It confidently ascribes sentences and sentiments to people who never even said them. What if the AI leaves a tip on an observation that says “According to user hawkparty, you can identify this mushroom by its light color and pleasant taste”, and the poster gets hospitalized because it was actually a poisonous mushroom? Am I liable now? Do I have to worry about getting sued? These aren’t things I want to have to worry about on a website I use for fun.
In addition, this reminds me of a conversation I had with a colleague when Copilot and other similar “code generating” AIs were becoming a topic in my professional field:
He said that in his opinion, the code the AI generated was about on par with the work of an intern or a particularly green junior programmer, in that it needed some work to be made functional but could be used for basic tasks. The difference, according to him, was that you hired interns and juniors with the expectation that they would learn and grow. You don’t hire them for efficiency; a senior engineer could write all the same things twice as fast and without the need to clean up and fix the work. The reason you hire them is that over time they improve, they take feedback from the seniors, and they accumulate knowledge, so that they can begin to tackle more complex tasks and even start designing and architecting, two things which GenAI has remained absolutely abysmal at. They eventually become senior engineers who can then train more juniors in the same way; such is the cycle of tech dev.
The AI was, in the exact words of my colleague, “an eternal junior”. It never learned or improved and it didn’t take feedback; it just kept generating the same kind of crappy code that required more work to clean up and fix. There was no point to it when plenty of juniors already exist (and it takes away from their opportunities: if the AI is doing all the junior work, how are they supposed to get the experience to become seniors?). The reason some companies have gone ahead with implementing these is the rather straightforward fact that you have to pay juniors and you do not have to pay the AI. So it’s cheaper to have the AI churn out the work of a couple dozen juniors and then pay one senior engineer to fix it.
However in iNaturalist’s case Robo here was already providing their work for free. And they aren’t compelled to continue offering it. If they decide to leave, there is no senior on the payroll who can fix the AI’s output. It’s just going to continue spewing incorrect and poorly sourced misinformation without ever learning like a person might and nobody will be around to correct it.
Also, in the case of people, they usually have some unique perspective or ability they bring to the table. Maybe that new identifier speaks a language Robo doesn’t, so they can translate the tips to reach more people. Maybe Robo points them in the direction of some of the sources they use and that identifier discovers something new that’s been overlooked. And in time they can become a new experienced identifier that also provides knowledge to others. If these people leave, regardless of how “valid” you think their reasons for doing so are, you get the “garbage in, garbage out” problem. The GenAI continues to produce slop because all the skilled users have removed their data from the training pool and left, and in turn all it has to take in again is its own slop. You can imagine the end result.
I see a lot of AI enthusiasts constantly espousing that it doesn’t matter if people don’t like AI, because the AI can just replace them. But what they miss is that without these people, the AI has nothing. It is trained on their work, all it can do is reproduce it. The feelings of these people matter because as they leave, as they remove their data from the internet or refuse to publish it so that it can be scraped, all the AI can do is continue to subsist off its own slop. You can’t make a device reliant on the labor of others and then claim that it doesn’t matter how they feel about it because you can just replace them. iNaturalist should take note of that.
(And, only marginally related, I want to bring back my liability questions. Disney has recently initiated a massive lawsuit against GenAI company Midjourney. Does iNaturalist really want to open itself to this kind of legal action? Even if they end up being validated by a court of law, that’s a protracted legal battle that I frankly doubt iNaturalist can afford)
Given the number of posts in this thread, I’ve given it a week before commenting here hoping that most of the ‘steam’ has been blown off by now. I have tried to read through it but probably just glossed over a lot of it. Like several other people I’m surprised by the overwhelmingly negative reactions to the news. I do not envy the iNat team the task of trying to sort through and address all the concerns being voiced. It must feel like they’ve accidentally tripped and fallen into a hornet’s nest.
1 - This is going to be long, so I’m going to sort it roughly into three topics, hopefully not repeating too much of what has already been said above. First, I’d like to point out how the AI risk debate and political decisions regarding regulations differ significantly in the US vs. Europe. This is probably something to be aware of going forward with any sort of AI applications. Preparing to teach a discussion class on genetic engineering, I can see a lot of parallels with the anti-GMO debate, e.g.:
- New technology that few understand and thus a lot of distrust against it
- Deep concern about negative effects on the environment
- Association of that technology with “greedy corporations” (Monsanto, Google, etc.)
- Differences in safety assessment and regulations between the US and Europe
- Objections against being “opted in” to an experiment by default
To illustrate that last point, foods containing GMO products have been on the market for decades but weren’t required to be labeled (as containing bioengineered food ingredients) in the US until 2022. They are still prohibited in many countries in Europe. Sometimes when we talk about this in class we refer to it as “the great American GMO experiment” and there are always sentiments that consent to be part of it was overruled by political and economic interests.
Similarly, internet users in the US currently have their social media content scraped by AI, often without their knowledge or consent, and are unwittingly/unwillingly becoming part of “the great American AI experiment.” For example, German news raised awareness of the possibility to object to having your public Meta (Facebook, Instagram, WhatsApp etc) posts used for AI training. I was able to find the corresponding URL in the privacy settings for my US Facebook account but it just said: “This form is only available to people in certain regions.” US users have all been opted in to AI training by default.
I expect these differences in AI consent and privacy options between the US and other places are going to become even more pronounced in the near future with diverging regulations driven by political and societal differences. They definitely feed the “greedy corporation” and “big brother” vibe of this technology in the US. It’s no wonder many people are wary of AI being forced on them in a way that constitutes a power grab by big corporations or is leveraged for political gain.
Operating from the US but with an international user base, I think it will be necessary for the iNat team to be aware of these differences and concerns. In some other parts of the world, information considered a “free-for-all” in the US might be protected by law. As already suggested, be transparent about its use on the platform and provide options for people to give consent to having their data used.
2 - Secondly, I’d like to reflect on the benefits of keeping at the forefront of AI research and development. I’m sure this is really what the demo to be developed with the help of the grant is about: Creating new tools (rather than applying existing ones) to make iNat even more useful for human users. As several people on this thread have done, anyone can already feed a bunch of comments into ChatGPT to generate a summary. That doesn’t require a grant to do and yes, the output will likely be flawed. Could it be improved? Maybe - or is there an entirely different way of approaching this? Grants are usually for thinking outside the box and developing something new that doesn’t exist yet. That is the whole point of research and development.
We are living in the Information Age where available data is still growing exponentially. AI is a game-changer in information processing, no doubt about that. I still remember as a graduate student taking pen and paper and a genetic code table to manually translate the ~200 nucleotides I read off the autoradiograph of my sequencing gel into amino acids so I could type those into the BLASTP algorithm to compare against the GenBank database on a set of CDs (no internet). Gosh, how things have changed! I’m thinking of all the DNA sequence data generated in our current lab projects and how we’re using AI to generate R scripts to analyze all that data and put it into diagrams and figures. I’m glad AI can help us with that!
Generative AI is inevitable at least in my job today and refusing to engage with it might backfire. E.g. our students have already wholeheartedly adopted it to help them with their homework assignments. Suddenly instructors are faced with having to at least address AI in the classroom. None of us opted in to this; it just became reality. Our students will likely need to know how to use AI for their future careers. A lot of us are taking workshops, playing catch-up. To quote Will Rogers: Even if you’re on the right track, you’ll get run over if you just sit there. We could debate whether using AI is the “right” track, but currently that’s the track we’re sitting on. It will definitely be a subject of critical discussion about its uses and pitfalls, as well as applied in the classroom, in the courses I teach this coming fall.
One thing I’ve learned so far is that AI applications can be customized specifically for the task you want them to perform. The digital version of the textbook now comes with an “AI tutor” that was trained to apply the Socratic Method, replying to student questions with more questions prompting them to engage with the material and find the answers in the book rather than giving them a straight answer. Some of us are already using AI trained to write exam questions based on class materials or to categorize written feedback from students in large-enrollment classes. Generative AI can take many forms; it’s up to us to tailor it to our needs and understand its limitations. I’m curious to see the demo the iNat team will come up with!
Based on how frequently my students use it, I think it’s safe to say AI is here to stay. On iNat, I see a lot of people using the CV suggestions, especially those vast numbers of infrequent users who just want to put a name on things. I see this crowd of casual users as a prime target for the proposed tool. Maybe that could even lead to more public awareness of the community providing identifications on iNat. And I’d much rather see that effort be spearheaded by the iNat team than anyone else.
3 - Which brings me to my third point: I have confidence in the integrity and skills of the iNat team to use the grant money to further iNat’s mission and that they won’t compromise or sell out to “big tech.” My confidence is rooted in the fact that iNat spent its incubation time sheltered within academia rather than as a start-up funded by venture capitalists, and was converted into an independent non-profit organization rather than being sold off to the highest bidder. I’m also thrilled that the original founders stuck around on the team and obviously care a great deal about iNat’s mission and community, as well as use the site themselves. I appreciate the level of engagement with the community. That’s a big difference from some other sites that have gone down.
Community and communication are really key to sites like this. My online photo sharing story started on Flickr, where I’ve experienced firsthand what happens when big companies take over an online community and prioritize profit. As the saying goes, on the internet if you’re not paying for a product you are the product being sold (typically to advertisers). The years when Yahoo redesigned Flickr and converted it into a free-for-all photo storage site that was supposed to generate enough ad revenue to keep an unsustainable business model afloat pretty much turned a once-thriving community into a ghost town. My first action on iNat was to import all my nature photos from Flickr in case it goes down for good.
Well, Flickr is still around today and I dare say experiencing a bit of a revival within the photographer community. What happened? It was sold off again, this time to a company experienced in catering to photographers and thus familiar with their needs and expectations (SmugMug). It regained its focus on paid photo sharing. I believe the reason Flickr still exists is that the people now in charge of it care about its community of photographers and provide them with the tools they want and are willing to pay for. In parallel, iNat is still in the hands of people who are passionate about connecting people with nature, without the painful detour Flickr took of being bounced around between businesses primarily interested in ad revenue.
Unlike Flickr, iNat is not tied to a business though and there are no ads or paid accounts, just an occasional ask for donations to help keep the servers running. This brings me to my last point, which is the need to pull in grants like this. Anyone remember ARKive? It went offline in 2019 due to lack of funding. Non-profits like this absolutely depend on funding in the form of grants and donations. So I’m thrilled to hear that iNat was able to secure another grant! I have no doubt that the iNat team will make the best use of it while keeping the concerns of the community in mind.
That’s called plagiarism.
In case you hadn’t noticed, when we quote someone here in the Forums, their username and avatar appear at the top of their quoted words. You can see yours in the part of this post that quotes you.
Didn’t we all learn about plagiarism in school?
replying to express that i feel the replies by @astra_the_dragon and @hawkparty get at some of my thoughts here. i disagree about the point of plagiarism, as i see important reasons for citations to not be mere lists of links - but that is not the major focus here, so let’s put that aside. let me attempt to explain the difference i am seeing here.
“Let’s say one day a non-expert iNat user decides to look through all red-shouldered hawk observations with comments and make a list, in his/her own words, of what he/she considers to be characteristics useful for an ID.”
(emphasis mine)
the hypothetical individual is applying their perspective in using their own words and weighing what they consider useful for ID. as an individual, they approach the task with their own experiences and biases: the way they transform the source material into their own words and what they weigh as important is going to differ from another individual. the output will have their fingerprints on it. they are responsible for the creation of that output and answering for what it contains. they individually own the process, what they learned from it, and the product.
but what of the generated output of an LLM? it is by its very nature Not an individual. we cannot challenge it to explain its interpretation or considerations because it did not interpret or consider things the conscious way we do. when you ask it to justify itself, the output of its answer is another blackbox process that we cannot verify accurately represents its prior “thought process.” you could have it repeat the process again and again and it has no responsibility to be consistent. it has no perspective. it has no context besides your prompts. it has no responsibility to learn - users must bear the responsibility of dealing with its repeated errors.
but it all continues to spiral into the central question of why?
- for education? i would much prefer the hypothetical case of an individual genuinely (not artificially, via a LLM process) engaging with my comments. while transforming something into their words and expressing their perspective in what they choose, they are learning and developing themselves, their skills, their voice.
- is it for convenience? i do not consider it convenient whatsoever to be left constantly alert and spending my time correcting the outputs. as @hawkparty noted, why would i ever feel compelled to spend my time training an “eternal junior” when the next output it writes could very well just contain the same mistake as before?
i have expressed already that i would rather have the ability to write this summary myself in a form accessible by the community. the use of LLMs/genAI removes me from the discussion in a way i do not enjoy. i want it to be abundantly clear that i love learning, i love my subjects of interest, i love engaging with others while they learn and perhaps love the same subjects. by discussing with others of any skill, i am considering their individual perspective. this pressures my development too as i consider what they see, what they interpret, what they weigh as important.
it isn’t that i fear being replaced by LLMs/genAI. i do not believe i can be adequately replaced by these tools. instead i fear people looking at the outputs trained on my effort next to my original effort and judging it as “same thing.” an LLM/genAI process can do its damnedest to mimic me but it isn’t me. and i want to be me. i want to be me and a part of the process because i derive joy from participating and being in the community. i beg not to be shut out of this.
i’m sorry. i’ve deleted my earlier post.
…
just for context, here are relevant points from that deleted post, with a bit of redaction:
i wanted to see how an AI might handle providing sample observations along with each feature. so i extracted XXXX notes from the 10000 most recent [taxon] identifications. again, this browser version of Copilot limits my input to ~10000 characters. so i fed it [a smaller portion] of the XXXX notes and gave it this prompt:
using only these identification notes for [taxon] from observations (obs) in iNaturalist, find relevant features that may be useful for identification, and provide up to 3 example observations for each feature:
and here’s how Copilot (based on a descendant of GPT-4) responded:
Here’s a structured summary of useful identification features for [taxon], along with example observations:
Identification Features & Example Observations
- [Feature 1 (3-4 words)] – [brief elaboration (5-11 words)]
- obs 111
- obs 222
- obs 333
- [Feature 2 (3-4 words)] – [brief elaboration (5-11 words)]
- obs 444
- obs 555
- obs 666
[…]
- [Feature 10 (3-4 words)] – [brief elaboration (5-11 words)]
- obs 777
These features provide strong identification points for distinguishing [taxon]. Let me know if you need further refinement!
i can imagine that instead of providing example observations as a bulleted list like above, iNat could present a series of small photos from each of the example observations, and users could click on interesting ones to see the observation details.
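just to make that workflow concrete, here’s a minimal sketch of how the same steps could be scripted rather than pasted into the browser. the `call_llm` helper and the note format are hypothetical placeholders (not an actual iNat or Copilot API), and the character cap just mirrors the ~10000-character input limit i mentioned:

```python
# rough sketch of the workflow described above: assemble identification
# notes into a prompt (respecting the ~10000-character input limit) and
# ask a chat model to tie features back to example observations.
# `call_llm` is a hypothetical stand-in for whatever chat API is used.

MAX_CHARS = 10_000  # approximate input limit of the browser Copilot i used

def build_prompt(notes):
    """notes: list of (observation_id, note_text) pairs."""
    header = (
        "using only these identification notes for [taxon] from observations "
        "(obs) in iNaturalist, find relevant features that may be useful for "
        "identification, and provide up to 3 example observations for each "
        "feature:\n\n"
    )
    body = ""
    for obs_id, text in notes:
        line = f"obs {obs_id}: {text}\n"
        if len(header) + len(body) + len(line) > MAX_CHARS:
            break  # drop whatever doesn't fit under the input limit
        body += line
    return header + body

def summarize(notes, call_llm):
    return call_llm(build_prompt(notes))
```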
…
there’s already been thorough discussion of many of the points you’ve made. so i won’t try to add to most of it. but i do want to expand on a few things:
consent.
to some extent, i think it’s important to judge the AI (the tool) separately from how it’s used. it’s true that AI can be used for bad (or poorly), but it can also be used for good (or well). here, i knew that folks were concerned about topics related to agency and consent, and yet i produced my example in a way that was insensitive to those concerns. so i own that mistake. it was my own poor judgment / usage, and i’m sorry.
credit.
here, again, i think it’s worth judging the technology itself separately from its use. in my example, i explicitly asked the AI to tie features to observations. i purposely made that choice in an attempt to avoid earlier concerns about bad information or out-of-context information being tied to individuals (causing reputational harm or exacerbating misinformation). one of the things i was testing in my example was whether AI could tie concepts to sources at all. as far as i can tell from this and other private testing, there’s no reason to believe that an AI wouldn’t be able to reference individuals and observations, if it had been provided such information and been asked to do so. unfortunately, my attempt to avoid one problem led to another. not crediting individuals in my example is my mistake, and i’m sorry.
voice.
i think there are many ways to think about “having a voice”. one of them is adjacent to consent, and i think another could be related to credit. so i won’t try to address those again. there are other potential meanings, but one that i specifically want to address is related to how an idea is expressed (which i’ll just call “expression” for short). i do think appropriation of expression is something that is a challenge inherent to generative AI technology itself, and one of the things i’m interested to see in the demo is whether it can satisfactorily mitigate that sort of problem.
at some level, there are only so many ways you can express a factual concept, especially when that expression is done in fewer words (ex. “cardinals are red or brown” vs “cardinals are red (male) or brown (female)” vs “adult male cardinals are brilliant red, while females are a duller brownish-gray”). so maybe one way to mitigate the expression problem is to limit the length of output, at the sacrifice of context.
another way could be to present only the bits of information (ex. individual identifying features) that, for example, had been conveyed by a minimum X number of users and / or by limiting any single person’s input to some X% maximum of the total input for any given concept. i think the tradeoff there could be the exclusion of certain uncommon but interesting characters (ex. i’ve noticed in my private testing that my AI will include things like examples of aberrations or examples of signs or other things that aren’t identifying features exactly but are adjacent) and also the potential exclusion of information about taxa where only a single helpful expert exists. to me, cases of the latter are sort of the gems in the iNat mine. i can envision an opt-in being useful here to allow those gems to be mined in cases where they otherwise would not be, although i can also see such an opt-in option being abused, too.
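to illustrate that second idea, here’s a very rough sketch of such a threshold filter. the cutoffs and data shape are made up purely for illustration, not a proposal for actual values:

```python
# rough sketch of the mitigation described above: keep a candidate feature
# only if it was conveyed by at least MIN_USERS distinct users and no single
# user accounts for more than MAX_SHARE of the supporting comments.
# both cutoffs are hypothetical numbers, purely for illustration.
from collections import Counter

MIN_USERS = 3
MAX_SHARE = 0.5

def keep_feature(supporting_usernames):
    """supporting_usernames: one entry per comment supporting the feature."""
    counts = Counter(supporting_usernames)
    if len(counts) < MIN_USERS:
        return False  # too few distinct contributors
    top_share = max(counts.values()) / len(supporting_usernames)
    return top_share <= MAX_SHARE  # no single voice dominates

# e.g. a feature backed by comments from four users, none dominant, is kept
print(keep_feature(["a", "b", "c", "d", "a"]))  # True
print(keep_feature(["a", "a", "a", "b"]))       # False
```

the tradeoff, as noted above, is that this kind of filter would exclude taxa where only a single helpful expert exists.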
anyway, that’s a long way of getting to the idea that it’s probably possible to mitigate appropriation of expression by generative AI, but i didn’t try hard enough to do that in producing my example. so i’m sorry for that, too.
Thanks to you all for your thoughtful answers, I believe I now have a much better idea of your doubts and concerns.
This morning I woke (too) early, so just for fun I got “chatting” with Copilot about the differences between human and artificial intelligence. I won’t go into the details, otherwise I risk triggering an off-topic debate, but I believe two of the AI’s conclusions are worth quoting (my bold):
- AI “can process massive amounts of information quickly, find patterns people might miss, and learn from data. […] When paired with human values and oversight, AI has the power to solve complex problems and open up new possibilities”
- “While AI offers tremendous promise, it also demands thoughtful governance, ethical design, and public awareness”.
That is exactly how I see it. AI is (for me) an extraordinarily powerful tool which (like any tool) can be used well or badly, for good or for bad. The issue (for me) is not whether I trust that tool, but whether I trust the people responsible for the conception, design and implementation of that tool. In this particular case, I have absolute faith in the iNat team to do everything in their power to implement “thoughtful governance, ethical design and public [the full iNat community] awareness”.
Unfortunately I have absolutely no relevant skills and am far too old to learn them, otherwise I would find it absolutely fascinating to work on this demo project (if it goes ahead after so much negative reaction). But obviously that’s just my very personal opinion.
What would it mean for an opt-in option to be “abused”? How could someone choosing not to participate in something be “abusive”?
you need to read the whole thought, not just the part that you quoted. the opt-in option contemplated in that thought is a very specific one that would allow folks to bypass potential safeguards. the potential for abuse exists not in the absence of invoking the opt-in. instead, it exists when opting in, because those potential safeguards would be removed.
Most of the time (maybe all of the time?), when I make a comment about identifications on an observation, I am using what I learned from a printed field guide or a class I took or a website, but usually I’m not crediting that book or teacher or website. Should I start doing that? Perhaps something like, "In eastern North America, there are two native species of elderberry recognized in iNat’s taxonomy: Sambucus canadensis and S. rubens. They are most easily distinguished by the shape and timing of their inflorescences, and by the color of ripe berries. See Go Botany for more details (but be aware that Go Botany lumps European and American Black Elderberries, unlike iNat): https://gobotany.nativeplanttrust.org/dkey/sambucus/ " Or should I just agree with the correct IDs, correct the wrong IDs, and get through quickly the dozens of Sambucus observations that get added every day at this time of year? I do the latter, obviously, unless someone asks. I think that’s the path taken by most identifiers, but should we all be citing our sources?
May I ask for clarification on what this “opt-in” option would be? Usually when people talk about having something be “opt-in”, they mean an individual can choose to have their data included, and if they don’t choose to do that then they aren’t included.
Would that be what’s happening here? To me it sounds like what you’re describing is an option that would bypass any safeguards like “a single person can only make up this percent of the training data” to allow comments to be used in cases where there aren’t many identifiers. I think if that’s the case, the confusion over the option being abused is because the term “opt-in” usually has a different meaning for most people when talking about opting into AI training on their data.