It’s true that we humans can mess things up. The thing is, I don’t think we need the aid of AI, which can mess things up so much more quickly and thoroughly.
why do you think it’s inevitable that aiding humans with AI will result in messing things up? why is it not possible that it could make things better?
It’s hard to estimate, yes, but I have not seen enough evidence to believe that it’s less than people think. In the thread above there is one Nature article looking at a couple of open models (Llama-3-70B and Gemma-2B-it). The comparison to humans is questionable to me, but that doesn’t mean it isn’t as bad as we thought. There are some questionable decisions in their methodology imo, e.g. assuming a human will write 300 words/h while the LLM will only need a single prompt to arrive at “comparable” results. Note this comment:
We acknowledge that LLM performance may vary across different content types and that our analysis does not account for qualitative differences in output between LLMs and humans. Thus, our study aims to provide a quantitative comparison of resource utilization rather than a qualitative assessment of content.
The accounting of human resource consumption is rather unfair, e.g. water consumption is based on the daily water consumption of a human prorated for the 1.67 h it takes them to write the benchmark task, plus the amount of water needed to generate the electricity used during that time. This is regardless of whether the water consumption can actually be attributed to such a task.
The results there are for specific types of models. Heed these warnings:
However, the growing model sizes driven in part by the scaling law (e.g., recently released Llama-3.1-405B) will likely increase the energy consumption and the associated environmental impacts of LLMs substantially.
Despite the potential efficiency advantages of LLMs compared to human labor, we emphasize that our analysis is not intended to derail the ongoing efforts to curb LLMs’ own large environmental footprints.
What is more, by comparing the efficiency to human labour, the only sense in which LLMs are allegedly “efficient” is if you want to substitute humans. Yet we will continue to exist, even if the person doing the prompting isn’t counted in the energy consumption of the LLM:
presenting a comparative assessment of the environmental impact of LLMs vs. human labor, examining their relative efficiency across energy consumption, carbon emissions, water usage, and cost.
In this article they go over the cost of training and inference of GPT-3 (Preprint), and they also mention this:
As acknowledged in Google’s sustainability report and the recent U.S. datacenter energy report, the expansion of AI products and services is a key driver of the rapid increase in datacenter water consumption
The Washington Post reported on GPT-4 (2024-09-18, so not the latest models) (Archive.org version) using the same methodology as the paper above. That’s the famous 0.5 liters of water and 0.14 kWh of electricity (about the same as using 14 LED light bulbs for 1 hour) per 100-word query.
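As a rough sanity check on that equivalence (assuming a typical LED bulb draws about 10 W, a figure the article does not state):

$$14 \times 10\ \text{W} \times 1\ \text{h} = 140\ \text{Wh} = 0.14\ \text{kWh}$$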
I don’t think this discussion is productive anymore. There probably isn’t anything new to be said. Especially at this point, where most of the comments are based on speculation.
Personally, I feel like staff have addressed all major concerns that can be addressed in these early stages, and unless what they have said was an outright lie (which I really do not believe), then there seems to be little to worry about.
I don’t know anyone on the iNat team personally, but over the years on iNat and the Forum none of them have given me the impression of being greedy people intent only on lining their pockets. Many of them are active iNat users as well. So I completely trust that they know what they are doing and won’t ruin iNat.
P.S.: I also think a lot of the negative attitude towards the technology of GenAI is largely misplaced. AI Slop, copyright violations, etc. aren’t inherent problems of GenAI, but rather stem from corporate greed and the companies’ (Meta, Google, OpenAI, etc.) disregard for ethics. If anything, that makes it easier for them to deny responsibility and continue their bad practices.
A forum specifically where people can post under species/genus pages about ID’ing would be an invaluable tool for both new and experienced users.
Yes! I think I wanted to suggest something like this as a feature request, but I was forwarded to some discussion about the generative AI collecting users’ notes… that wasn’t my intention, because my request was for human-made guides.
Yes, absolutely. People would be happy to do that. I know I would.
I like guides such as opening a taxon and reading: Bougainvilleas have spiky stems, pink flowers in X or Y shape, etc.
So I can read that and see if my plant has spikes, or if the flowers have that shape.
But in the taxon page, and written by humans of course.
In parts of the developed world, budget-tight companies and administrations are already remarkably tolerant of (…or even explicitly after /grin) careless human work when it comes to enviro stuff, provided it’s cheap. And we’re mere months away from free machine-generated slop becoming “good enough” to substitute for that, thanks to the dedication of many.
Considering how a few students, interns, and pros alike have already relied discreetly on iNat (or rather, on its unpaid minions called “identifiers”) to do the boring, tedious, tricky tasks in their stead… and how these same “identifiers” could soon be kindly asked to do the training and fixing for a new batch of AI – thus helping iNat remain in the tech race, popular, and funded…
a minimal reward (beyond the warm fuzzy feeling and some impersonal ‘thanks to our great community’) is perhaps not such a ridiculous idea – even if not feasible in practice.
Thought everyone might like to know someone’s made a video about this, with over 6k views.
“Google bribes iNaturalist to use generative AI”?
You don’t find it ironic that a video criticising generative AI, with accuracy/disinformation as one of its concerns, itself peddles blatant disinformation?
I think that sometimes AI will make things better, or at least will accomplish tasks faster. But is that worthwhile? Maybe, if people check the work. But isn’t the point of having AI do the work that it permits one to eliminate having humans do it?
One way I experience AI is by using Google Translate. It seems to do a good job most of the time. With highly technical terms, especially if the same word is also used in ordinary writing, it makes laughable errors. (Believe me, if I can catch it, the error is laughable.) Especially interesting is when it translates scientific names.
I hear from friends who teach that essays generated with AI are pretty easy to detect (but essentially impossible to grade as zero because of university policy). I’m OK with scientific writing having a flat, uninteresting style, but the errors are a problem.
I’m OK with AI doing a secondary but potentially useful task, like compiling descriptions and identification hints. But for identification itself, I think the suggestions from CV are enough AI. And if they’re not, if you want to use AI for identifications, what’s the point of having humans in the loop?
Sorry I’m not more coherent. I’m tired.
regardless of whether an initial identification suggestion comes from the existing computer vision or some other non-human thing, the reason for having humans in the loop in the context of iNaturalist doesn’t change. first, i think that, philosophically, iNat overseers still want people to be the ultimate deciders. second, iNat has always fundamentally been a social network, where one of the primary social interactions is identifying. (you meet other people through the identification process.) third, with each identification interaction, every person involved has a chance to potentially learn something new. (every person comes with a different view of the world. maybe one person only knows about things that the CV has suggested, maybe another person has deep hands-on experience with particular taxa, maybe another person has broad knowledge about the organisms in a particular place, etc…)
that said, at this point, we don’t know exactly what the iNat overseers will attempt to create in this project. (when i say we don’t know, it’s quite possible that “we” still includes iNat staff.)
Google seemed to characterize their expectation of the deliverable as something that “Enhances biodiversity data by converting thousands of identification remarks into natural language explanations”. that could be something as simple as a supplementary compilation of identification hints that you say you’re OK with. or it could be something much more complex like a vision language model that effectively could do what the existing computer vision does, plus additional stuff. there’s nothing to say that staff couldn’t do both of these (allowing a user to choose how much help they would like), do something in between, or do something else entirely.
i would guess that if anything did come out of this project that would be made available to the masses, it would be something like the supplemental identification hints. i would guess that there’s just not enough data, and the technology just isn’t available right now to create a competent vision language model that iNat overseers would want to make available to the masses.
but, conceptually, i don’t see why – assuming it could be done reasonably – it would be a bad thing to provide users the option to quickly get some additional insights beyond just a list of computer vision suggestions. can you elaborate on your concerns here?
just to help visualize a few different ways that this could be approached, below are 4 responses that i got from Copilot when i fed it an image (of what i believe is an Asian Tramp Snail with a Pirate Spider on its shell) with various prompts. (i haven’t made an observation out of this image yet, but it’s extracted from a video posted here, if you want to see it in more context: https://youtu.be/ZR0e6BUfpNY.)
1. Copilot provided some general features of snails and (without prompting) pointed out the spider, too.
2. Copilot provided an incorrect (but not unreasonable) identification of the snail (and words its suggestion in a probably overconfident way), but it does provide “reasons” for its identification that folks could adapt into a process for verification or for making a better identification.
3. Copilot provided an incorrect (but not unreasonable) identification of the spider, but again provides context that could help someone get to a correct identification.
4. I provide the identification here, and Copilot provides additional information for that species, indicating which features are visible or not in the photo.
Trying not to get pulled into this argument… as I’m travelling and iNatting! …but:
Leaving this hanging in the thread at this point feels like disinformation.
See the Andy Masley posts linked above by me and @upupa-epops… which in turn pull from the MIT review, amongst others.
The upper bound is more like 3 Wh, so 0.003 kWh per prompt.
A small fraction of 0.14 kWh.
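To put the two figures in the same units (my own quick conversion, not from any of the sources):

$$\frac{3\ \text{Wh}}{0.14\ \text{kWh}} = \frac{3\ \text{Wh}}{140\ \text{Wh}} \approx 2\%$$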
See Andy Masley’s specific breakdown of why the Washington Post number is an outlier here.
I haven’t dug into this, but I see there is also a nice interactive HuggingFace tool around energy usage per prompt: https://huggingface.co/spaces/jdelavande/chat-ui-energy
I’m also gonna drop in some of Andy’s lovely visuals, as everyone seems to want to keep repeating the environmental impact trope regardless of all the information we have to counter this argument. Maybe the quick, easy-on-the-eye comparisons will help.
All these graphs are from here.
That’s not disinformation, it’s a scientific paper that can be improved upon. There’s another preprint with numbers closer to yours (depending on the model): https://arxiv.org/abs/2505.09598
But it hasn’t passed peer review yet. And while some GPT-4 models are in the range you say, it also notes that some models are actually much worse. There are many factors that influence how much resources are used by these models, but the fact is that there isn’t enough transparency from genAI companies, so all we have are estimates. I’m disinclined to trust the opinion of someone from Effective Altruism, especially if that’s the upper bound that they conveniently found (or rather chose):
The reason I chose 3 Wh as the number for my post wasn’t that I think it’s the definitive final answer for the cost of a ChatGPT prompt, it’s that every serious attempt at estimating the average ChatGPT prompt’s energy use I’ve seen finds that it’s below 3 Wh, so I ran with that as a reasonable upper bound.
I would take those plots with a massive grain of salt. The numbers are suspect and I don’t see clear provenance for some of his assumptions. And even if they were sound, I don’t see how comparing to other activities is relevant. Is the use of genAI going to decrease or stop those activities? How many queries will people actually make in the same time frame in which they would eat a hamburger? Trying to shift the consumption of resources to individuals is disingenuous when the problem with these models is their so-called “efficiency”, which they achieve at scale (mainly through the number of queries and users).
To me one link from one techbro is not sufficient evidence of the climate impact being overblown, unfortunately. Do you have a peer reviewed source to back the claim that it’s not as bad?
I mentioned the MIT review and linked to his sources in my thread. Maybe it was a bit buried:
When he goes into the direct information from the MIT review, he uses this paragraph:
Do you have a peer reviewed link to the 0.14kWh instead of the Washington Post?
Again, trying not to be pulled into this right now, so limited time to dig personally, but Andy Masley said he couldn’t even find mention of 140 Wh in Ren’s actual papers.
But anyhow, the link you yourself added gave 0.43 Wh for a short prompt. This is massively lower than the 3 Wh upper bound Andy Masley uses, and a huge discrepancy with the 140 Wh you mention / the Washington Post suggests, which is where the vast majority of anti-genAI viral posts on social media seem to have come from… (and, in turn, a significant amount of the bias in response to the iNat grant).
Also, fwiw… I don’t see that a genAI implementation would even necessitate the equivalent of a ChatGPT prompt. Once an ID tip was made for a species, it could be stored and repurposed. Again, let’s wait and see the demo.
So again, I see no evidence to suggest this will be more energy intensive than the existing CV, the cost of data storage per obs, and all the other energy costs associated with iNat. Not that I have any data to back that up, but… my hunch is that it’s similar. I hope iNat can quantify all these things down the line. Again, let’s wait and see what this shows.
To all those screaming…why compare against the CV again?!
If the only arguments people against the grant have are about things that already exist within iNat under CV… or are about issues that would be integral to iNat even if it were entirely human-led without any AI… then the arguments that genAI itself is specifically a problem fall flat and reveal the deeper bias in play from those arguing against its usage.
Cherry-picking data is not great, because I also said:
Such as:
GPT-4.1 nano remains the most efficient overall, requiring only 0.454 Wh for long prompts (approximately 7,000 words of input and 1,000 words of output). In contrast, o3 consumes 39.223 Wh, while DeepSeek-R1 and GPT-4.5 consume 33.634 Wh and 30.495 Wh, respectively, which is over seventy times the energy use of GPT-4.1 nano. To contextualize, a single long query to o3 or DeepSeek-R1 may consume as much electricity as running a 65-inch LED television (≈130W) for roughly 20–30 minutes. Although o3 and DeepSeek-R1 rely heavily on chain-of-thought prompting, GPT-4.5 stands out for its relatively high energy use, despite not being a multi-step reasoning model. This suggests inefficiencies rooted in model architecture.
Claude-3.7 Sonnet ET presents a notable exception. While it supports chain-of-thought reasoning, it consumes only 17.045 Wh for long-form input, which is less than half the energy of o3. Similarly, GPT-4o, OpenAI’s current default model, demonstrates strong energy efficiency, requiring just 1.788 Wh for long prompts and 0.42 Wh for short ones. Interestingly, GPT-4o mini, although substantially smaller in parameter count, consumes slightly more energy per query than GPT-4o due to its deployment on less efficient A100 hardware instead of H100s or H200s, illustrating that deployment infrastructure can overshadow model size in determining real-world energy use.
I just shared one, here’s the link again: https://arxiv.org/html/2505.09598v2. I suggest reading the full article when you’re back from vacation. I’m disengaging from this discussion as I don’t feel it’s being done in good faith.
For the staff: The preprint I shared has insights about the deployment infrastructure and their efficiency, and how that impacts resource consumption. I’m not sure how much is within reach for you, but the discussions there hopefully help you make informed decisions about models and hardware:
Our findings indicate that infrastructure is a crucial determinant of AI inference sustainability. While model design enhances theoretical efficiency, real-world outcomes can substantially diverge based on deployment conditions and factors such as renewable energy usage and hardware efficiency. For instance, GPT-4o mini, despite its smaller architecture, consumes approximately 20% more energy than GPT-4o on long queries due to reliance on older A100 GPU nodes. Similarly, DeepSeek models exhibit disproportionately high water footprints, not solely due to model characteristics but due to data center inefficiencies. These observations suggest that true sustainability will depend on integrating more efficient hardware, sustainable cooling strategies, renewable energy sourcing, evaluation practices, and deployment infrastructures.
The discussion about energy use is interesting, but irrelevant to the potential use of AI on iNaturalist, because iNaturalist is not going to be using ChatGPT or other commercial large language models, and is not going to be generating responses on demand. Up to now, they’ve done the training for AI (computer vision and geomodel) on three computers.
Comparisons to hamburgers are unhelpful, since beef is literally the most environmentally damaging food we produce, so almost anything will look better than a hamburger. It’s difficult to believe arguments that genAI in general doesn’t have a big environmental impact when we see numerous news items about how coal-fired power plants are being kept online to deal with the anticipated surge in electricity demand. Sure, individual queries do not use much energy, but when ChatGPT is already the fifth-most visited site in the world (ahead of the site formerly known as Twitter), and with plans to integrate genAI into every aspect of our lives, the energy demand of AI is growing rapidly. Even with the current energy use of genAI equivalent to that used by many thousands of homes per year, the MIT Technology Review article* makes the important point that “These estimates don’t capture the near future of how we’ll use AI.”
I despise having generative AI forced down my throat and disable it wherever I can (I use DuckDuckGo rather than Google in part because you can just switch the AI nonsense off). My greatest concern is Google – one of the worst tech companies in existence – getting some cred just by being associated with the “nicest place online”. Be careful what company you keep, iNaturalist.
At the same time, I’m willing to give iNaturalist staff the benefit of the doubt. It’s entirely possible that the tool will be useful, interactive, and editable, and that the energy use involved will be far less than that of storing low-quality photos of mallards and houseplants. Let’s wait and see, and decide on the merits of what is proposed.
*Updated on 2 July to acknowledge the article had already been cited in the discussion, which I had overlooked/forgotten
I do not oppose users being able to get additional information compiled by the AI. Information about identification or about general biology, whatever, assuming it’s labeled as AI. (I think concerns about computer time for generating AI responses would be reduced if the program stored answers so that, for example, once it has compiled data on American Robins it can just retrieve what it compiled last time, maybe updating once a month or so; something like the sketch below.)
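Something like this minimal sketch is what I have in mind; the function names and the monthly refresh window are purely illustrative, not anything iNat has announced:

```python
import time

# Hypothetical sketch: cache AI-compiled species write-ups so the expensive
# generation step runs at most once per taxon per refresh window.
# Nothing here reflects any actual iNaturalist design; names are illustrative.

REFRESH_SECONDS = 30 * 24 * 60 * 60  # regenerate roughly once a month

_cache: dict[str, tuple[float, str]] = {}  # taxon -> (timestamp, compiled text)


def get_species_summary(taxon: str, generate) -> str:
    """Return the cached summary for `taxon`, regenerating only if stale.

    `generate` stands in for whatever costly call (an LLM, a compilation job)
    produces the text; it is invoked only on a cache miss or after expiry.
    """
    now = time.time()
    entry = _cache.get(taxon)
    if entry is not None and now - entry[0] < REFRESH_SECONDS:
        return entry[1]  # reuse the stored answer, no new generation
    text = generate(taxon)
    _cache[taxon] = (now, text)
    return text


if __name__ == "__main__":
    calls = []

    def fake_generate(taxon: str) -> str:
        calls.append(taxon)
        return f"Compiled identification notes for {taxon}"

    get_species_summary("American Robin", fake_generate)
    get_species_summary("American Robin", fake_generate)  # served from cache
    print(f"expensive generations: {len(calls)}")  # -> 1
```

The point is just that the expensive generation step would run at most once per taxon per month, no matter how many people look at American Robins.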
Why do I oppose having AI identify observations and provide additional information? You know we already have trouble with people looking at the CV suggestion and just accepting it. Sometimes they even pick the first of a list of half a dozen suggestions even though the CV didn’t give any one of them priority. Now, along comes AI not only giving suggestion(s) but explaining why they’re right. If the information were supplied without their explicitly asking for it, do you suppose most people would read all of it? And meaningfully evaluate it? Not often enough! I think AI IDs with explanations would carry even more credibility than the simple CV lists do now and would be chosen even more often when they are inaccurate.
Also, as an identifier I would become annoyed as hell if the computer were always throwing up reams of “useful” data about organisms I can identify and am just trying to identify; please leave me alone. I assume that any attempt to increase the AI component of iNaturalist would allow us to opt out of it (or better, to opt in only if we want it), but I can’t stress too strongly how important having options is. I mean, one reason iNaturalist is so successful is that it is (kind of) easy to use and not excessively annoying. Do you think this would be a welcoming site for me if it insisted on explaining why each American Robin observation is an American Robin?
I would especially be annoyed by AI summaries because I know there is a lot of false or out-of-date information out there for AI to skim through. When I need to look up information or pictures to compare or learn from, I have a feel for how credible the sources are – which books I should check, which websites have reliable information, which have good pictures but bad descriptions, which have a taxonomy that iNaturalist now treats as out of date, which I should use cautiously or not at all. I don’t have any idea how credible an AI compilation is because (1) I can’t know where the data comes from and (2) I can’t know what the AI has done to it. (Maybe you don’t think those are problems. Maybe you don’t work with taxonomy and identification of plants.)
Going straight to an AI explanation not only bypasses all that evaluation, it bypasses my ability to go out and figure out how credible information sources are. We all need to learn how to do such evaluation, all the more because AI throws a credible-seeming veil over everything (not just iNaturalist).
(I recently read a research article about how students actually searching through sources and writing their own reports gained useful skill that those using AI for this did not. Gee, aren’t we shocked.)
Now, I do think that identifiers’ comments on iNaturalist observations are usually accurate! But they’re uneven (many for some taxa, none for most), so I doubt AI-generated descriptions would be limited to them. But maybe.
I admit that my confidence in AI is not enhanced by the unwanted but seductively succinct AI summaries that I now get at the start of each Google search, summaries whose validity I cannot evaluate unless I already know the subject (in which case, why would I google it?) or I check other sources (in which case, why do I need the summary?).
I think we need to ask not only “How can AI be not too harmful/annoying?” but “What do we really need done that AI can do better than what we have now without being too harmful/annoying?” Identification isn’t such a need, though compiling information (if from reliable sources) might be.
Too much rant before breakfast. I should go.
This discussion is useless at this point. Bye.
Apologies if I seemed brusque!.. not intentional or personal… just yes, trying not to spend too much time here haha…and probably not the best at retaining diplomatic tone in forum comments. Especially in the morning. Being neurodiverse also may not help.
The use of a smiley emoji was aimed at the Washington Post reporting, not you… if that was the part that was offensive. As a European I have zero knowledge of their credibility as a source more generally - I guess I was just presuming them to be unreliable given general science reportage in newspapers…
Looking at the WP article a bit more, I see its source is the original Ren paper, which can be found here. It’s a pre-print.
As Masley stated, there is no mention of 140 Wh in the paper; in fact I only see mention of 4 Wh in the Ren paper, which is similar to Masley’s figure.
The other paper you linked to is cool to see, not one I’d come across.
I just don’t see it backing up the Washington Post claim though, which is still far higher… but even for the 39 Wh estimate… that’s for 7,000-word inputs! How many people ever prompt ChatGPT with 7,000-word prompts?!
I use it daily, but the most I’ve ever put in would be 500-1000 words, and that’s only three or four times in the last year I think - 99% of my prompts are a sentence long.
To me it seems like that paper offers an extreme use case to push the models to the max and test the architecture, as you state… rather than an example of general usage such as the MIT review, Masley, and others use. As such, I guess I didn’t really see my comment as cherry-picking tbh - I just saw 3 Wh short prompts as being typical usage. I could argue those using the Washington Post number of 140 Wh are more guilty of cherry-picking, given it’s a long way from the other general-use estimates.
Andy Masley’s main point is that when it comes to climate activism, there are fights worth fighting and there are distractions - genAI individual prompting right now is by and large not a big deal compared to the other electricity usage and activities the vast majority of the general public engage in. We should choose our battles. (He’s also vegan, so perhaps the exact use of hamburgers is in part due to that.)
Note he’s explicitly not ruling out the bigger issue of AI energy use and data centres more broadly / in the future. And neither am I, fwiw! I think there is so much to be justly concerned about in regard to AI’s impact on society in the coming years… it honestly scares the hell out of me. But the implementation here… in an unknown manner… of a demo… in the hands of an organisation that already uses AI for good purpose… which will most likely be minimal in additional environmental cost… is simply unworthy of the furore it has created imo.
There are bigger fish to fry.