It’s hard to estimate, yes, but I have not seen enough evidence to believe that it’s less than people think. In the thread above there is one Nature article looking at Llama-3-70B and Gemma-2B-it models. The comparison to humans seems questionable to me, though that alone doesn’t mean the impact isn’t as bad as we thought. There are some questionable decisions in their methodology imo, e.g. assuming a human writes 300 words/h while the LLM only needs a single prompt to arrive at “comparable” results. Note this comment:
We acknowledge that LLM performance may vary across different content types and that our analysis does not account for qualitative differences in output between LLMs and humans. Thus, our study aims to provide a quantitative comparison of resource utilization rather than a qualitative assessment of content.
The accounting of human resource consumption is also rather unfair: water consumption, for instance, is the human’s daily water consumption prorated to the 1.67 h it takes them to write the benchmark task, plus the amount of water needed to generate the electricity used during that time, regardless of whether that water consumption can meaningfully be attributed to such a task.
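To make the asymmetry concrete, here is a back-of-the-envelope sketch of that prorating, using hypothetical round numbers of my own (daily water intake, laptop power, water footprint of electricity) rather than the paper’s actual figures:

```python
# Sketch of the paper's human-baseline water accounting, with hypothetical
# round numbers (NOT the paper's figures) just to show how the prorating works.

WORDS_PER_TASK = 500          # assumed task size (300 words/h * ~1.67 h)
HUMAN_WRITING_SPEED = 300     # words per hour, as assumed in the paper
DAILY_WATER_L = 3.0           # hypothetical daily drinking water per person, litres
LAPTOP_POWER_KW = 0.05        # hypothetical laptop draw while writing, kW
WATER_PER_KWH_L = 2.0         # hypothetical water footprint of electricity, L/kWh

hours = WORDS_PER_TASK / HUMAN_WRITING_SPEED             # ~1.67 h of writing
prorated_drinking = DAILY_WATER_L * hours / 24            # daily intake prorated to the task
electricity_water = LAPTOP_POWER_KW * hours * WATER_PER_KWH_L

human_water = prorated_drinking + electricity_water
print(f"{hours:.2f} h of writing -> {human_water:.3f} L attributed to the task")
# The LLM side, by contrast, is charged only the water/energy of a single prompt,
# which is the asymmetry objected to above.
```

Whatever the exact inputs, the human side gets charged a slice of baseline living costs, while the LLM side is charged only the marginal cost of one prompt.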
The results there are for specific types of models. Heed these warnings:
However, the growing model sizes driven in part by the scaling law (e.g., recently released Llama-3.1-405B [16]) will likely increase the energy consumption and the associated environmental impacts of LLMs substantially.
Despite the potential efficiency advantages of LLMs compared to human labor, we emphasize that our analysis is not intended to derail the ongoing efforts to curb LLMs’ own large environmental footprints.
What is more, by comparing efficiency to human labour, the only sense in which LLMs are (allegedly) “efficient” is if you want to substitute humans. Yet we will continue to exist, even if the user doing the prompting isn’t counted in the energy consumption of the LLM:
presenting a comparative assessment of the environmental impact of LLMs vs. human labor, examining their relative efficiency across energy consumption, carbon emissions, water usage, and cost.
In this article they go over the cost of training and inference of GPT-3 (Preprint), and they also mention this:
As acknowledged in Google’s sustainability report [9] and the recent U.S. datacenter energy report [25], the expansion of AI products and services is a key driver of the rapid increase in datacenter water consumption
The Washington Post reported on GPT-4 (2024-09-18, so not the latest models) (Archive.org version) using the same methodology as the paper above. That’s the famous 0.5 liters of water and 0.14 kWh of electricity (about the same as running 14 LED light bulbs for 1 hour) per 100-word query.
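As a quick sanity check of that light-bulb comparison (assuming a typical LED bulb draws about 10 W, which is my number, not the article’s):

```python
# 0.14 kWh spread over one hour is a 140 W average draw,
# i.e. roughly fourteen 10 W LED bulbs running for that hour.

energy_kwh = 0.14          # per 100-word GPT-4 query, per the Washington Post
duration_h = 1.0
led_bulb_w = 10.0          # assumed wattage of one LED bulb (my assumption)

avg_power_w = energy_kwh * 1000 / duration_h
n_bulbs = avg_power_w / led_bulb_w
print(f"{avg_power_w:.0f} W average = about {n_bulbs:.0f} LED bulbs for {duration_h:.0f} h")
```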