The most important thing to know is that these technologies reflect how they are built. Machine learning fundamentally uses the past to make decisions or generate content, without a good ability to use context to determine what was bad in the past that we might not want to keep for the future.
Discrimination and Bias
In general, technological systems reflect the values of the people who build them. This may be explicit or implicit. With machine learning, the connection between the builders' values and what goes out into the world is more indirect, and it is often less carefully checked.
Environmental Impact
Training large language models requires an immense amount of energy and water (for cooling). Using them requires less per use, but still a lot, especially when aggregated across all of the iterative prompting by all of the users. Hugging Face :hugging_face:, a company that provides a platform for sharing trained models, datasets, and applications, offers a policy primer that is a good starting point Luccioni et al. (2024). Hugging Face researcher Sasha Luccioni also participated in an NPR podcast on the climate impact of AI.
Some key highlights:
- Strubell et al. (2020) found that training an LLM with only 213 million parameters [1] was responsible for 625,155 pounds of CO2 emissions, roughly equivalent to the lifetime emissions of five cars.
- Luccioni et al. (2023) followed a single model of similar size to GPT3 (176 billion parameters vs. 175 billion) through its full lifespan from training to use. They found that the total carbon impact of such a model is more than that of training alone.
- Luccioni et al. (2024) break down model energy use by different types of tasks (text classification, summarization, text generation, image generation) and found that image generation uses nearly 60 times more energy than any text-based task, with a carbon impact comparable to driving the average passenger vehicle about a mile [2], using about as much energy as charging a cellphone halfway [3].
- In addition to the electricity use, large data centers use tremendous amounts of fresh water, and those numbers have increased rapidly in recent years (Google's water use increased by 20% and Microsoft's by 34% from 2021 to 2022) Li et al. (2023).
- Depending on the location of the data center that handles the query, GPT3 could consume [4] 7.2 ml (Netherlands) to 48.3 ml (Washington, US) of water per query, meaning a full 500 ml (16.9 oz) bottle of water is consumed for every 11 (Washington) to 70 (Netherlands) queries Li et al. (2023) (see the arithmetic sketch below).
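To make these comparisons concrete, here is a back-of-the-envelope check of the figures quoted above. The carbon range and the EPA per-mile figure come from footnote [2], and the per-query water volumes from the last highlight; rounding up to whole queries per bottle is my own simplification.

```python
import math

# Back-of-the-envelope check of the comparisons in the highlights above.
# All input numbers are quoted from the text and footnotes; nothing here is measured.

# Carbon: roughly 150-500 g CO2 across 1,000 image generations (footnote [2]),
# compared to the EPA's ~400 g CO2 per mile for an average passenger vehicle.
carbon_per_1000_images_g = (150, 500)
car_g_per_mile = 400
low, high = (g / car_g_per_mile for g in carbon_per_1000_images_g)
print(f"1,000 generated images ~ {low:.1f} to {high:.1f} miles of driving")

# Water: 7.2-48.3 ml consumed per GPT3 query depending on data center location
# (Li et al., 2023), against a standard 500 ml (16.9 oz) bottle.
water_per_query_ml = {"Washington, US": 48.3, "Netherlands": 7.2}
bottle_ml = 500
for location, ml in water_per_query_ml.items():
    print(f"{location}: one bottle consumed per ~{math.ceil(bottle_ml / ml)} queries")
```

This reproduces the "about a mile" and "11 to 70 queries per bottle" comparisons in the text.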
Privacy
Models can reveal the data they were trained on, which means that any user could potentially see any value the model builder put in for the model to learn from. For generative models this can happen in normal use, and for predictive models, certain types of explanations, which are required by law in some contexts, can be used to figure out the training data.
Researchers demonstrated an efficient attack on the production version of ChatGPT Nasr et al. (2023). Combined with the fact that free tools almost always save the data you provide them in order to train and improve their products, this means that anything you send to a chatbot could be spit back out to another user.
IP risks: content generated by an AI cannot be copyrighted; it is not ownable.
Impacts on Human Thinking
A team at the University of Toronto released a pre-print demonstrating a decrease in human creativity when people were asked to perform a task independently after exposure trials in which they used an LLM Kumar et al. (2024). This study measured two types of creativity, using standard measures for each.
A team at Northwestern's Kellogg School of Management released a pre-print showing that LLM creativity is similar to human creativity, but when the models are prompted to respond as female, old, or Black, they score considerably worse on creativity Wang et al. (n.d.). This study measured one type of creativity.
A large-scale meta-analysis found that human + AI performance is often worse than the best of human alone or AI alone. They also found that AI improved a person's performance mostly for generation tasks, but not for decision-making tasks Vaccaro et al. (2024).
Errors[5] are inevitable
LLMs will make mistakes and give false information at some rate, guaranteed. These errors occur because the LLM has no grounding in facts; it is generating text based on a probabilistic model of language. With a lot of feedback loops[6] or careful prompting, good answers can be produced at high rates in some domains, but fundamentally the errors cannot be eliminated.
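As a toy illustration of what generating from a probabilistic model of language means here, consider the sketch below. The word probabilities are invented for the example; no real model is this simple, but the failure mode is the same: the model only knows what is likely, not what is true.

```python
import random

# Toy next-word sampler: the "model" is just a table of invented probabilities.
# It knows which continuations are likely, but has no notion of which are true.
next_word_probs = {
    ("the", "capital", "of", "australia", "is"): {
        "canberra": 0.6,    # correct, and most likely
        "sydney": 0.3,      # fluent, confident, and wrong
        "melbourne": 0.1,   # fluent, confident, and wrong
    },
}

def sample_next(context):
    """Sample the next word in proportion to its estimated probability."""
    words, weights = zip(*next_word_probs[context].items())
    return random.choices(words, weights=weights, k=1)[0]

# Roughly 4 times in 10, this produces a plausible-sounding error.
print(sample_next(("the", "capital", "of", "australia", "is")))
```

Careful prompting and feedback loops can shift the probabilities, but the generation step is still sampling, not fact-checking.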
A team at OpenAI and Georgia Tech showed that the training procedure means that LLMs are going to, with high probability, give a wrong answer rather than saying "I don't know" Kalai et al. (2025).
GPT3 (the initial ChatGPT was GPT3.5) had 175 billion parameters Brown et al. (2020), nearly 1,000 times as many as the model evaluated by Strubell et al. (2020).
Luccioni et al. (2024) found a carbon impact in the 150-500 g range across 1,000 tests; the US EPA says that the average passenger vehicle emits 400 g per mile.
The US EPA changed its estimate of cell phone energy use to 0.22 kWh in January 2024, from 0.012 kWh prior to that; an initial version of the Luccioni et al. (2024) study said the energy was equal to a full cell phone charge, but the published version says half a charge, accordingly.
Li et al. (2023) differentiates between water withdrawal, which includes all water taken temporarily or permanently, and water consumption, calculated as withdrawal minus discharge. They then focus on consumption because it reflects the impact of water use on downstream water availability.
LLM errors are sometimes called hallucinations. I prefer to refer to them as errors. Hallucinations in a human brain occur when the brain stops using sensory information and treats its predictions about the world as if they are true; without sensory information, the predictions become less and less related to the real environment the person is in. LLM "hallucinations," by contrast, are defined simply by the output being incorrect.
Reasoning models work, in broad terms, by adding extra text to the prompt, taking the output, and then re-prompting. For example, if your prompt is "what is 3+5", the reasoning model might actually prompt the LLM with something like "make a plan to respond to: what is 3+5", and then it takes the model's plan for answering the question and prompts the model again to actually get an answer. You can think of it as using specific prompts that first write a better prompt for the thing you actually want, and then using that better prompt.
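A minimal sketch of that plan-then-answer pattern is below. The `call_llm` function is a hypothetical stand-in for whatever completion API is actually used, and the prompt wording is illustrative, not taken from any particular product.

```python
# Sketch of the plan-then-answer re-prompting described above.
# `call_llm` is a hypothetical placeholder for a real completion API call.
def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt!r}]"  # replace with a real model call

def reasoning_style_answer(question: str) -> str:
    # Step 1: ask the model to write a plan for answering the question.
    plan = call_llm(f"Make a plan to respond to: {question}")
    # Step 2: feed the plan back in and ask for the actual answer.
    return call_llm(f"Question: {question}\nPlan: {plan}\nFollow the plan and answer.")

print(reasoning_style_answer("what is 3+5"))
```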
- Luccioni, S., Trevelin, B., & Mitchell, M. (2024). The Environmental Impacts of AI – Policy Primer. Hugging Face Blog. 10.57967/hf/3004
- Strubell, E., Ganesh, A., & McCallum, A. (2020). Energy and Policy Considerations for Modern Deep Learning Research. Proceedings of the AAAI Conference on Artificial Intelligence, 34(09), 13693–13696. 10.1609/aaai.v34i09.7123
- Luccioni, A. S., Viguier, S., & Ligozat, A.-L. (2023). Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model. Journal of Machine Learning Research, 24(253), 1–15.
- Luccioni, S., Jernite, Y., & Strubell, E. (2024). Power Hungry Processing: Watts Driving the Cost of AI Deployment? The 2024 ACM Conference on Fairness, Accountability, and Transparency, 85–99.
- Li, P., Yang, J., Islam, M. A., & Ren, S. (2023). Making AI Less “Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models. arXiv Preprint arXiv:2304.03271.
- Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., Choquette-Choo, C. A., Wallace, E., Tramèr, F., & Lee, K. (2023). Extracting Training Data from ChatGPT [Techreport].
- Kumar, H., Vincentius, J., Jordan, E., & Anderson, A. (2024). Human Creativity in the Age of LLMs: Randomized Experiments on Divergent and Convergent Thinking. arXiv Preprint arXiv:2410.03703.
- Wang, D., Huang, D., Shen, H., & Uzzi, B. (n.d.). A Preliminary, Large-Scale Evaluation of the Collaborative Potential of Human and Machine Creativity. 10.31234/osf.io/xeh64
- Vaccaro, M., Almaatouq, A., & Malone, T. (2024). When Combinations of Humans and AI Are Useful: A Systematic Review and Meta-Analysis. Nature Human Behaviour, 1–11.
- Kalai, A. T., Nachum, O., Vempala, S. S., & Zhang, E. (2025). Why Language Models Hallucinate. arXiv. 10.48550/ARXIV.2509.04664
- Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., … Amodei, D. (2020). Language Models Are Few-Shot Learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), Advances in Neural Information Processing Systems (Vol. 33, pp. 1877–1901). Curran Associates, Inc.