Yes, ChatGPT can answer questions. Can you trust it? Nope.

The reason: hallucinations.

Both Google and Microsoft are incorporating information from large language models (LLMs) like ChatGPT into search results. LLMs are also known as conversational AI; however, they are far from intelligent.

By unleashing these tools when they are not mature, tech corporations seem willfully blind to documented weaknesses. Additionally, the public sphere pays too little attention to whether corporate behavior serves the common good.

For example, University of Oxford scholar Stephanie Lin recently researched LLM answers to 817 questions “that some humans would answer falsely due to a false belief or misconception.” She reported the results at the 2022 Annual Meeting of the Association for Computational Linguistics. The best LLM scored 58% (an F); humans, 94% (B+ or A-). Subjects included finance, health and politics.

Notably: “the largest models were generally the least truthful.”

Although it has been in development for several years, ChatGPT created a splash in non-tech news media four months ago. Initial reports touted technological optimism (“the best artificial intelligence chatbot ever released to the general public” capable of “generating impressively detailed human-like written text”) or engaged in handwringing (plagiarism). Missing: the risk of its fabricated responses.

For example, last week The Washington Post detailed how “ChatGPT invented a sexual harassment scandal and named a real law prof as the accused.” The next day, The Guardian’s head of editorial innovation advised that ChatGPT was fabricating Guardian articles as well.

The issues of computer-generated falsehood and its first cousin, harm, aren’t new. In January 2022, OpenAI published a frank blog post about the challenge:

[T]hese models can also generate outputs that are untruthful, toxic, or reflect harmful sentiments. This is in part because GPT-3 is trained to predict the next word on a large dataset of Internet text, rather than to safely perform the language task that the user wants.

Read that again. ChatGPT is fundamentally a giant version of the predictive text memes that dot digital networks.
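To make “predict the next word” concrete, here is a minimal Python sketch of a toy word-pair (bigram) model. It is an illustration only, not how GPT-3 or ChatGPT is built: real LLMs use neural networks trained on vastly more text. The corpus, the function names (train_bigrams, generate) and the seed are all hypothetical, chosen for this example.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def generate(following, start, length, seed=0):
    """Extend `start` one word at a time by sampling a likely next word."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    output = [start]
    for _ in range(length - 1):
        candidates = following.get(output[-1])
        if not candidates:
            break  # no known continuation for this word
        words, counts = zip(*candidates.items())
        # Sample in proportion to how often each word followed the last one.
        output.append(rng.choices(words, weights=counts, k=1)[0])
    return " ".join(output)


# Hypothetical three-sentence training corpus; every sentence in it is true.
corpus = (
    "the moon orbits the earth . "
    "the earth orbits the sun . "
    "the sun is a star ."
)
model = train_bigrams(corpus)
print(generate(model, "the", 8))
```

Every sentence in the toy corpus is true, yet the model can chain word pairs into a fluent falsehood such as “the moon orbits the sun,” because it tracks only which words tend to follow which, not what the words mean.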

The problem: hallucinations

In March, David Shrier, AI & Innovation professor of practice at the Imperial College Business School in London, told Cybernews:

Because ChatGPT and similar large language systems build sentences based on relationships of prior words, the longer the piece you ask them to write, the greater the chance you spiral off into some really odd directions.

Those “really odd directions” have a name: hallucinations, a word “borrowed … from human psychology.”

A hallucination occurs in AI when the AI model generates output that deviates from what would be considered normal or expected based on the training data it has seen, [according to Greg Kostello, CTO and Co-Founder of AI-based healthcare company Huma.AI.]

Yann LeCun is a pioneer in deep learning systems. In March, he told IEEE Spectrum that hallucinations develop from a “fundamental” design flaw:

Large language models have no idea of the underlying reality that language describes… Those systems generate text that sounds fine, grammatically, semantically, but they don’t really have some sort of objective other than just satisfying statistical consistency with the prompt.

Another AI pragmatist, entrepreneur Mathew Lodge, “cautions that the last decade has taught us that large deep-learning models are highly unpredictable.” He told IEEE Spectrum: “LLMs are best used when the errors and hallucinations are not high impact.”

As Craig S. Smith wrote last month: “you can’t trust advice from a machine prone to hallucinations.”

Yet we treat search tools as “advice.”

Search algorithms employ opaque rules to sort the universe of possible answers to our queries. Google searches have historically privileged answers from Wikipedia, a community-authored global encyclopedia. Google and Bing writers create proprietary summaries. What happens to human-curated search results in an era of LLMs?

Who benefits from LLMs?

In a common good framework, “the social policies, social systems, institutions, and environments on which we depend are beneficial to all.” Where is the benefit to society, to search engine users, when the tool is prone to hallucinations?

Microsoft recently laid off its entire “ethics and society” team. If we switch our ethical evaluation to a utilitarian model, where decisions are guided by the greatest good for the greatest number, it’s clear that Microsoft is not acting to benefit the greatest number.

Moreover, both Google and Microsoft are shoehorning LLMs into core products other than search (Gmail and Google Docs; Word and Excel). In mid-March, Microsoft vice president Jared Spataro minimized the risk of this integration in the new product, Copilot:

Sometimes, Copilot will get it right. Other times, it will be usefully wrong, giving you an idea that’s not perfect, but still gives you a head start (emphasis added).

It’s a head start when you have to double-check to rule out hallucinations? I don’t think so. Most users won’t double-check; it’s time-consuming and hard.

Deploying conversational AI models while expecting users to find and flag errors offloads both quality assurance and risk to society. This is not an ethical move, but it could be a profitable one that ensures benefits continue to accrue to corporations and those who own and run them.

A bit of history

OpenAI, the organization behind ChatGPT, introduced the first generative pre-trained transformer (GPT) in 2018.

About the illustration
Generated iteratively with Midjourney, using the prompt “abstract conversational AI digital art Neuromancer style.”

References in IEEE style (for students)
[1] S. Pichai, “An important next step on our AI journey,” The Keyword, Feb. 6, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].

[2] Y. Mehdi, “Reinventing search with a new AI-powered Microsoft Bing and Edge, your copilot for the web,” The Official Microsoft Blog, Feb. 22, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].

[3] M. Ruby, “How ChatGPT works: The model behind the bot,” Towards Data Science, Jan. 30, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].

[4] “What is conversational AI?,” IBM. [Online]. Available: [Accessed: Apr. 2, 2023].

[5] “Stephanie Lin,” Open Review. [Online]. Available: [Accessed: Apr. 2, 2023].

[6] S. Lin, J. Hilton, and O. Evans, “TruthfulQA: Measuring how models mimic human falsehoods,” arXiv [cs.CL], 2021. Accessed: Apr. 2, 2023, doi: 10.48550/arXiv.2109.07958. [Online]. Available:

[7] S. Lin, J. Hilton, and O. Evans, “TruthfulQA: Measuring how models mimic human falsehoods,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, vol. 1, 2022, pp. 3214–3252. Accessed: Apr. 2, 2023, doi: 10.18653/v1/2022.acl-long.229. [Online]. Available:

[8] S. M. Kelly, “This AI chatbot is dominating social media with its frighteningly good essays,” CNN, Dec. 5, 2022. [Online]. Available: [Accessed: Apr. 2, 2023].

[9] K. Roose, “The brilliance and weirdness of ChatGPT,” The New York Times, Dec. 5, 2022. [Online]. Available: [Accessed: Apr. 2, 2023].

[10] S. Lock, “What is AI chatbot phenomenon ChatGPT and could it replace humans?,” The Guardian, Dec. 5, 2022. [Online]. Available: [Accessed: Apr. 2, 2023].

[11] A. Hern, “AI-assisted plagiarism? ChatGPT bot says it has an answer for that,” The Guardian, Dec. 31, 2022. [Online]. Available: [Accessed: Apr. 2, 2023].

[12] P. Verma and W. Oremus, “ChatGPT invented a sexual harassment scandal and named a real law prof as the accused,” The Washington Post, Apr. 5, 2023. [Online]. Available: [Accessed: Apr. 9, 2023].

[13] C. Moran, “ChatGPT is making up fake Guardian articles. Here’s how we’re responding,” The Guardian, Apr. 6, 2023. [Online]. Available: [Accessed: Apr. 9, 2023].

[14] V. Petkauskas, “ChatGPT’s answers could be nothing but a hallucination,” Cybernews, Mar. 6, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].

[15] “Aligning language models to follow instructions,” OpenAI, Jan. 27, 2022. [Online]. Available: [Accessed: Apr. 2, 2023].

[16] C. Bryan, “Predictive text memes: The rush of a personality quiz with none of the work,” Mashable, May 8, 2019. [Online]. Available: [Accessed: Apr. 2, 2023].

[17] Wikipedia contributors, “Yann LeCun,” Wikipedia, Mar. 23, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].

[18] C. S. Smith, “Hallucinations could blunt ChatGPT’s success,” IEEE Spectrum, Mar. 13, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].

[19] A. Draper, “A Quick Primer On The Google Algorithm For Lawyers,” Forbes Agency Council, Jul. 20, 2021. [Online]. Available: [Accessed: Apr. 2, 2023].

[20] M. Velasquez, C. Andre, T. Shanks, S.J., and M.J. Meyer, “Thinking ethically,” Markkula Center for Applied Ethics, Santa Clara University, Aug. 1, 2015. [Online]. Available: [Accessed: Apr. 2, 2023].

[21] Z. Schiffer and C. Newton, “Microsoft lays off team that taught employees how to make AI tools responsibly,” The Verge, Mar. 13, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].

[22] Markkula Center for Applied Ethics, “Calculating consequences: The utilitarian approach,” Santa Clara University, Aug. 1, 2014. [Online]. Available: [Accessed: Apr. 2, 2023].

[23] J. Vanian, “Microsoft tries to justify A.I.’s tendency to give wrong answers by saying they’re ‘usefully wrong’,” CNBC, Mar. 16, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].

[24] T. Warren, “Microsoft announces Copilot: the AI-powered future of Office documents,” The Verge, Mar. 16, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].

[25] K. Gill, “ChatGPT: hallucinations about weather data,” WiredPen, Apr. 1, 2023. [Online]. Available: [Accessed: Apr. 2, 2023].


By Kathy E. Gill

Digital evangelist, speaker, writer, educator. Transplanted Southerner; teach newbies to ride motorcycles! @kegill
