Large Language Models (LLMs) Hallucinate 100% of the Time

For a long time now I’ve been telling people that large language models (LLMs) such as Google’s Gemini or OpenAI’s ChatGPT hallucinate 100% of the time. Here is my brief explanation of this claim.

An LLM is a type of artificial intelligence (AI) model that is trained via a deep learning strategy. LLMs predict, guess if you will, the next “thing” in a series of things. In the case of a chatbot that thing is text, for an image generator an image, for a music generator music, and so on. When the guess isn’t very good, or is plainly wrong, it’s common to claim that the AI has hallucinated. When the guess is of higher quality, an answer that we believe to be sufficiently accurate for our needs, we accept it and move on. On the surface, it appears that hallucination is in the eye of the beholder, but in practice the real issue is the quality of the guess. The point is that every answer produced by an LLM is effectively a hallucination, the quality of which can range from ridiculous to exceptional.
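
To make “predicting the next thing” concrete, here is a minimal sketch of next-token prediction. It assumes you have the Hugging Face transformers and torch libraries installed; the GPT-2 model and the example prompt are purely illustrative choices, and any causal language model behaves the same way.

```python
# A minimal sketch of next-token prediction, assuming the "transformers" and
# "torch" libraries are installed. GPT-2 is used only because it is small;
# any causal LLM works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate a few tokens, one guess at a time.
for _ in range(5):
    with torch.no_grad():
        logits = model(input_ids).logits      # a score for every token in the vocabulary
    next_token = logits[0, -1].argmax()       # pick the single most likely "next thing"
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
# Whatever comes out is still a guess; for well-covered prompts it just
# tends to be a good one.
```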

Improving How LLMs Hallucinate

There are several strategies that you can follow to improve the quality of the hallucinations produced by an LLM:

  1. Choose the best LLM for your context. There is a plethora of choices when it comes to LLMs, each with its own strengths and weaknesses. If you’re using a publicly available LLM “out of the box”, and the quality of the answers it produces is important to you, then it behooves you to identify which LLM(s) are oriented towards the types of prompts you are likely to pose.
  2. Better prompting. Improving the descriptiveness of the prompt that you submit to the LLM will improve the results it produces. This is what prompt engineering is all about.
  3. Higher-quality training data. If you are building an LLM, or another type of AI model for that matter, the better the quality of the data that goes into training the model, the better the quality of the predictions that it will make (on average). AI is limited by the law of GIGO – garbage in, garbage out. I’ve been writing a fair bit about data quality (DQ) over the years, with more to come.
  4. Retrieval augmented generation (RAG). RAG is a strategy where you programmatically add relevant data to a prompt so that the LLM can base its answer on that data. The greater the accuracy of that data, and the better the wording of the generated prompt, the greater the quality of your model’s predictions. A minimal sketch of this idea appears after this list.
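
To illustrate the mechanics of RAG, here is a minimal sketch. The document snippets, the crude keyword-overlap scoring, and the prompt wording are all made-up assumptions for illustration; a real system would typically use embedding-based retrieval over a document store and then send the augmented prompt to whichever LLM you have chosen.

```python
# A minimal RAG sketch: retrieve the most relevant snippet for a question,
# then splice it into the prompt so the LLM answers from the supplied data.
# The snippets, scoring, and prompt wording are illustrative assumptions;
# production systems usually use embedding search over a document store.

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    "Premium subscriptions include priority support and offline access.",
]

def score(question: str, doc: str) -> int:
    """Crude relevance score: the count of shared lowercase words."""
    return len(set(question.lower().split()) & set(doc.lower().split()))

def build_rag_prompt(question: str) -> str:
    """Pick the best-matching snippet and embed it in the prompt."""
    best_doc = max(documents, key=lambda d: score(question, d))
    return (
        "Answer the question using only the context below.\n"
        f"Context: {best_doc}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_rag_prompt("How many days do I have to return an item?")
print(prompt)
# This prompt is what you would send to your LLM of choice.
```

The key point is that the quality of the final answer is now bounded by the quality of the retrieved context and the wording of the generated prompt, which is exactly why the accuracy of that data matters so much.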

Accept the Fact that LLMs Hallucinate

In short, the question you need to ask yourself is whether a given hallucination is something you can live with.

You can read my other AI blog postings here.
