LLMs and GenAI in Digital Scholarship

An overview of how to use LLMs and GenAI in research and instruction.

Bias in AI Outputs

As noted earlier, LLMs learn from the internet and other human-written text, so they inevitably pick up the human biases present in that data. If much of the training text portrays a stereotype or uses biased language about a group, the model can reproduce or even amplify those biases. For example, researchers have found language models exhibiting gender bias in career associations (e.g., linking men with certain professions more than women) and racial bias in sentiment analysis of text about different groups. There are also cultural biases: an LLM may perform better on Western texts and misunderstand or misrepresent content related to less represented cultures or languages.

Biases are not always overt or obvious. For instance, if the training data contains fewer scholarly articles by female scientists, an LLM might be statistically less likely to mention or quote them, thus marginalizing those voices. Bias can also appear in how the AI treats user queries, giving detailed answers on some topics and shallow ones on others, depending on what its training data covered.

AI companies state that they do try to mitigate the worst biases in high-profile models via fine-tuning and human feedback (often the work of those human annotators we discussed). They explicitly instruct models to avoid hate speech or obviously discriminatory content. However, fine-tuning to reduce bias in one area can introduce bias in another, or reduce the model’s helpfulness.

For a researcher or student using LLMs, the key is to stay vigilant for bias. If an AI-generated summary of a historical event seems to downplay certain perspectives, ask whether that might be due to the sources the model was trained on. If an AI writing assistant consistently uses masculine pronouns or male examples in generic text, recognize the pattern and either adjust the text yourself or instruct the AI to diversify its output, as in the sketch below. Critical literacy means not taking the AI's output as neutral or universal truth. Always consider who might be misrepresented or left out of the AI's answer.
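
To make that concrete, the sketch below shows one way to build such a standing instruction into a request. It assumes the OpenAI Python client and a chat model name ("gpt-4o-mini" is an assumption, not a recommendation); the same idea works with any chat-style API that accepts a system or instruction message:

    # A minimal sketch, assuming the OpenAI Python client (pip install openai),
    # an API key in the OPENAI_API_KEY environment variable, and a chat model
    # name that is only illustrative.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            # A standing instruction that counteracts a bias noticed in earlier drafts.
            {"role": "system",
             "content": ("When writing generic examples, vary the names, genders, and "
                         "cultural backgrounds of the people you mention, and use "
                         "gender-neutral language unless a gender is specified.")},
            {"role": "user",
             "content": "Write a short example paragraph about a scientist applying for a grant."},
        ],
    )
    print(response.choices[0].message.content)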

Hallucinations (Fabricated Information)

One of the most notorious issues with LLMs is their tendency to hallucinate. In AI terminology, this means the model produces content that sounds plausible and authoritative but is completely made up or false. The AI doesn't know it is incorrect: it is simply following the prompt and its learned word patterns toward a statistically likely conclusion that may not align with reality.

Examples of AI hallucinations:

  • The model might invent a citation or reference that looks real but isn’t. (E.g., listing a DOI or paper title that doesn’t exist, because it pieced together author names and jargon from training data.)

  • It might confidently state inaccurate “facts.” For instance, asking “Who won the Nobel Prize in Physics in 2023?” might yield a very specific name that turns out to be fictional if the model’s training cutoff was before 2023 and it is guessing from patterns.

  • The model could create a biographical detail about a real person that’s incorrect (say, giving a wrong birthdate or a nonexistent award).

  • In coding, it might hallucinate a function name or API that doesn’t exist but looks plausible, leading a programmer astray.

Why do hallucinations happen? Recall that the LLM generates text by pattern completion, not by consulting a knowledge graph or database. If your question pushes it beyond the boundary of what it was trained on, it will still produce an answer rather than admit ignorance, filling the gap with something that fits statistically. As AI safety researchers dryly note, “generative models are masters of BS”: they will fill in any prompt with fluent nonsense if needed. Moreover, the training process doesn’t inherently teach the model what is true versus false; it only learns what was likely to be said in response to something. Many factual-looking statements appear in its training texts, but the model has no built-in mechanism to verify facts or check consistency.
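
A tiny experiment makes this concrete. The sketch below, assuming the Hugging Face transformers library and the small, openly available GPT-2 checkpoint, feeds the model a prompt about an event after its training cutoff; the model still completes the sentence fluently, because nothing in the generation loop consults a source of truth:

    # A minimal sketch of pattern completion, assuming the transformers library
    # (pip install transformers torch) and the small "gpt2" checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The 2023 Nobel Prize in Physics was awarded to"
    inputs = tokenizer(prompt, return_tensors="pt")

    # generate() extends the prompt with statistically likely tokens; it has no
    # mechanism to check whether the continuation is true, so the "laureate" it
    # names is very likely invented.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))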

Hallucination is a major source of AI-generated misinformation.

To handle hallucinations, follow these practices:

  • Always verify important information against a reliable source. If an AI gives you a statistic, check the source or look it up separately. If it cites an article, try to find that article (you’ll often discover it was made up); a citation-checking sketch follows this list.

  • Prefer using tools that augment the LLM with real data. A good example is the Retrieval-Augmented Generation (RAG) approach (covered later), where the AI is connected to a database or the web to fetch actual documents. This can greatly reduce hallucination because the model grounds its answer in the provided reference text; see the second sketch after this list.

  • Ask the AI to show its reasoning or sources. While the AI’s own explanation isn’t guaranteed to be correct, detailed reasoning can sometimes reveal gaps (e.g., it attempts a chain of thought and you can spot where it goes off track). Some advanced prompting (like “Let’s think step by step”) can induce the model to break the task down, which may reduce errors on complex problems.

  • Be cautious with “creative” or open-ended prompts for factual queries. If you ask for a fictional story or an imagined scenario, hallucination isn’t an issue (it’s the point!). But if you need factual precision, constrain the prompt or explicitly say “if you are not sure, say you don’t know.” ChatGPT is trained to sometimes express uncertainty, but many models will otherwise forge ahead.
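
The citation check mentioned in the first bullet can often be partly automated. The sketch below assumes the requests library and uses the public Crossref REST API, which returns a record for DOIs it has registered and a “not found” response otherwise. A miss is not proof of fabrication (not every real publication has a Crossref DOI), but it is a strong signal to dig further:

    # A minimal sketch for checking an AI-supplied DOI against Crossref,
    # assuming the requests library (pip install requests).
    import requests

    def doi_exists(doi: str) -> bool:
        """Return True if Crossref has a record for this DOI."""
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        return resp.status_code == 200

    # Hypothetical DOI copied from an AI-generated reference list.
    suspect_doi = "10.1000/example.2021.001"
    if doi_exists(suspect_doi):
        print("Crossref has a record; now confirm the title and authors match.")
    else:
        print("No Crossref record; treat this citation as suspect until verified.")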

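And here is the RAG idea from the second bullet in miniature: retrieve relevant passages first, then instruct the model to answer only from them. The retrieval step below is a toy keyword-overlap ranking, the documents are invented, and ask_llm() is a hypothetical stand-in for whatever chat API you use; real systems typically use vector search over an indexed collection:

    # A minimal sketch of retrieval-augmented generation. The documents, the
    # toy scoring, and the ask_llm() helper are illustrative assumptions.
    def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
        """Rank documents by words shared with the query (toy keyword overlap)."""
        terms = set(query.lower().split())
        ranked = sorted(documents,
                        key=lambda d: len(terms & set(d.lower().split())),
                        reverse=True)
        return ranked[:k]

    def build_grounded_prompt(query: str, passages: list[str]) -> str:
        """Ask the model to answer only from the retrieved passages."""
        context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
        return ("Answer the question using only the numbered passages below. "
                "If they do not contain the answer, say you do not know.\n\n"
                f"{context}\n\nQuestion: {query}")

    documents = [
        "The library's digital scholarship lab opened in 2019 and supports text mining.",
        "Interlibrary loan requests are usually filled within five business days.",
    ]
    question = "When did the digital scholarship lab open?"
    prompt = build_grounded_prompt(question, retrieve(question, documents))
    # answer = ask_llm(prompt)  # hypothetical call to your chosen chat model
    print(prompt)
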
As an illustration of hallucination: a legal research incident involved an attorney using ChatGPT to find court cases; ChatGPT provided several case names and docket numbers that did not exist, yet the AI insisted they were real. This could have been caught by simply checking a legal database. It underscores that no matter how confident or detailed an answer looks, treat unverified AI output as potentially fictional until confirmed.

In summary, hallucinations are a top reason you cannot blindly trust an LLM.

Critical literacy demands verifying AI outputs, using mechanisms to ground the AI in reality when possible, and understanding that fluent ≠ factual. Think of the LLM as a very creative but occasionally unreliable student: you need to fact-check its work before accepting it.