
LLMs and GenAI in Digital Scholarship

An overview of how to use LLMs and GenAI in research and instruction.

Retrieval-Augmented Generation (RAG) 

One of the most important techniques for using LLMs in research is Retrieval-Augmented Generation (RAG). This approach addresses an LLM's knowledge cutoff and hallucination issues by supplying it with relevant external information from a database or corpus when it answers a question. In simpler terms, instead of relying solely on its training material (which may be outdated or incomplete), RAG fetches up-to-date or domain-specific information and feeds it into the prompt, so the model's answer can be grounded in that information.

How RAG works:

  1. You have a knowledge source – e.g. a collection of articles, papers, books, or a vector database of embeddings of those documents.

  2. When a query comes in (like “Explain the significance of protein X in disease Y”), the system first uses a retriever to find relevant pieces of text from the knowledge source (for example, paragraphs from journal articles about protein X and disease Y).

  3. These retrieved snippets are then appended to the prompt given to the LLM, often along with an instruction like “use the information below to answer.”

  4. The LLM generates an answer that integrates the provided context, and it can cite that context if the prompt asks it to.

By doing this, the model is effectively able to answer questions about information it was never trained on, because we supply that information at query time. It also reduces hallucination, since the model can paraphrase or quote the provided text rather than guessing. However, the model might still make inferences beyond the text, so careful prompting (like “answer only based on the text above”) helps.
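To make these steps concrete, below is a minimal sketch of the retrieval step in Python. It uses the sentence-transformers library for embeddings and a plain in-memory cosine-similarity search in place of a real vector database; the corpus chunks, model name, and query are purely illustrative.

```python
# Minimal retrieval sketch: embed a small corpus of chunks, then return the
# chunks most similar to a query. A production system would store the
# embeddings in a vector database rather than an in-memory numpy array.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Illustrative "knowledge source": a few short document chunks.
chunks = [
    "A trial of remdesivir showed reduced time to recovery in hospitalized patients.",
    "No significant effect of remdesivir on mortality was observed.",
    "Protein X is implicated in the progression of disease Y.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")       # small general-purpose embedding model
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (cosine similarity)."""
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec                       # cosine similarity on unit-length vectors
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

print(retrieve("What is the efficacy of remdesivir for COVID-19?"))
```

The retrieved chunks would then be pasted into the prompt (step 3) along with the instruction to answer only from the provided text.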

Example of RAG usage: Imagine you have a database of research abstracts about COVID-19. A user asks, “What do studies say about the efficacy of remdesivir for COVID-19?” A RAG system would vector-search the abstracts for “remdesivir efficacy COVID” and retrieve perhaps two or three relevant passages. Suppose it retrieves an abstract snippet that says “…a trial of remdesivir showed reduced time to recovery in hospitalized patients (Smith et al. 2020)… however, no significant effect on mortality was observed (Doe et al. 2021)…” and another snippet from a review: “…remdesivir has modest clinical benefit, especially if given early, but is not a magic bullet (Johnson 2022).” These are fed into the prompt. The prompt might be: “Using the information provided, summarize the efficacy of remdesivir for COVID-19 treatment.” The LLM then might output: “Studies indicate that remdesivir can have a modest benefit in treating COVID-19, primarily in reducing recovery time for hospitalized patients (Smith et al. 2020). However, evidence suggests it does not significantly lower mortality rates (Doe et al. 2021). Overall, its efficacy is limited and it is not a cure-all, though early administration may provide some benefit (Johnson 2022).”
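Here is a hedged sketch of how the generation step of that example might look in code: the retrieved snippets are placed into a prompt with an instruction to use only the provided information, and the prompt is sent to a chat model. It assumes the openai Python package (v1+) with an API key set in the environment; the snippet text, citations, and model name are illustrative, not real study data.

```python
# Sketch of the generation step: feed retrieved snippets to an LLM.
# Assumes the openai package and an OPENAI_API_KEY environment variable;
# the snippets below are the invented examples from the text above, not real studies.
from openai import OpenAI

snippets = [
    "A trial of remdesivir showed reduced time to recovery in hospitalized patients "
    "(Smith et al. 2020); however, no significant effect on mortality was observed (Doe et al. 2021).",
    "Remdesivir has modest clinical benefit, especially if given early, but is not a magic bullet (Johnson 2022).",
]

prompt = (
    "Using only the information provided below, summarize the efficacy of "
    "remdesivir for COVID-19 treatment and cite the sources given.\n\n"
    + "\n\n".join(snippets)
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",                       # any chat model would work here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```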

This answer is grounded in actual retrieved data, with citations. Without RAG, a standalone LLM might hallucinate or base its answer on possibly outdated training info.

For an academic library, RAG-powered LLM assistants are very attractive: you could enable natural language queries over your catalog or institutional repository, where the LLM gives answers with references to actual library materials. This provides a more trustworthy AI assistant compared to a vanilla chatbot that might spout guesses.

Setting up RAG: Tools like LangChain, LM Studio and Anything LLM, or the Haystack framework can facilitate RAG. They handle document ingestion (breaking documents into chunks and embedding them in a vector store), retrieval (similarity search for the relevant chunks), and passing the retrieved chunks to an LLM along with the query. The good news is that you can do RAG with both open-source models and proprietary models accessed via API; it is model-agnostic, more an architecture choice than a specific product.
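As an illustration of what this looks like with one of those tools, below is a sketch following LangChain's common retrieval-QA pattern. LangChain's import paths and class names have shifted between versions, and the file path, chunk sizes, and model name here are placeholders, so treat this as a starting point to check against the current documentation rather than copy-paste code.

```python
# Sketch of a LangChain-style RAG pipeline: ingest -> embed -> retrieve -> answer.
# Import paths vary across LangChain versions; these follow the
# langchain / langchain-community / langchain-openai package split.
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Ingestion: load a document (placeholder path), split it into chunks,
#    and embed the chunks into a FAISS vector store.
docs = TextLoader("corpus/article.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
store = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 2-3. Retrieval + generation: a chain that fetches the top-k chunks
#      and passes them to the LLM with the question.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    retriever=store.as_retriever(search_kwargs={"k": 3}),
)
result = qa.invoke({"query": "What does the article say about remdesivir's efficacy?"})
print(result["result"])
```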

Limitations of RAG: If your documents are insufficient or the retriever fails to grab the right passages, the answer will suffer. Also, the prompt's context length limits how much information you can feed in; if too much is retrieved, you may have to summarize or keep only the top-k passages. Despite these limitations, RAG is currently one of the best ways to get accurate, source-supported answers from LLMs, and it is widely used in question-answering systems, customer-support bots, and research assistants.

On the next page are RAG tutorials to get you started.