The Name Is Misleading

When people hear "Retrieval-Augmented Generation," many assume the AI is doing something like an internet search: finding and returning existing information. Having spent considerable time building a production RAG system for Canadian immigration research (IMRAG, running at imrag.ca), I want to offer a researcher's account of what is actually happening — and why it matters for anyone using these systems in academic or policy contexts.

What Retrieval Actually Does

In a RAG system, "retrieval" means finding chunks of text from a document collection that are statistically similar to the user's query. The similarity is computed in a high-dimensional vector space — each piece of text is encoded as a numerical vector by an embedding model, and retrieval finds the chunks whose vectors are closest to the query vector. This is not the same as finding the document that contains the correct answer.
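To make the distinction concrete, here is a minimal sketch of similarity-based retrieval. The vectors are toy two-dimensional examples; a real system would produce high-dimensional embeddings with a trained embedding model, but the ranking logic is the same.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, chunk_vecs: list, k: int = 3) -> list:
    """Return indices of the k chunks whose vectors are closest to the query.

    Note what this does NOT do: it never checks whether any chunk actually
    answers the query -- it only ranks chunks by geometric proximity.
    """
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)[:k]
```

The comment in `retrieve` is the whole point: nearest-in-vector-space is a proxy for relevance, not a guarantee of correctness.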

In IMRAG, we use hybrid retrieval: combining vector similarity with BM25 keyword matching, then fusing results using Reciprocal Rank Fusion (RRF). We also run a cross-encoder reranker over the top candidates to better assess genuine relevance. This pipeline, informed by a review of 59 recent RAG publications, significantly outperforms naive vector-only retrieval — but even so, the system does not "know" the answer in any meaningful sense. It finds text that is relevant. What happens next is generation.
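The RRF step mentioned above is simple enough to show in full. This is a generic sketch of the standard formula, not IMRAG's internal code: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 as the conventional constant.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document IDs using Reciprocal Rank Fusion.

    rankings: e.g. [vector_results, bm25_results], each a list of doc IDs
    ordered best-first. A document appearing near the top of multiple lists
    accumulates a higher fused score than one ranked highly by only one.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

In a pipeline like the one described, the fused list would then be passed to the cross-encoder reranker for a finer-grained relevance assessment.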

What Generation Actually Does

The large language model takes the retrieved chunks and the user's query and generates a response. That response is text that is statistically coherent given the input — the model is not "reading" the documents the way a human would, and it is not reasoning from first principles. Two consequences researchers must keep in mind:

  1. The model can hallucinate even with retrieval. If retrieved chunks are ambiguous or peripherally related to the query, the model will still generate a confident-sounding response.
  2. The model's prior training influences its responses. Even when instructed to answer only from provided documents, the generation is shaped by pre-training — the model may add framing or context not in the source material.
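The usual mitigation for the second point is to instruct the model explicitly at prompt-assembly time. The sketch below is illustrative only — it is not IMRAG's actual prompt — and, as the comment notes, such instructions reduce but do not eliminate the influence of pre-training.

```python
def build_grounded_prompt(query: str, chunks: list) -> str:
    """Assemble a prompt instructing the model to answer only from the
    provided chunks.

    Caveat: this constrains generation but does not guarantee grounding;
    the model's pre-training still shapes the output.
    """
    context = "\n\n".join(f"[Source {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say that they do not.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```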

Corrective RAG: Acknowledging Uncertainty

One of the most important design choices in production RAG is Corrective RAG (CRAG): building the system to evaluate its own retrieval confidence and communicate uncertainty honestly. In IMRAG, if retrieved chunks are assessed as low-relevance, the system says so, rather than generating a confident but poorly grounded response. For a system handling immigration queries — where stakes can involve visa applications and legal status — false confidence is genuinely harmful.
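The gating logic behind this behavior can be sketched in a few lines. The structure below is a simplified illustration of the CRAG idea, not IMRAG's implementation; the relevance scores are assumed to come from a reranker, and the threshold value is arbitrary.

```python
def answer_with_confidence_gate(query: str, scored_chunks: list,
                                threshold: float = 0.5) -> dict:
    """Corrective-RAG-style gate: only proceed to generation when retrieval
    looks reliable.

    scored_chunks: list of (chunk_text, relevance_score) pairs, e.g. scores
    from a cross-encoder reranker. threshold is an illustrative cutoff.
    """
    relevant = [(c, s) for c, s in scored_chunks if s >= threshold]
    if not relevant:
        # Communicate uncertainty instead of generating a confident but
        # poorly grounded answer.
        return {"status": "low_confidence",
                "message": "No sufficiently relevant sources were found "
                           "for this query."}
    # Only well-scored chunks are passed on to the generation step.
    return {"status": "ok", "chunks": [c for c, _ in relevant]}
```

The key design choice is that "I don't have a well-grounded answer" is a first-class output of the system, not a failure mode to be papered over.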

What This Means for Research Integrity

  • Cite the sources, not the AI. RAG systems should expose their retrieved sources. Researchers should cite those, not the AI's synthesis.
  • Treat AI outputs as drafts. Even the best RAG output should be a starting point for human evaluation, not a finished answer.
  • Understand the corpus. RAG is only as good as the documents it retrieves from. Authoritative curated collections will behave very differently from web scrapes.
