The Name Is Misleading

When people hear "Retrieval-Augmented Generation," many assume the AI is doing something like an internet search: finding and returning existing information. Having spent considerable time building a production RAG system for Canadian immigration research (IMRAG, running at imrag.ca), I want to offer a researcher's account of what is actually happening — and why it matters for anyone using these systems in academic or policy contexts.

What Retrieval Actually Does

In a RAG system, "retrieval" means finding chunks of text from a document collection that are statistically similar to the user's query. The similarity is computed in a high-dimensional vector space — each piece of text is encoded as a numerical vector by an embedding model, and retrieval finds the chunks whose vectors are closest to the query vector. This is not the same as finding the document that contains the correct answer.
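To make the distinction concrete, here is a minimal sketch of similarity-based retrieval. The vectors are toy two-dimensional examples; a real system would produce high-dimensional embeddings with a trained embedding model, but the ranking logic is the same.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, chunk_vecs: list, k: int = 3) -> list:
    """Return indices of the k chunks whose vectors are closest to the query.

    Note what this does NOT do: it never checks whether any chunk actually
    answers the query -- it only ranks chunks by geometric proximity.
    """
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)[:k]
```

The comment in `retrieve` is the whole point: nearest-in-vector-space is a proxy for relevance, not a guarantee of correctness.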

In IMRAG, we use hybrid retrieval: combining vector similarity with BM25 keyword matching, then fusing results using Reciprocal Rank Fusion (RRF). We also run a cross-encoder reranker over the top candidates to better assess genuine relevance. This pipeline, informed by a review of 59 recent RAG publications, significantly outperforms naive vector-only retrieval — but even so, the system does not "know" the answer in any meaningful sense. It finds text that is relevant. What happens next is generation.
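The RRF step mentioned above is simple enough to show in full. This is a generic sketch of the standard formula, not IMRAG's internal code: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 as the conventional constant.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document IDs using Reciprocal Rank Fusion.

    rankings: e.g. [vector_results, bm25_results], each a list of doc IDs
    ordered best-first. A document appearing near the top of multiple lists
    accumulates a higher fused score than one ranked highly by only one.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

In a pipeline like the one described, the fused list would then be passed to the cross-encoder reranker for a finer-grained relevance assessment.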

What Generation Actually Does

The large language model takes the retrieved chunks and the user's query and generates a response. That response is text that is statistically coherent given the input — the model is not "reading" the documents the way a human would, and it is not reasoning from first principles. Two consequences researchers must keep in mind:

  1. The model can hallucinate even with retrieval. If retrieved chunks are ambiguous or peripherally related to the query, the model will still generate a confident-sounding response.
  2. The model's prior training influences its responses. Even when instructed to answer only from provided documents, the generation is shaped by pre-training — the model may add framing or context not in the source material.
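The usual mitigation for the second point is to instruct the model explicitly at prompt-assembly time. The sketch below is illustrative only — it is not IMRAG's actual prompt — and, as the comment notes, such instructions reduce but do not eliminate the influence of pre-training.

```python
def build_grounded_prompt(query: str, chunks: list) -> str:
    """Assemble a prompt instructing the model to answer only from the
    provided chunks.

    Caveat: this constrains generation but does not guarantee grounding;
    the model's pre-training still shapes the output.
    """
    context = "\n\n".join(f"[Source {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say that they do not.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```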

Corrective RAG: Acknowledging Uncertainty

One of the most important design choices in production RAG is Corrective RAG (CRAG): building the system to evaluate its own retrieval confidence and communicate uncertainty honestly. In IMRAG, if retrieved chunks are assessed as low-relevance, the system says so, rather than generating a confident but poorly grounded response. For a system handling immigration queries — where stakes can involve visa applications and legal status — false confidence is genuinely harmful.
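The gating logic behind this behavior can be sketched in a few lines. The structure below is a simplified illustration of the CRAG idea, not IMRAG's implementation; the relevance scores are assumed to come from a reranker, and the threshold value is arbitrary.

```python
def answer_with_confidence_gate(query: str, scored_chunks: list,
                                threshold: float = 0.5) -> dict:
    """Corrective-RAG-style gate: only proceed to generation when retrieval
    looks reliable.

    scored_chunks: list of (chunk_text, relevance_score) pairs, e.g. scores
    from a cross-encoder reranker. threshold is an illustrative cutoff.
    """
    relevant = [(c, s) for c, s in scored_chunks if s >= threshold]
    if not relevant:
        # Communicate uncertainty instead of generating a confident but
        # poorly grounded answer.
        return {"status": "low_confidence",
                "message": "No sufficiently relevant sources were found "
                           "for this query."}
    # Only well-scored chunks are passed on to the generation step.
    return {"status": "ok", "chunks": [c for c, _ in relevant]}
```

The key design choice is that "I don't have a well-grounded answer" is a first-class output of the system, not a failure mode to be papered over.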

What This Means for Research Integrity

  • Cite the sources, not the AI. RAG systems should expose their retrieved sources. Researchers should cite those, not the AI's synthesis.
  • Treat AI outputs as drafts. Even the best RAG output should be a starting point for human evaluation, not a finished answer.
  • Understand the corpus. RAG is only as good as the documents it retrieves from. Authoritative curated collections will behave very differently from web scrapes.
