# RAG is the wrong abstraction

> Chunk your documents. Embed them. Store vectors. Query by similarity. Pray the right chunk comes back. Then do it all again when anything changes. We tried it. We stopped. Structured files that the AI reads directly replaced a pipeline we spent weeks building. RAG solves "too much data to read" but creates "too much infrastructure to maintain." For most teams, the data fits in files. The maintenance doesn't fit in the schedule.

Date: 2026-04-08
Tags: engineering, industry
Slug: 190-rag-is-the-wrong-abstraction

---

Three months ago, we built a RAG pipeline.

By the book. Chunk the documents, generate embeddings, store them in a vector database, retrieve relevant fragments by similarity. Like the conference slides say. Like the Medium posts recommend. Like the AI startups explain when announcing their \$24M raise.

Two weeks later, we stopped.

## The weight of the pipeline

The problem RAG solves is clear: too much data to fit in the AI’s context window. So you search: extract the relevant fragments and inject them into the prompt.

The catch is that the “so you search” part isn’t one system. It’s five.

Chunking — where to cut. By paragraph? By heading? By token count? By semantic boundary? Different cuts produce different results. The right answer depends on the document type. A meeting transcript and a technical specification don’t share an optimal chunk size. This alone is an engineering problem.

Embedding: converting text to vectors. Which model? OpenAI’s ada? Cohere’s embed? Local sentence-transformers? If the model changes, every vector needs recalculating, because vectors from different models aren’t comparable and can’t share an index. Thousands of documents means hours. Tens of thousands means a day.

Storage — you need a vector database. Pinecone? Weaviate? Chroma? pgvector? Each with operational overhead. Backups, scaling, monitoring. One more database is one more failure point.

Retrieval — vectorize the query, fetch the nearest chunks. But “nearest” isn’t “most relevant.” Vector similarity is an approximation of semantic closeness, not an exact match. Search for “deployment procedure” and get back “deployment incidents.” The words are close. The intent isn’t.

Synchronization — a document gets updated? Re-chunk, re-embed, delete old vectors, store new ones. You need an automated pipeline. The pipeline needs monitoring. The monitoring needs alerts.

Five systems. Each with edge cases. Each requiring maintenance. And all of them solve one problem: reading a file.
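The “deployment procedure” mismatch is easy to reproduce without a real model. The sketch below is a toy that assumes word counts as a stand-in for embeddings (a real embedding model returns dense float vectors and is far less crude); `embed`, `cosine`, `top_k`, and the two documents are hypothetical. It ranks documents by cosine similarity to the query, which is the core of the retrieval step.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real model returns a dense float vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, docs: dict[str, str], k: int = 1) -> list[tuple[str, float]]:
    """Return the k documents whose vectors are nearest to the query vector."""
    q = embed(query)
    scored = [(name, cosine(q, embed(text))) for name, text in docs.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

# Hypothetical documents: the runbook is what the question is really after.
DOCS = {
    "runbook.md": "Steps to release a new build to production.",
    "incidents.md": "Deployment incidents: the deployment failed twice.",
}

ranked = top_k("deployment procedure", DOCS, k=2)
print(ranked)
```

The runbook answers the question but shares no words with the query, so the incidents log ranks first. Dense embeddings usually narrow this gap, but they still rank by closeness, not by intent.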

## Why we stopped

Our documentation at ourstack.dev isn’t massive. A few hundred files. Not tens of thousands.

In two weeks with the RAG pipeline, we noticed three things.

First, search results were unstable. The same question phrased slightly differently returned different chunks. “How do I configure permissions” and “how to manage access control” ask the same thing but returned different documents. A system sensitive to query phrasing isn’t a search engine. It’s a slot machine.

Second, chunk boundaries broke context. The definition sits in the first half of a document, the usage example in the second. Retrieval hands back the second chunk without the first. Handing the AI text with half the meaning missing and saying “answer based on this” is like photocopying the bottom half of a page and saying “read this.”
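A minimal sketch of how the cut point decides what the AI sees, assuming a made-up two-section document and whitespace-separated words as a crude stand-in for model tokens. `chunk_by_tokens` and `chunk_by_heading` are hypothetical helpers, not any library’s API.

```python
import re

# A tiny, made-up document: definition first, usage example second.
DOC = """# Permissions
A role is a named set of permissions.
Roles are assigned per project.

# Example
Give Alice the editor role on project X.
"""

def chunk_by_tokens(text: str, size: int) -> list[str]:
    """Cut every `size` whitespace-separated words, ignoring document structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def chunk_by_heading(text: str) -> list[str]:
    """Cut just before each markdown heading line, keeping every section whole."""
    parts = re.split(r"(?m)^(?=# )", text)  # zero-width split (Python 3.7+)
    return [p.strip() for p in parts if p.strip()]

token_chunks = chunk_by_tokens(DOC, 8)
heading_chunks = chunk_by_heading(DOC)

for c in token_chunks:
    print("TOKENS :", c)
for c in heading_chunks:
    print("HEADING:", c.replace("\n", " | "))
```

With 8-word chunks, the third chunk (“Example Give Alice the editor role on project”) arrives without the definition of a role, and the last one is just “X.”. Cutting at headings keeps each section whole here, but only because this document happens to be organized by heading.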

Third, the tuning loop never ended. Adjust chunk size. Change the embedding model. Add reranking. Add metadata filters. Fix one problem, create another. The problem wasn’t the parameters. It was the abstraction itself.

## Files can be read as they are

What we did after dropping RAG is almost embarrassingly simple.

We organized the documents.

Structured the information into markdown files. Made filenames searchable. Made the directory structure logical. Then told the AI: “read this directory.”

That’s it.

No embeddings. No vector database. No chunking. No sync pipeline. When a file gets updated, the next read reflects the update. The sync problem doesn’t exist — because the filesystem is the sync.
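For comparison, a minimal sketch of the replacement, assuming a directory of markdown files. `list_docs` and `read_doc` are hypothetical helper names, and how the listing reaches the AI (a tool call, a prompt, a shell command) is left out. The listing is the map the AI navigates; reading a file by path is the whole retrieval step.

```python
import tempfile
from pathlib import Path

def list_docs(root: Path) -> list[str]:
    """Every markdown file under root, as sorted relative paths: the map to navigate."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))

def read_doc(root: Path, rel_path: str) -> str:
    """Read one file by path. No index to refresh: the file is the source of truth."""
    return (root / rel_path).read_text(encoding="utf-8")

# Demo on a throwaway directory with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "permissions").mkdir()
    (root / "deploy.md").write_text("# Deploy\n", encoding="utf-8")
    roles = root / "permissions" / "roles.md"
    roles.write_text("# Roles\nEditors can publish.\n", encoding="utf-8")

    listing = list_docs(root)
    # An update is just a write; the next read sees it.
    roles.write_text("# Roles\nOnly admins can publish.\n", encoding="utf-8")
    latest = read_doc(root, "permissions/roles.md")

print(listing)
print(latest)
```

The same write-then-read cycle is all a correction takes in this setup.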

“But what if there’s too much data?”

Most of the time, there isn’t.

All our project documentation fits in a few megabytes. No need to load everything into context. Just read the relevant files. If the path is known — and the directory structure is logical — there’s no need to search. Navigation is enough.
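The arithmetic behind “no need to load everything” can be sketched roughly. This assumes the common rule of thumb of about four characters per token for English text (real tokenizers vary by model and language) and a hypothetical 100,000-token budget; `estimate_tokens`, `select_within_budget`, and the file sizes are illustrative, not measurements of our docs.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; real tokenizers vary

def estimate_tokens(n_chars: int) -> int:
    """Approximate tokens from a character count, rounding up."""
    return -(-n_chars // CHARS_PER_TOKEN)

def select_within_budget(sizes: dict[str, int], wanted: list[str], budget: int) -> list[str]:
    """Load the requested files in order until the next one would exceed the budget."""
    chosen: list[str] = []
    used = 0
    for path in wanted:
        cost = estimate_tokens(sizes[path])
        if used + cost > budget:
            break
        chosen.append(path)
        used += cost
    return chosen

# Hypothetical sizes, in characters of plain ASCII text.
whole_corpus = estimate_tokens(3_000_000)  # "a few megabytes"
sizes = {"permissions/roles.md": 8_000, "permissions/audit.md": 12_000, "deploy.md": 6_000}
needed = ["permissions/roles.md", "permissions/audit.md"]
picked = select_within_budget(sizes, needed, budget=100_000)
print(whole_corpus, picked)
```

At that ratio, three megabytes of plain text is roughly 750,000 tokens, more than fits in many context windows, while the two permission files a permissions question actually needs come to about 5,000.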

RAG becomes necessary when data is genuinely massive and you don’t know where to look. Millions of documents. Unstructured data. Mountains of logs. At that scale, search is essential.

But in my experience, most teams working with AI aren’t at that scale. A few dozen to a few hundred documents. A large codebase, sure, but the information the AI needs at any given moment is limited. For that limited information, you run five systems?

## The correction problem

There’s another problem with RAG that rarely gets discussed. The feedback loop.

The AI gives a wrong answer. The user corrects it. “No, permissions are configured like this.”

With structured files, you open the file and fix it. Next time the AI reads it, the information is current. From feedback to correction: one edit.

With a RAG pipeline? First, fix the source document. Then re-chunk it. Re-embed it. Delete the old vectors. Store the new ones. Then pray the corrected chunk surfaces at the top of the next search — because there’s no guarantee the fixed chunk will be the first result.
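To make those steps concrete, here is a minimal sketch of a sync pass, assuming an in-memory dict in place of a real vector database. `VectorStore`, `chunk`, and `fake_embed` are hypothetical, and `fake_embed` stands in for a call to an embedding model. It detects changed files by content hash, deletes their old vectors, then re-chunks, re-embeds, and stores the new ones.

```python
import hashlib

def chunk(text: str, size: int = 50) -> list[str]:
    """Fixed-size word chunks (hypothetical helper)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def fake_embed(chunk_text: str) -> list[float]:
    """Placeholder for an embedding-model call (hypothetical)."""
    return [float(len(chunk_text))]

class VectorStore:
    """In-memory stand-in for a vector database; not a real client."""

    def __init__(self) -> None:
        self.rows: dict[tuple[str, int], tuple[list[float], str]] = {}
        self.hashes: dict[str, str] = {}  # path -> sha256 of the last indexed content

    def _delete(self, path: str) -> None:
        self.rows = {key: row for key, row in self.rows.items() if key[0] != path}

    def sync(self, files: dict[str, str]) -> list[str]:
        """Bring the index in line with `files` (path -> text); return the paths touched."""
        touched = []
        for path, text in files.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.hashes.get(path) == digest:
                continue  # unchanged, nothing to do
            self._delete(path)  # drop every old vector for this file
            for i, piece in enumerate(chunk(text)):  # re-chunk
                self.rows[(path, i)] = (fake_embed(piece), piece)  # re-embed and store
            self.hashes[path] = digest
            touched.append(path)
        for path in sorted(set(self.hashes) - set(files)):  # files removed from disk
            self._delete(path)
            del self.hashes[path]
            touched.append(path)
        return touched

store = VectorStore()
docs = {"roles.md": "Editors can publish.", "deploy.md": "Run the deploy script."}
first = store.sync(docs)
second = store.sync(docs)
third = store.sync({"roles.md": "Only admins can publish."})
print(first, second, third)
```

Nothing in this code decides which chunk the next query will retrieve; that last step stays probabilistic.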

One correction, five steps. And the last one is probabilistic.

This is RAG’s fundamental problem. Search is probabilistic, so correction is probabilistic too. There’s a gap between “I fixed it” and “the fix is reflected.” That gap widens as the pipeline gets more complex.

## The right abstraction

I’m not saying RAG is wrong. RAG is the right solution for a specific problem: finding information in a truly massive volume of unstructured data.

The problem is that most teams don’t have that problem and deploy RAG anyway.

An accounting firm wants AI to read client files. Does it need RAG? A few hundred files. Put them in folders and let the AI read. A law firm wants to search case law. Tens of thousands of rulings. That’s where RAG belongs.

The distinction is simple. If filesystem navigation is enough, RAG is overkill. If navigation can’t keep up, RAG is necessary.

We were in the first category. Most teams are too.

Infrastructure complexity has inertia. Once built, it’s hard to tear down. “We already built it” becomes a reason to keep it. But the maintenance cost far exceeds the cost of organizing your files.

Four shell scripts and markdown files beat a vector database. The reason: simplicity. Simple systems break less, fix faster, and are easier to understand. And for the AI — easier to read.

The best infrastructure is the infrastructure that doesn’t exist.

— Max

