Everyone Says Vector Databases Are the Wrong Abstraction. Your Notes Never Needed One.
The industry that sold everyone vector databases is now publishing essays calling them the wrong abstraction. For a personal folder of Markdown notes, that was always true. Readable files plus plain search are the simpler, more private retrieval primitive — no index to build, sync, or trust, and nothing leaves your machine to be embedded.
This is a scoped claim, not a eulogy. At production scale — millions of documents, hybrid ranking — vectors still earn their keep, and a vendor that sells them says so 1. The argument here is narrower and more personal: the corpus you actually edit, a few thousand notes you wrote yourself, never needed an embedding pipeline to be searchable. The pros are circling back to a primitive you already have.
What everyone was told to build
For three years the default answer to "let an AI use my documents" was the same: embed everything, store the vectors, query by similarity. Retrieval-augmented generation became synonymous with a vector database. If you wanted to chat with your own notes, the assumed first step was an index.
The pitch was reasonable. Embeddings capture meaning, not just keywords, so a query for "deadlines I missed" could surface a note that said "shipped late." For a search engine over millions of strangers' documents, that fuzzy matching is genuinely useful. The mistake was treating it as the universal shape for retrieval — including for the small, hand-written corpus on your own laptop.
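To make the embed, store, and query-by-similarity loop concrete, here is a minimal sketch of the query step only. It assumes the embeddings already exist: the three-dimensional vectors and file names below are made up for illustration, whereas a real pipeline would get much longer vectors from an embedding model and a vector database would replace the plain dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Rank stored notes by similarity to the query vector, best first."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy data: hand-written vectors standing in for model output.
STORE = {
    "retro.md": [0.9, 0.1, 0.0],
    "recipes.md": [0.0, 0.2, 0.9],
    "roadmap.md": [0.6, 0.6, 0.1],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], STORE))
```

With these toy numbers, the query vector `[1.0, 0.0, 0.0]` ranks `retro.md` first and `roadmap.md` second. The thing to notice is that `STORE` is a second copy of the notes, one that has to be recomputed whenever a note changes.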
Why the field is walking it back
The walk-back is not a tweet; it is a sustained, dated discourse. In October 2024, Tiger Data's engineering team published "Vector Databases Are the Wrong Abstraction" — the argument that bolting a separate vector store onto your data creates more problems than it solves 2. The essay reached 493 points on Hacker News 3. The conversation never closed.
The core complaint is architectural. "By treating embeddings as independent data, we've created unnecessary complexity for ourselves," the authors write 4. Embeddings are not independent. They are derived from source text, and divorcing them from that source is the original sin. In their words, vector databases treat embeddings as standalone data "rather than what they truly are: derived data" 5.
That abstraction has a production cost. A typical retrieval team, Tiger Data notes, ends up running a vector store beside a primary database beside a search engine. As they put it: "now you're juggling three systems, and syncing them is a nightmare" 6. Three indexes of the same truth, perpetually drifting out of agreement. For a corporation that is a staffing problem. For a person with a notes folder it is absurd.
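One way to see the drift is to treat the index as derived data and check it against its source. The sketch below assumes a hypothetical index that recorded a SHA-256 hash of each note when its embedding was computed. The `index` mapping and both function names are illustrative, not any vector store's API.

```python
import hashlib
from pathlib import Path


def content_hash(text):
    """Fingerprint of the source text an embedding was derived from."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_paths(index, root):
    """Return indexed paths whose files changed or vanished since indexing.

    `index` is a hypothetical mapping of relative path -> hash recorded at
    embedding time; it stands in for the metadata a vector store would keep.
    """
    stale = []
    for rel_path, recorded_hash in sorted(index.items()):
        path = Path(root) / rel_path
        if not path.is_file():
            stale.append(rel_path)  # source deleted, vector orphaned
        elif content_hash(path.read_text(encoding="utf-8")) != recorded_hash:
            stale.append(rel_path)  # source edited, vector out of date
    return stale
```

Every path this returns is a vector that no longer describes the file it came from. With direct search over the files, there is no such list to compute.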
The simpler primitive the pros are rediscovering
The alternative is older than the problem: search the text directly. Let the model run plain lexical search (glob to list the files, grep to match lines) over readable files, and let it decide what to read next. Nothing is indexed ahead of time, so the freshness problem disappears: there is nothing to keep in sync with the files.
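As a rough sketch of what glob plus grep amounts to, the function below walks a folder of Markdown files and returns every line matching a regular expression. The name `grep_notes`, the `notes` folder, and the example pattern are assumptions for illustration. In an agentic setup the model would choose the pattern and then decide which of the returned files to open.

```python
import re
from pathlib import Path


def grep_notes(root, pattern, glob="*.md", ignore_case=True):
    """Return (path, line_number, line) for every line matching `pattern`."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    # sorted() keeps the result order stable across runs and platforms.
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical folder and query; try several phrasings in one pattern.
    for path, number, line in grep_notes("notes", r"shipped late|missed deadline"):
        print(f"{path}:{number}: {line}")
```

The files are read at query time, so an edit made a second ago is already searchable. The trade-off is that a match requires shared wording, which is why the pattern above lists more than one phrasing.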
The most-cited real-world example is Anthropic's Claude Code. On Hacker News in February 2025, its creator Boris Cherny wrote: "Claude Code doesn't use RAG currently. In our testing we found that agentic search out-performed RAG for the kinds of things people use Code for" 7. Gergely Orosz, summarizing the build for The Pragmatic Engineer, put it plainly: "Plain glob and grep, driven by the model, beat everything" 8.
The reasons are precision, simplicity, freshness, and privacy. Cherny, quoted by Vadim Nicolai, described early Claude Code as RAG with a local vector database that the team dropped: "agentic search generally works better. It is also simpler and doesn't have the same issues around security, privacy, staleness, and reliability" 9. Security, privacy, staleness, and reliability: a personal notes folder wants every one of those four costs gone too.
This isn't just a code-search story
A fair reader objects: Claude Code searches code, and your notes are prose, so this is apples to oranges. The objection is correct, and worth conceding before answering it. But the principle is not about code; it is about the shape of retrieval, and there is now research that isolates exactly that shape.
In January 2026, a paper introduced GrepRAG, a study of grep-like lexical retrieval for code completion. Its finding: "GrepRAG consistently outperforms state-of-the-art (SOTA) methods, achieving 7.04-15.58 percent relative improvement in code exact match (EM) over the best baseline on CrossCodeEval" 10. More striking is the floor case: "Despite its simplicity, Naive GrepRAG achieves performance comparable to sophisticated graph-based baselines" 11.
Yes, that is also code. The transferable claim is the principle, not the benchmark: for a corpus you actively edit, index-free lexical search driven by a capable model competes with, and sometimes beats, a pre-built fuzzy index. A notes folder is exactly that kind of corpus. You wrote the words; the words are the index.
Where vectors still win (the honest part)
Vector databases are not dead, and saying so is not hedging. The clearest defense comes from a source with every incentive to give it: Tiger Data, which sells a hybrid vector product. Their position: "most real-world AI apps actually need both lexical and vector approaches working together. This is called hybrid search, and it's where the industry is heading" 12.
So the fair case is real. At the scale of millions of documents, across many users, pure keyword matching misses paraphrase and synonymy, and embeddings do work that grep cannot. The current "ditch the vector DB" headlines reframe the role of vectors; they are not an obituary. The claim worth keeping is the scoped one: your personal notes never reached that scale, and are unlikely to.
What this means for your notes
The practical takeaway is small and durable. If your goal is to find, re-read, and feed your own writing to an AI, you do not need to build an embedding pipeline first. You need your notes in a format a machine can read and a way to search them. That is the whole stack.
It is the same logic that applies to the other index everyone tells you to build — a knowledge graph over your notes. When your wiki-links already are one, the extra layer is overhead you maintain, not capability you gain.
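As a rough illustration of that point, the links already in the files can be read out as a graph with one regular expression. This sketch assumes the common `[[Note Name]]` wiki-link syntax, with optional `#heading` and `|alias` parts. `link_graph` is a hypothetical helper, and note names are taken from file stems, so two files with the same name in different folders would collide.

```python
import re
from pathlib import Path

# Captures the target note name; skips an optional "#heading" and "|alias".
WIKI_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def link_graph(root):
    """Map each note (by file stem) to the sorted list of notes it links to."""
    graph = {}
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        targets = {target.strip() for target in WIKI_LINK.findall(text)}
        graph[path.stem] = sorted(targets)
    return graph
```

The result is an adjacency list rebuilt from the files on demand, so there is no separate graph store to maintain.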
Three concrete moves:
- Keep notes as plain Markdown files you own, on your own device — readable text, not rows in a proprietary store.
- Search the text directly with the lexical tools you already have; let any AI you choose read the files it finds.
- Skip the index until you actually outgrow search — for a personal corpus, that day rarely comes.
The same year the field standardized on plain Markdown as the instruction format for AI agents, it also started walking back the vector database as the default for personal retrieval. The substrate the pros keep returning to is the one you already have: readable files you can grep, read, and own.
Frequently asked questions
Do I need a vector database to search my own notes?
No. For a personal corpus of a few thousand notes, plain text plus search is the simpler primitive. Vector databases earn their place at production scale, across millions of documents with hybrid ranking 12, but a folder you wrote yourself is searchable as-is, with no index to build or keep in sync.
Why did Claude Code drop the vector database?
Its creator, Boris Cherny, said agentic search "out-performed RAG for the kinds of things people use Code for" 7, and that grep-driven search is "simpler and doesn't have the same issues around security, privacy, staleness, and reliability" 9. The team replaced an early local vector DB with model-driven lexical search.
Is RAG dead? Are vector databases dead?
No. This is a reframe, not a death. Even a vendor that sells vector search argues most real apps "need both lexical and vector approaches working together" 12. The shift is away from making a vector store the default primitive for small, personal corpora, not away from embeddings everywhere.
Does the "wrong abstraction" essay mean vectors are bad?
It means treating embeddings as independent data is the wrong shape. Tiger Data's argument is that embeddings are "derived data" from source text 5, and that divorcing them from that source creates "unnecessary complexity" 4 and three systems to sync 6. The critique is about architecture, not a claim that embeddings never help.
Is code search the same as notes search?
Not identical, since code is structured and prose is not. The shared principle is that index-free lexical search, driven by a capable model, competes with a pre-built fuzzy index for a corpus you actively edit. The GrepRAG study found naive grep-like retrieval "comparable to sophisticated graph-based baselines" on code 11; the shape transfers to a notes folder you maintain.
What do I actually need to chat with my notes without a vector DB?
Notes in a machine-readable format and a way to search them. Keep them as plain Markdown on your own device, search the text directly, and let whichever AI you choose, using your own key, read the files it finds. The how-to lives in our companion piece on using Markdown notes as AI memory.
If the field that sold the index is rediscovering the file, the wisest move is to never have left it. Keep your notes as plain text you can read, search, and own — the rest is plumbing that comes and goes. You can do that today in your browser at mnmnote.com.
Footnotes
1. "Why Cursor Is About to Ditch Vector Search and You Should Too," Jacky Liang, Tiger Data, 2025-07-10, https://www.tigerdata.com/blog/why-cursor-is-about-to-ditch-vector-search-and-you-should-too, retrieved 2026-06-09.
2. "Vector Databases Are the Wrong Abstraction," Matvey Arye and Avthar Sewrathan, Tiger Data, 2024-10-29, https://www.tigerdata.com/blog/vector-databases-are-the-wrong-abstraction, retrieved 2026-06-09.
3. "Vector databases are the wrong abstraction," Hacker News discussion (item 41985176), 2024-10-29, https://news.ycombinator.com/item?id=41985176, retrieved 2026-06-09.
4. "Vector Databases Are the Wrong Abstraction," Matvey Arye and Avthar Sewrathan, Tiger Data, 2024-10-29, https://www.tigerdata.com/blog/vector-databases-are-the-wrong-abstraction, retrieved 2026-06-09.
5. "Vector Databases Are the Wrong Abstraction," Matvey Arye and Avthar Sewrathan, Tiger Data, 2024-10-29, https://www.tigerdata.com/blog/vector-databases-are-the-wrong-abstraction, retrieved 2026-06-09.
6. "Vector Databases Are the Wrong Abstraction," Matvey Arye and Avthar Sewrathan, Tiger Data, 2024-10-29, https://www.tigerdata.com/blog/vector-databases-are-the-wrong-abstraction, retrieved 2026-06-09.
7. Boris Cherny, comment under "Claude 3.7 Sonnet and Claude Code," Hacker News (item 43164253), 2025-02-24, https://news.ycombinator.com/item?id=43164253, retrieved 2026-06-09.
8. "Building Claude Code with Boris Cherny," Gergely Orosz, The Pragmatic Engineer, 2026-03-04, https://newsletter.pragmaticengineer.com/p/building-claude-code-with-boris-cherny, retrieved 2026-06-09.
9. Boris Cherny, quoted in "Claude Code Doesn't Index Your Codebase," Vadim Nicolai, 2026-03-03, https://vadim.blog/claude-code-no-indexing/, retrieved 2026-06-09.
10. "GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion," Wang et al., arXiv:2601.23254, 2026-01-30 (v1), https://arxiv.org/abs/2601.23254, retrieved 2026-06-09.
11. "GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion," Wang et al., arXiv:2601.23254, 2026-02-08 (v2), https://arxiv.org/abs/2601.23254, retrieved 2026-06-09.
12. "Why Cursor Is About to Ditch Vector Search and You Should Too," Jacky Liang, Tiger Data, 2025-07-10, https://www.tigerdata.com/blog/why-cursor-is-about-to-ditch-vector-search-and-you-should-too, retrieved 2026-06-09.