Skip the Vector Database: Markdown Notes as AI Memory

A folder of plain-markdown notes and one hand-written index file is enough to give an AI a usable memory. No vector database. No embedding pipeline. The AI reads your index first, opens the few files it needs, and answers from your own writing. At personal scale this beats the RAG stack everyone tells you to build.

The pattern has a name and an author. Andrej Karpathy published it as the "LLM Wiki" in April 2026, and the gist reached 5,000+ stars (and 5,000+ forks) within weeks ¹. His framing is the whole argument in one line: "Most people's experience with LLMs and documents looks like RAG... This works, but the LLM is rediscovering knowledge from scratch on every question." ²

A wiki of markdown files fixes the part RAG leaves broken. Instead of re-deriving everything per question, the AI reads what it already wrote and builds on it. As Karpathy puts it, "the wiki is a persistent, compounding artifact." ³

This is a how-to, not a manifesto. It explains why plain markdown is the cheapest format for an AI to read, walks through the three layers of the pattern with a hand-written index.md you can copy, draws the honest line where the method stops working, and lists the mistakes that break it. You bring your own notes and your own AI key; nothing here asks you to host a model or trust a server.

What is markdown-as-AI-memory?

Markdown-as-AI-memory means pointing an AI at a folder of plain-text notes, written in markdown, with one index file the model reads first. The index tells the AI what exists and where; the model opens only the relevant files and answers from them. No embeddings, no database — just files an AI can read and a map of where they are.

Karpathy's name for the structured version is the LLM Wiki. The idea borrows its mental model from software: "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase." ⁴ You edit and browse the notes the way you'd browse a repository; the AI maintains and reads them the way a programmer reads code. The notes stay human-readable the entire time, because markdown is both.

The reason this works is that markdown is the cheapest substrate an AI can read. Cloudflare measured it directly: "This blog post you're reading takes 16,180 tokens in HTML and 3,150 tokens when converted to markdown." ⁵ That is "a 80% reduction in token usage" ⁵ for the same content. Strip the markup and the AI sees the meaning, not the scaffolding — which is why "Markdown has quickly become the lingua franca for agents and AI systems as a whole." ⁶
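The percentage follows directly from the two token counts Cloudflare quotes. A quick check of the arithmetic, with illustrative variable names:

```python
# Token counts quoted from the Cloudflare post cited above.
html_tokens = 16_180
markdown_tokens = 3_150

# Fraction of tokens saved by serving the same page as markdown.
reduction = 1 - markdown_tokens / html_tokens
print(f"{reduction:.1%}")  # prints 80.5%
```

The figure is specific to that one page; other pages will carry more or less markup.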

Why your notes are already in that format is the subject of a companion piece, Your Notes Are Already AI-Ready; this post covers what to do with them.

Why skip the vector database at all?

Skip the vector database because at personal scale it adds infrastructure you don't need. RAG splits your notes into chunks, turns each chunk into a vector, and stores them in a database you must build, embed, and keep in sync. For a folder of personal notes, the index-and-read approach gets there with none of that machinery.

The cost difference is the whole pitch. In a head-to-head comparison, data-governance writer Emily Winks puts the LLM Wiki's infrastructure cost at "Near-zero - no vector DB, no embedding pipeline" ⁷. There is nothing to provision and nothing to re-embed when a note changes. You edit the file; the AI reads the new version next time.

You also keep something RAG quietly takes away: traceability. With chunk-and-embed retrieval, an answer is stitched from fragments and the model is "rediscovering knowledge from scratch on every question" ². With a wiki, the AI cites the file it read, and you can open that file yourself. The memory is auditable because it is just text. And because "the wiki is just a git repo of markdown files," you "get version history, branching, and collaboration for free." ⁸

The three layers of the pattern

The LLM Wiki is three layers, not three folders. Raw sources are the documents you feed in, kept immutable. The wiki is the markdown notes the AI writes from those sources. The schema is a small config file that tells the AI how to organize the wiki. Inside the wiki live two navigation files: index.md and log.md.
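As a rough sketch of how the three layers might sit on disk, the script below creates an empty layout. The folder names `raw` and `wiki`, the choice of `AGENTS.md` over `CLAUDE.md`, and the starter text in each file are assumptions made for illustration, not a layout the pattern prescribes.

```python
from pathlib import Path
import tempfile


def scaffold(root: Path) -> None:
    """Create an empty three-layer layout under `root` (all names are assumptions)."""
    (root / "raw").mkdir(parents=True, exist_ok=True)  # layer 1: immutable sources
    wiki = root / "wiki"                               # layer 2: AI-maintained notes
    wiki.mkdir(exist_ok=True)
    starters = {
        wiki / "index.md": "# Index\n",                # catalog the model reads first
        wiki / "log.md": "# Log\n",                    # append-only ingest record
        # layer 3: the schema file with the rules the model follows
        root / "AGENTS.md": "# Schema\n\nRead wiki/index.md first. Never edit files under raw/.\n",
    }
    for path, text in starters.items():
        if not path.exists():                          # never overwrite existing notes
            path.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scaffold(Path(tmp))
        for p in sorted(Path(tmp).rglob("*")):
            print(p.relative_to(tmp))
```

Running `scaffold` a second time leaves existing files untouched, so it is safe to point at a folder that already holds notes.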

The schema is just a small text file — Karpathy uses a CLAUDE.md or AGENTS.md — naming the rules the AI follows. It is the layer that keeps the wiki consistent over hundreds of edits, because the model re-reads it before each ingest.

The division of labor is the reason the pattern saves effort. You drop a raw source into the pile; the AI distills it into the wiki, updating every note the source touches. This is real bookkeeping, not a one-line summary — Karpathy notes that "A single source might touch 10-15 wiki pages." ⁹ Doing that by hand is the chore the AI removes.

Keeping raw sources read-only buys you a safety net. Because you never let the model edit the originals, you can always re-derive the whole wiki later if you change how you want it organized.
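One way to check that the originals really stayed untouched is to record a checksum per file before an ingest and compare afterwards. This is a minimal sketch under that assumption; the function names `fingerprint` and `changed_sources` are hypothetical, and the approach is a convenience, not part of the pattern as published.

```python
import hashlib
from pathlib import Path


def fingerprint(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir to the SHA-256 hex digest of its bytes."""
    return {
        p.relative_to(raw_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def changed_sources(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were edited or deleted since `before` was recorded.

    Files that only appear in `after` are new sources and are not reported.
    """
    return sorted(path for path, digest in before.items() if after.get(path) != digest)
```

Take one `fingerprint` before asking the model to ingest, another afterwards, and treat any path returned by `changed_sources` as a source to restore from version history.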

The two navigation files are what make the wiki readable at a glance. The index.md is a content-oriented catalog: a map of what exists and where, the file the model reads first to decide which notes to open. The log.md is an append-only, chronological record of what was ingested and when. The index answers what do I know; the log answers what happened, in order.
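To make "append-only" concrete, here is a small sketch of a helper that adds one line to log.md per ingest. The line format (date, source, changed notes, separated by `|`) and the name `log_ingest` are assumptions; any consistent format the model and you can both read would do.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def log_ingest(
    log_path: Path,
    source: str,
    changed_notes: list[str],
    when: Optional[date] = None,
) -> str:
    """Append one ingest record to log.md and return the line that was written."""
    when = when or date.today()
    line = f"- {when.isoformat()} | {source} | {', '.join(changed_notes)}\n"
    # Mode "a" only ever adds to the end of the file; earlier entries stay as written.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(line)
    return line
```

Because each record is a single line, the log stays easy to scan by eye and easy to diff in version control.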

Who maintains the wiki?

The AI maintains the wiki; you maintain the inputs. You add a raw source and ask the model to ingest it. It cross-references that source against everything already written, then edits each note the source affects. The point of the pattern is that this upkeep is the machine's job, not yours.

That upkeep is heavier than it sounds, which is exactly why offloading it pays. At 10-15 pages touched per source ⁹, no human keeps that many interlinked notes consistent by hand after every new clipping. An AI can, on every pass, and it leaves a record in log.md while it does.

This is the compounding part. Because the wiki is edited rather than re-derived from scratch each time, knowledge accumulates instead of resetting — which is what Karpathy means when he calls it "a persistent, compounding artifact." ³ And because the result is plain files, you can read, correct, or roll back any edit the AI made. The model does the labor; you keep the final say.

How to build it from notes you already own

You can build a working version by hand in five steps. Each step is plain markdown — no code, no database, no model hosting. You bring a folder of notes and an AI you already pay for. The whole point is that the structure, not the tooling, does the work.

Keep your raw sources separate. Put the original documents — clippings, PDFs exported to text, pasted articles — in their own folder and treat them as read-only. You will derive everything else from these.
Write the wiki as plain notes. One markdown file per topic, person, or project. Short, linked, human-readable. These are the notes the AI will read and update; they are also notes you can read.
Hand-write index.md first. This is the highest-leverage file. List every wiki note with a one-line description of what it covers, so the model can scan the map before opening anything. A starting template:

# Index

> The map the AI reads first. One line per note: what it covers, where it is.

## People
- [[ada-lovelace]] — correspondence, 1842 notes on the Analytical Engine
- [[charles-babbage]] — engine designs, funding disputes

## Projects
- [[analytical-engine]] — architecture, status: active
- [[difference-engine]] — earlier design, status: resolved

## Sources
- See `/raw` for original documents; see `log.md` for ingest order.

Add a log.md for order. Append one line each time you ingest a source: the date, the source, and which notes it changed. This is the audit trail RAG can't give you.
Point your AI at the folder. Open the folder in an AI coding or chat tool that can read local files, tell it to read index.md first, and ask your question. It opens the few notes the index points to and answers from them.

The first answer is the test. If the model opens the right notes from the index alone, the structure works. If it flails, the index is too thin — add one-line descriptions until the map is legible.
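Before running that test, it can help to confirm that the index itself is well formed. The sketch below parses an index.md written in the template's style and reports entries that point at notes that do not exist. It assumes one `<name>.md` file per `[[name]]` link in a single flat folder; the names `parse_index` and `missing_files` are hypothetical.

```python
import re
from pathlib import Path

# Matches entries such as "- [[ada-lovelace]] <dash> correspondence".
# The separator may be an em dash, an en dash, a colon, or a hyphen.
ENTRY = re.compile(r"^\s*-\s*\[\[([^\]]+)\]\]\s*[\u2014\u2013:-]\s*(.+)$")


def parse_index(text: str) -> list[tuple[str, str, str]]:
    """Return (section, note_name, description) for each entry in index.md."""
    section = ""
    entries = []
    for line in text.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()          # remember the current heading
        elif (m := ENTRY.match(line)):
            entries.append((section, m.group(1).strip(), m.group(2).strip()))
    return entries


def missing_files(entries: list[tuple[str, str, str]], wiki_dir: Path) -> list[str]:
    """Names listed in the index that have no matching <name>.md in wiki_dir."""
    return [name for _, name, _ in entries if not (wiki_dir / f"{name}.md").exists()]
```

Lines that are not entries, such as the pointer to `/raw` in the Sources section, are skipped. An entry with an empty description will not match at all, which is a useful signal that the map is too thin.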

When the pattern stops working

The pattern works up to a clear ceiling, and naming it honestly is what separates a method from a sales pitch. Karpathy scopes it precisely: it "works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure." ¹⁰ Past that, the index stops fitting the model's context, and search earns its keep.
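A rough way to see the ceiling coming is to estimate how much of the context window the index alone consumes. The sketch below is deliberately crude: the four-characters-per-token ratio is a common rule of thumb for English text, not a measurement, and the 25% budget share is an arbitrary assumption. Use your model's real tokenizer and context size for anything that matters.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the default ratio is an assumption."""
    return round(len(text) / chars_per_token)


def index_fits(index_text: str, context_tokens: int, budget_share: float = 0.25) -> bool:
    """True if the index uses at most `budget_share` of the context window.

    The rest of the window is left for the notes the model opens,
    the question, and the answer.
    """
    return estimate_tokens(index_text) <= context_tokens * budget_share
```

When `index_fits` starts returning False for the model you use, that is a reasonable prompt to split the index or add search.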

A later write-up from MindStudio draws the same line a little wider. For "curated personal or team knowledge bases with a few hundred to a few thousand pages, the LLM Wiki pattern is often more accurate, faster, and much easier to maintain than RAG" ¹¹. So the honest range starts at roughly a hundred sources and a few hundred pages and tops out at a few thousand pages. A personal knowledge base lives comfortably inside that. A company-wide document corpus does not.

This is why the method is a complement to RAG, not its replacement. The underlying constraint is old: "context windows are too small to handle most websites in their entirety," as the llms.txt convention notes ¹². Below the ceiling, an index beats embeddings on simplicity. Above it, you want retrieval — and that is the right time to add a vector database, not before.

Common mistakes

Most failures of this pattern come from a handful of avoidable mistakes, and they share a tell: the AI's answers slowly stop tracing back to your actual notes. The structure is forgiving until one of these breaks it, and then it breaks quietly rather than loudly. Avoid these five and the method holds at the scale it was designed for.

Skipping the index. Without index.md, the AI has no map and either reads everything (blowing the context window) or guesses. The index is the load-bearing file, not optional polish.
Letting the AI rewrite your raw sources. Keep originals read-only. If the model edits the source pile, you lose the ability to re-derive the wiki, and errors become permanent.
Scaling past the ceiling and blaming the method. Beyond roughly 100 sources or a few thousand pages, add search ¹⁰ ¹¹. The pattern didn't fail; you outgrew it.
Writing wiki notes only a machine can read. The whole advantage is that the notes stay human-readable. Cryptic, machine-only formatting throws away the audit trail.
Forgetting the log. Without log.md, you can't tell what the AI ingested or when, and a wrong answer becomes impossible to trace to its source.
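Two of these mistakes, a missing index and a missing log, are mechanical enough to check with a script, along with a related slip: notes that exist on disk but never made it into the index. The following is a minimal sketch, assuming a flat wiki folder where each note is `<name>.md` and the index refers to it as `[[name]]`. The function name `lint_wiki` and the message wording are hypothetical.

```python
import re
from pathlib import Path

# Captures the target of a [[wikilink]], stopping at "]", "|" or "#".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_wiki(wiki_dir: Path) -> list[str]:
    """Return a list of structural problems found in the wiki folder."""
    problems = []
    index = wiki_dir / "index.md"
    listed = set()
    if index.exists():
        text = index.read_text(encoding="utf-8")
        listed = {name.strip() for name in WIKILINK.findall(text)}
    else:
        problems.append("index.md is missing")
    if not (wiki_dir / "log.md").exists():
        problems.append("log.md is missing")
    for note in sorted(wiki_dir.glob("*.md")):
        if note.name in {"index.md", "log.md"}:
            continue                      # navigation files are not notes
        if note.stem not in listed:
            problems.append(f"{note.name} is not listed in index.md")
    return problems
```

An empty list means the structure is intact; it says nothing about whether the notes themselves are accurate or readable, which still takes a human look.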

How this works with your own notes

The pattern needs exactly one thing from your tooling: plain markdown files you control. Notes kept as plain text on your own device — readable by you, editable by hand, and openable by any AI you point at them — are the native substrate for this method. There is no special export and no proprietary format to escape.

That is the quiet fit for an own-your-data note app. Your notes stay local files; you bring your own AI key when you want the model to read them. The editor and your files stay yours, with end-to-end encrypted sharing built in. Whether you also want the model itself running on your machine is a separate question, covered in a companion post on running private AI on your own notes.

The neighborly note: Karpathy builds his wiki in Obsidian, and that's the right spirit. This is a pattern, not a product — it runs on any folder of markdown, in any editor that respects plain text.

Frequently Asked Questions

These are the questions people actually type when they want an AI to read their own notes without standing up a RAG stack. The short answers live here; the long answers are the sections above. Each one points back to the primary source it rests on, so you can check the claim before you trust it.

How do I let an AI use my own notes without building a vector database or RAG?

Point the AI at a folder of plain-markdown notes with one hand-written index.md it reads first. The model scans the index, opens only the relevant files, and answers from them. Karpathy's LLM-Wiki pattern shows this "avoids the need for embedding-based RAG infrastructure" ¹⁰ at personal scale.

What is Karpathy's LLM Wiki pattern?

It is a method, published by Andrej Karpathy in April 2026, where an AI maintains a wiki of markdown notes from your raw sources, organized by a small schema file, with an index.md the model reads first. His mental model: "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase." ⁴

Is the LLM Wiki pattern better than RAG?

At personal scale, yes — it is simpler, token-cheaper, and traceable, with "Near-zero" infrastructure cost ⁷. At large scale it is not: past roughly a hundred sources and a few thousand pages ¹⁰ ¹¹, retrieval wins. The two are complements, not rivals.

Do I need embeddings to chat with my own notes?

No, not at personal scale. An index file the AI reads first is enough to route it to the right notes. Embeddings and a vector database become worthwhile only once your collection grows past the index's sweet spot of ~100 sources / a few thousand pages ¹⁰ ¹¹.

How big can a markdown knowledge base get before the AI can't read it all?

The index-first method works to about 100 sources and a few hundred pages ¹⁰, and stays viable up to a few thousand pages overall ¹¹. Beyond that, the index stops fitting the model's context window cleanly, and you should add search or RAG.

Why is markdown better than HTML for AI memory?

Markdown carries meaning without markup, so the AI reads it for far fewer tokens than HTML or other rich formats. Cloudflare measured the same page at 16,180 tokens in HTML versus 3,150 in markdown — "a 80% reduction" ⁵. That efficiency is why markdown became "the lingua franca for agents" ⁶.

Your notes do not need a database to be remembered. They need to be plain files, mapped by an index, and readable by both you and the machine — which is all a note was ever supposed to be.

mnmnote.com keeps your notes as plain markdown on your own device, ready for any AI you choose to point at them.

¹ Karpathy, A. "LLM Wiki." GitHub gist, 4 April 2026. Star and fork counts shown as "5,000+" on the gist page. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (accessed 2026-06-04).
² Karpathy, A. "LLM Wiki," "The core idea." GitHub gist, 4 April 2026. "Most people's experience with LLMs and documents looks like RAG... This works, but the LLM is rediscovering knowledge from scratch on every question." https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (accessed 2026-06-04).
³ Karpathy, A. "LLM Wiki," "The core idea." GitHub gist, 4 April 2026. "the wiki is a persistent, compounding artifact." https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (accessed 2026-06-04).
⁴ Karpathy, A. "LLM Wiki," "The core idea." GitHub gist, 4 April 2026. "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase." https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (accessed 2026-06-04).
⁵ Cloudflare Blog. "Markdown for Agents." 12 February 2026. "This blog post you're reading takes 16,180 tokens in HTML and 3,150 tokens when converted to markdown." / "That's a 80% reduction in token usage." https://blog.cloudflare.com/markdown-for-agents/ (accessed 2026-06-04).
⁶ Cloudflare Blog. "Markdown for Agents." 12 February 2026. "Markdown has quickly become the lingua franca for agents and AI systems as a whole." https://blog.cloudflare.com/markdown-for-agents/ (accessed 2026-06-04).
⁷ Winks, E. "LLM Wiki vs RAG." Atlan. 7 April 2026. Infrastructure cost for LLM Wiki: "Near-zero - no vector DB, no embedding pipeline." https://atlan.com/know/llm-wiki-vs-rag-knowledge-base/ (accessed 2026-06-04).
⁸ Karpathy, A. "LLM Wiki," "Tips and tricks." GitHub gist, 4 April 2026. "The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free." https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (accessed 2026-06-04).
⁹ Karpathy, A. "LLM Wiki," "Indexing and logging." GitHub gist, 4 April 2026. "A single source might touch 10-15 wiki pages." https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (accessed 2026-06-04).
¹⁰ Karpathy, A. "LLM Wiki," "Indexing and logging." GitHub gist, 4 April 2026. "This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure." https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (accessed 2026-06-04).
¹¹ MindStudio Blog. "Karpathy's LLM Wiki Pattern: A Knowledge Base Without RAG." 13 April 2026. "For curated personal or team knowledge bases with a few hundred to a few thousand pages, the LLM Wiki pattern is often more accurate, faster, and much easier to maintain than RAG." https://www.mindstudio.ai/blog/karpathy-llm-wiki-pattern-knowledge-base-without-rag (accessed 2026-06-04).
¹² Howard, J. "The /llms.txt file." llmstxt.org. 3 September 2024. "Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety." https://llmstxt.org/ (accessed 2026-06-04).