Building a Local Digital Brain for Claude

A fully local, persistent knowledge layer over a folder of PDFs — embeddings, a knowledge graph, and an MCP server Claude can call into. No API keys. Nothing leaves your laptop.

Claude is excellent at general reasoning, but out of the box it doesn't know your team's documentation, your design notes, or that 400-page reference PDF you've been meaning to read. I wanted a way to point Claude at a folder of my own materials and have it actually use them — to ground answers, write code in our conventions, and cite specific pages — without sending anything to the cloud.

This post walks through what I built: a fully local, persistent "digital brain" over a folder of PDFs, exposed to Claude Desktop, Claude Code, and Cowork via a single MCP server. Embeddings come from Ollama, vectors live in ChromaDB, and a knowledge graph of entities and relationships lives in NetworkX — all on disk, all on your laptop.

What you'll have at the end

PDFs ─→ pymupdf ─→ chunks ─┬─→ Ollama embed ─→ ChromaDB ─┐
                           │                              ├─→ MCP tools ─→ Claude
                           └─→ Ollama extract ─→ NetworkX ┘

Two retrieval surfaces working together: vector search answers "what does the corpus say about X?" and the graph answers "what is X connected to and how?". Claude calls them through six MCP tools — search_brain, get_related, list_entities, list_sources, expand_context, ingest_path — and uses what it finds to ground answers and code generation.

Here's a real snapshot of the graph after ingesting one technical PDF: the most-connected entities and the relationships between them, extracted automatically during ingestion.

Knowledge graph snapshot

Showing 80 of 5,513 nodes · 302 of 4,540 edges
Top-80 entities by graph degree. Node size reflects connectedness; color reflects entity type. Edges are directed and labeled with the predicate the model extracted (e.g. uses, part_of, is_a). Each edge also records the source PDF and page it came from.

Stack and why

Component          | Choice                          | Why
Embeddings         | Ollama nomic-embed-text         | Local, free, decent quality (768-dim)
Entity extraction  | Ollama qwen2.5:7b               | Local, JSON-mode capable, runs on Apple Silicon
Vector store       | ChromaDB (persistent)           | File-backed, zero-config, fast cosine search
Graph store        | NetworkX MultiDiGraph + pickle  | No DB to run, easy to inspect from Python
Claude integration | MCP server (FastMCP, stdio)     | Standard protocol, one server works for all clients

Layout

digital-brain/
├── sources/            ← drop your PDFs here
├── data/               ← Chroma vectors + NetworkX graph (auto-created)
├── scripts/
│   ├── brain.py        shared config + helpers
│   ├── ingest.py       PDF → chunks → embeddings → graph
│   └── query.py        CLI + library for searching the brain
├── mcp_server/
│   └── server.py       MCP server Claude connects to
├── requirements.txt
├── setup.sh
└── mcp-config.example.json

Setup

1. Prerequisites

You need Python 3.10+ and Ollama on your Mac.

brew install python@3.12

For Ollama, don't use brew install ollama — recent Homebrew bottles have been missing the bundled inference binary. Instead, download the official app from ollama.com/download, drag it to /Applications, and launch it once. The menu-bar app keeps the background service running.

Verify Ollama is reachable:

curl http://localhost:11434/api/tags

2. Project scaffold

mkdir -p ~/claude-projects/digital-brain/{sources,data,scripts,mcp_server}
cd ~/claude-projects/digital-brain

Create requirements.txt:

pymupdf>=1.24.0
chromadb>=0.5.0
ollama>=0.3.0
networkx>=3.2
mcp>=1.0.0
tqdm>=4.66.0
pydantic>=2.0

Set up the venv and install:

python3.12 -m venv .venv
.venv/bin/pip install --upgrade pip
.venv/bin/pip install -r requirements.txt

Pull the two models (~5 GB total, one-time):

ollama pull nomic-embed-text
ollama pull qwen2.5:7b

3. The code

The full source lives in four files. The shape of each:

scripts/brain.py — shared config and helpers. Defines paths, chunking parameters, model names, and small wrapper functions for embed_texts, chat_json, get_collection, load_graph / save_graph. Centralizing this keeps the rest of the code clean.

scripts/ingest.py — the ingestion pipeline. Per PDF:

  1. PyMuPDF extracts text page by page.
  2. Text is chunked (~900 chars with 120-char overlap, preferring paragraph/sentence boundaries).
  3. Each chunk is embedded via Ollama and upserted into ChromaDB.
  4. Each chunk goes through an LLM extraction pass that returns {entities: [...], relations: [...]} in strict JSON.
  5. Entities and (subject, predicate, object) triples merge into a MultiDiGraph with mention metadata for citations.
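
Step 2 can be sketched with the standard library alone. This is a minimal illustration, not the code in scripts/ingest.py: the function name chunk_text and the rule of only accepting a boundary in the second half of the window are assumptions; the 900/120 defaults come from the description above.

```python
import re


def chunk_text(text: str, size: int = 900, overlap: int = 120) -> list[str]:
    """Split text into chunks of at most `size` characters.

    A chunk ends at the last paragraph break in the second half of its
    window, else at the last sentence end there, else at the hard limit.
    The next chunk starts `overlap` characters before the previous end.
    """
    if not 0 <= overlap < size // 2:
        raise ValueError("overlap must be non-negative and below size // 2")

    chunks: list[str] = []
    start, n = 0, len(text)
    floor = size // 2  # never cut earlier than halfway through the window

    while start < n:
        end = min(start + size, n)
        if end < n:
            window = text[start:end]
            cut = window.rfind("\n\n", floor)  # prefer a paragraph break
            if cut == -1:
                # Fall back to the last sentence end (punctuation + space).
                ends = [m.end() for m in re.finditer(r"[.!?]\s", window)
                        if m.end() > floor]
                cut = ends[-1] if ends else -1
            if cut != -1:
                end = start + cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= n:
            break
        start = end - overlap  # step back so neighbouring chunks overlap
    return chunks
```

Requiring the overlap to be smaller than half the chunk size guarantees that every iteration moves forward, so the loop always terminates.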

A simple fingerprint log (size:mtime) makes re-runs idempotent. A periodic graph checkpoint every 100 chunks means a crash mid-extraction loses at most a hundred chunks of work.
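
The fingerprint check might look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, the log is taken to be a JSON object mapping file paths to size:mtime strings (for example in data/ingested.json), and the modification time is read in nanoseconds. The real ingest.py may differ on all three points.

```python
import json
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return 'size:mtime' for a file; changes when the file is edited."""
    st = path.stat()
    return f"{st.st_size}:{st.st_mtime_ns}"


def load_log(log_path: Path) -> dict[str, str]:
    """Read the fingerprint log, or start empty if it does not exist yet."""
    if log_path.exists():
        return json.loads(log_path.read_text())
    return {}


def needs_ingest(path: Path, log: dict[str, str]) -> bool:
    """True when the file is new or its fingerprint no longer matches."""
    return log.get(str(path)) != fingerprint(path)


def mark_ingested(path: Path, log: dict[str, str], log_path: Path) -> None:
    """Record the file as done and persist the log via a temp-file swap."""
    log[str(path)] = fingerprint(path)
    tmp = log_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(log, indent=2))
    tmp.replace(log_path)  # replace in one step so the log is never half-written
```

Calling needs_ingest before processing each PDF and mark_ingested only after it finishes means an interrupted file is simply picked up again on the next run.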

scripts/query.py — the retrieval library and CLI. Exports search, related, list_sources, list_entities, and expand_context (a one-shot that pulls top-k chunks plus graph neighbors of the most relevant entities). Same module powers both the CLI and the MCP server.
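
To make the expand_context idea concrete, here is one way such a one-shot could be composed. This is a sketch, not the module's actual code: it assumes each search hit is a dict carrying an "entities" list, that "most relevant" means "mentioned in the most hits", and that the search and graph lookups are passed in as plain callables.

```python
from typing import Callable


def expand_context(
    query: str,
    search: Callable[[str, int], list[dict]],
    related: Callable[[str], list],
    k: int = 5,
    max_entities: int = 3,
) -> dict:
    """Return top-k passages plus graph neighbours of their key entities."""
    hits = search(query, k)

    # Count how many hits mention each entity (assumed "entities" field).
    counts: dict[str, int] = {}
    for hit in hits:
        for name in hit.get("entities", []):
            counts[name] = counts.get(name, 0) + 1

    # Most-mentioned first; ties broken alphabetically for stable output.
    top = sorted(counts, key=lambda name: (-counts[name], name))[:max_entities]

    return {
        "passages": hits,
        "graph": {name: related(name) for name in top},
    }
```

Passing the two lookups in as arguments keeps the function easy to exercise with stubs before ChromaDB or the graph are populated.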

mcp_server/server.py — a tiny FastMCP wrapper that exposes the query functions as Claude-callable tools. Each tool's docstring becomes its description in Claude's tool list, so be deliberate about wording — that's how Claude decides when to call which tool.
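
A stripped-down version of that wrapper could look like the following. It is a sketch with one tool only: the docstring wording and the empty placeholder body are illustrative, and the FastMCP import is deferred into build_server (an assumption made here so the tool function can be imported and tested without the MCP SDK installed).

```python
def search_brain(query: str, k: int = 5) -> list[dict]:
    """Semantic search over the ingested PDFs.

    Use this when the user asks what the documents say about a topic.
    Returns ranked passages with source filename, page number, and score.
    """
    # Placeholder: a real server would delegate to the search function
    # exported by scripts/query.py instead of returning an empty list.
    return []


def build_server():
    """Create the MCP server and register the tool functions on it."""
    # Imported lazily so this module loads even without the `mcp` package.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("digital-brain")
    # tool() returns a decorator; the function's docstring becomes the
    # description Claude sees in its tool list.
    server.tool()(search_brain)
    return server
```

In server.py this would be followed by build_server().run(), which serves over stdio by default in the MCP Python SDK.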

4. Add knowledge

Drop PDFs into sources/ (subfolders are fine — the ingester walks recursively):

cp ~/Downloads/papers/*.pdf sources/

5. Ingest

caffeinate -i .venv/bin/python scripts/ingest.py

The caffeinate -i keeps your Mac awake during the run. You'll see two phases: a fast embedding pass, then a slower entity-extraction pass via the local model. Embedding takes a few minutes for a thousand chunks; extraction runs at ~1–3 chunks/sec on Apple Silicon, so a few hundred PDF pages take 30–90 minutes. If you only need vector search at first, pass --no-graph and backfill the graph later.

When it finishes you'll see:

============================================================
Indexed chunks this run : 979
Graph nodes             : 5513
Graph edges             : 4540

6. Verify from the CLI

.venv/bin/python scripts/query.py --sources
.venv/bin/python scripts/query.py --entities ""
.venv/bin/python scripts/query.py "your question here"

The first lists ingested PDFs. The second prints the top entities by graph degree (your highest-connected concepts). The third runs semantic search and returns ranked passages with page numbers and similarity scores.
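
The similarity scores in the third command come from comparing the query's embedding with each chunk's embedding. ChromaDB does this internally; the plain-Python illustration below only shows the idea on toy two-dimensional vectors. The function names and the (label, vector) input shape are assumptions for the example, and real embeddings from nomic-embed-text have 768 dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[float, str]]:
    """Rank (label, vector) pairs by similarity to the query, best first."""
    scored = [(cosine(query_vec, vec), label) for label, vec in chunks]
    return sorted(scored, reverse=True)[:k]


toy_chunks = [
    ("a.pdf p.1", [1.0, 0.0]),
    ("a.pdf p.2", [0.0, 1.0]),
    ("b.pdf p.7", [1.0, 1.0]),
]
ranked = top_k([1.0, 0.0], toy_chunks, k=2)
```

Here the chunk pointing in the same direction as the query ranks first and the orthogonal one drops out of the top two.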

7. Wire it into Claude

The MCP server is the same Python script for all clients — only the registration step differs.

Claude Code (CLI):

claude mcp add --scope user digital-brain \
  /Users/YOU/claude-projects/digital-brain/.venv/bin/python \
  /Users/YOU/claude-projects/digital-brain/mcp_server/server.py

claude mcp list   # should show digital-brain ✓ Connected

The --scope user flag makes it available from any directory.

Claude Desktop: edit ~/Library/Application Support/Claude/claude_desktop_config.json and add an mcpServers block:

{
  "mcpServers": {
    "digital-brain": {
      "command": "/Users/YOU/claude-projects/digital-brain/.venv/bin/python",
      "args": ["/Users/YOU/claude-projects/digital-brain/mcp_server/server.py"],
      "env": { "OLLAMA_HOST": "http://localhost:11434" }
    }
  }
}

Press Cmd-Q to fully quit Claude Desktop, then reopen it. The tools icon in the chat input should now list digital-brain with six tools.

Cowork: Settings → MCP Servers → Add server, with the same fields.

Using it

Once registered, Claude calls the tools on its own when relevant. To make sure it reaches for the brain on domain questions, drop a CLAUDE.md in your project or paste a one-liner into your prompt:

When the user asks a domain question, call digital-brain.expand_context first and cite filenames and page numbers. When generating code that depends on conventions in our docs, call search_brain first.

A few prompts that exercise different parts of the brain:

"Find what my notes say about governor limits in Apex. Cite page numbers." — vector search
"What does the brain know about Visualforce? Call get_related and summarize." — graph traversal
"Using my digital brain as source of truth, write an Apex trigger that runs after insert on Account, queries related Contacts, and updates a custom field. Cite pages." — both at once, plus code generation grounded in the corpus

When asked something outside the corpus, Claude will say so rather than confabulating — which is the whole point.

Tuning knobs

All in scripts/brain.py, overridable via env:

  • BRAIN_EMBED_MODEL: swap in a heavier embedding model when you need more recall
  • BRAIN_EXTRACT_MODEL: llama3.1:70b produces noticeably cleaner entity extractions if your machine can run it
  • CHUNK_SIZE / CHUNK_OVERLAP: bigger chunks for narrative docs, smaller for reference material
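
Env-overridable settings of this kind are usually a few lines at the top of the config module. The sketch below assumes the chunking knobs are read from environment variables named CHUNK_SIZE and CHUNK_OVERLAP and uses a hypothetical env_int helper; the defaults are the models and chunk sizes mentioned earlier in this post.

```python
import os


def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, else use the default."""
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


# Defaults apply unless the variable is set before the script starts.
EMBED_MODEL = os.environ.get("BRAIN_EMBED_MODEL", "nomic-embed-text")
EXTRACT_MODEL = os.environ.get("BRAIN_EXTRACT_MODEL", "qwen2.5:7b")
CHUNK_SIZE = env_int("CHUNK_SIZE", 900)
CHUNK_OVERLAP = env_int("CHUNK_OVERLAP", 120)
```

Chunks created under the old settings stay in the index until the source is re-ingested, so changing chunk sizes normally goes together with a wipe and a fresh run.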

To wipe and re-ingest from scratch: rm -rf data/{chroma,graph.pkl,ingested.json} and re-run ingest.py.

What's next

The current build is intentionally simple. Obvious next steps if it earns its keep:

  • Multi-format support (Markdown, HTML, .docx, code repositories)
  • Entity resolution — merge aliases like "Visualforce" / "Visualforce pages"
  • Filtering noise nodes — the model occasionally returns the type label itself as an entity name
  • Hybrid search — combine BM25 with vector similarity for queries that depend on exact terminology
  • A daily ingest cron so dropping PDFs into sources/ Just Works without remembering to run a command
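
Of these, the noise-node filter is the cheapest to try. A first pass might look like the sketch below, which assumes extracted entities are dicts with "name" and "type" keys (the actual extraction schema may differ) and drops any entity whose name is empty or is itself one of the type labels.

```python
def drop_noise_entities(entities: list[dict],
                        extra_labels: tuple[str, ...] = ()) -> list[dict]:
    """Remove entities whose name is blank or just repeats a type label."""
    # Collect every type label seen in this batch, case-insensitively.
    labels = {str(e.get("type", "")).strip().casefold() for e in entities}
    labels.update(label.casefold() for label in extra_labels)
    labels.discard("")

    kept = []
    for entity in entities:
        name = str(entity.get("name", "")).strip()
        if name and name.casefold() not in labels:
            kept.append(entity)
    return kept
```

This is a heuristic: a legitimate entity that happens to share its name with a type label would be dropped too, so it is worth logging what gets removed before trusting it.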

The point of this version is to get something useful in front of Claude with the smallest possible moving-parts count. From here, every improvement is incremental and optional.

Built locally with Ollama + ChromaDB + NetworkX. The graph above is rendered live from data/graph.pkl at the time this page was generated.