Building Docdex: A Local-First Structural Index for LLM Coding Workflows

I use LLMs every day for software development. They’ve become part of my thinking process — refactoring, architecture validation, API design, debugging. They’re incredibly powerful.

But they’re stateless.

And that statelessness costs tokens.

A large portion of model usage isn’t spent solving the actual problem. It’s spent rediscovering the repository — re-reading files, inferring structure, guessing dependencies, reconstructing context. Every session starts from scratch.

That inefficiency is what led me to build Docdex.


The Core Idea

The goal was simple:

Use local and free resources to digest the codebase as much as possible, so the LLM can focus on reasoning instead of rediscovery.

Instead of feeding raw files to a model and hoping it understands structure, I wanted a persistent local layer that already knows the repository: its files, its symbols, and how they depend on each other.

So the model doesn’t waste tokens reconstructing structure — it receives structured, digested information.


Why Rust?

I started building Docdex in Rust and never looked back.

The requirements were clear: it had to run locally, stay fast on large repositories, and remain stable as a long-lived background process.

Rust was a natural fit.

The first version was purely a document indexer built on Tantivy, because I couldn’t find a local tool that provided proper ranked full-text search with scoring — something closer to a real search engine rather than simple grep-style matching.
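As a toy illustration of the difference (Tantivy's actual BM25 ranking is far more sophisticated), here is a minimal term-frequency ranker: documents get a relevance score and an ordering, which is exactly the step grep-style matching lacks. The function and scoring formula below are invented for this sketch.

```rust
/// Toy ranked full-text search: score each document by the fraction of
/// its words that match the query terms, then sort by score descending.
/// Unlike grep, results come back ordered by relevance, not just matched.
fn rank(docs: &[(&str, &str)], query: &str) -> Vec<(String, f64)> {
    let terms: Vec<String> = query
        .split_whitespace()
        .map(str::to_lowercase)
        .collect();
    let mut scored: Vec<(String, f64)> = docs
        .iter()
        .map(|(name, body)| {
            let words: Vec<String> = body
                .split_whitespace()
                .map(str::to_lowercase)
                .collect();
            let hits = words.iter().filter(|w| terms.contains(*w)).count();
            let score = if words.is_empty() {
                0.0
            } else {
                hits as f64 / words.len() as f64
            };
            (name.to_string(), score)
        })
        .filter(|(_, s)| *s > 0.0) // drop non-matches entirely
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}

fn main() {
    let docs = [
        ("notes.md", "mostly unrelated text with one mention of search"),
        ("guide.md", "indexing indexing and search"),
    ];
    for (doc, score) in rank(&docs, "indexing search") {
        println!("{doc}: {score:.2}");
    }
}
```

Real engines replace this crude term frequency with BM25 and an inverted index, but the interface idea is the same: a scored, ordered result list.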

That alone was already useful.

But it didn’t stop there.


From Document Indexer to Structural Intelligence Layer

Once document search worked well, I wanted the same capabilities for code.

Plain string search isn’t enough. It’s noisy and literal.

So I added structured code indexing.

AST-Based Symbol Indexing

Docdex parses each file's AST and builds a normalized representation of its symbols: names, kinds, and definition locations.

This allows precise definition resolution instead of substring matching.
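A minimal sketch of what a normalized symbol record and exact-name lookup might look like; the field names and types here are hypothetical, not Docdex's actual schema:

```rust
use std::collections::HashMap;

/// Illustrative shape of a normalized symbol record.
#[derive(Debug, Clone, PartialEq)]
struct Symbol {
    name: String, // identifier as written in source
    kind: String, // "fn", "struct", "trait", ...
    file: String, // defining file
    line: u32,    // definition line
}

/// Exact-name definition lookup: the index maps a symbol name to its
/// definitions, so "where is `parse_config` defined?" becomes a hash
/// lookup rather than a substring scan over the whole repository.
struct SymbolIndex {
    by_name: HashMap<String, Vec<Symbol>>,
}

impl SymbolIndex {
    fn new(symbols: Vec<Symbol>) -> Self {
        let mut by_name: HashMap<String, Vec<Symbol>> = HashMap::new();
        for s in symbols {
            by_name.entry(s.name.clone()).or_default().push(s);
        }
        SymbolIndex { by_name }
    }

    fn definitions(&self, name: &str) -> &[Symbol] {
        self.by_name.get(name).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let idx = SymbolIndex::new(vec![Symbol {
        name: "parse_config".into(),
        kind: "fn".into(),
        file: "src/config.rs".into(),
        line: 42,
    }]);
    // Matches the symbol exactly, not every substring occurrence of it.
    println!("{:?}", idx.definitions("parse_config"));
}
```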

Directed Symbol Graph (Impact Analysis)

This was the turning point.

I added a directed dependency graph so Docdex could answer questions like "what depends on this symbol?" and "what is affected if I change this function?"

Instead of asking the LLM to “guess” impact from context, Docdex computes a bounded transitive closure over the graph.

It’s deterministic. It’s explainable. It produces explicit paths.

This dramatically improves the quality of reasoning when paired with LLMs.
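The bounded transitive closure with explicit paths can be sketched as a depth-limited breadth-first walk over reverse dependencies. The graph shape and function below are illustrative assumptions, not Docdex's real data model:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Bounded transitive closure over a directed "is used by" graph:
/// starting from a changed symbol, walk reverse dependencies up to
/// `max_depth` hops, recording the explicit path that reached each
/// impacted symbol.
fn impacted(
    used_by: &HashMap<&str, Vec<&str>>,
    start: &str,
    max_depth: usize,
) -> Vec<(String, Vec<String>)> {
    let mut seen: HashSet<String> = HashSet::from([start.to_string()]);
    let mut queue: VecDeque<(String, Vec<String>, usize)> =
        VecDeque::from([(start.to_string(), vec![start.to_string()], 0)]);
    let mut out = Vec::new();
    while let Some((node, path, depth)) = queue.pop_front() {
        if depth >= max_depth {
            continue; // the bound keeps the closure small and predictable
        }
        for &dep in used_by.get(node.as_str()).into_iter().flatten() {
            if seen.insert(dep.to_string()) {
                let mut p = path.clone();
                p.push(dep.to_string());
                out.push((dep.to_string(), p.clone()));
                queue.push_back((dep.to_string(), p, depth + 1));
            }
        }
    }
    out
}

fn main() {
    // Hypothetical edges: parse_config is used by load_app, which main uses.
    let used_by = HashMap::from([
        ("parse_config", vec!["load_app"]),
        ("load_app", vec!["main"]),
    ]);
    for (symbol, path) in impacted(&used_by, "parse_config", 2) {
        println!("{symbol} impacted via {}", path.join(" -> "));
    }
}
```

Because the walk is plain graph traversal, the same inputs always give the same outputs, and every impacted symbol comes with the path that explains why it is impacted.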


The Daemon Architecture (Why I Refused to Run Ten Separate Tools)

One thing that frustrated me early on was the behavior of MCP/SSE tooling.

Each tool call spawned its own short-lived process, paid its own startup cost, and left its own cleanup behind.

Maintaining multiple independent AI tooling services becomes fragile fast.

So I made a deliberate decision:

Docdex runs as a single long-lived daemon.

It indexes the repository once, keeps the symbol graph and memory warm, and serves every tool call from the same process.

No process explosion. No fragile cleanup. Just one stable background service.

In practice, this model is far more reliable.
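The difference can be sketched as one process owning shared state and dispatching every call by name, instead of a fresh process per tool. The method names and handler bodies below are invented for illustration, not Docdex's actual RPC surface:

```rust
/// One long-lived process owns the shared state (indexes, graph, memory)
/// and dispatches every tool call to a handler. State survives across
/// calls because the process does.
struct Daemon {
    calls_served: u64,
}

impl Daemon {
    fn new() -> Self {
        Daemon { calls_served: 0 }
    }

    fn handle(&mut self, method: &str, arg: &str) -> String {
        self.calls_served += 1;
        match method {
            "search" => format!("ranked results for '{arg}'"),
            "impact" => format!("impact paths from '{arg}'"),
            "memory.get" => format!("memory entry '{arg}'"),
            _ => format!("unknown method '{method}'"),
        }
    }
}

fn main() {
    let mut daemon = Daemon::new();
    // Many calls, one process: no per-call startup cost, no per-call cleanup.
    daemon.handle("search", "tokenizer");
    daemon.handle("impact", "parse_config");
    println!("served {} calls from one process", daemon.calls_served);
}
```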


Memory: Repo-Level and Agent-Level

LLMs forget everything between sessions.

So I added two memory layers:

Repo Memory

Stores durable project-level facts: conventions, decisions, and constraints discovered while working on the codebase.

Agent Memory

Stores standing preferences for how the LLM should work: style, formatting, and workflow choices.

This allows workflows to remain consistent across sessions without repeating instructions.
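Conceptually, both layers behave like small persistent key-value stores, one per scope. A minimal sketch follows; the key names and the tab-separated dump format are invented here, not Docdex's actual storage:

```rust
use std::collections::HashMap;

/// Minimal persistent key-value memory. Two instances give the two
/// layers: one for repo-level facts, one for agent-level preferences.
struct MemoryStore {
    entries: HashMap<String, String>,
}

impl MemoryStore {
    fn new() -> Self {
        MemoryStore { entries: HashMap::new() }
    }

    fn remember(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_string(), value.to_string());
    }

    fn recall(&self, key: &str) -> Option<&str> {
        self.entries.get(key).map(String::as_str)
    }

    /// Serialize so memory survives between sessions (e.g. to a file).
    fn dump(&self) -> String {
        let mut lines: Vec<String> = self
            .entries
            .iter()
            .map(|(k, v)| format!("{k}\t{v}"))
            .collect();
        lines.sort();
        lines.join("\n")
    }

    fn load(text: &str) -> Self {
        let mut store = MemoryStore::new();
        for line in text.lines() {
            if let Some((k, v)) = line.split_once('\t') {
                store.remember(k, v);
            }
        }
        store
    }
}

fn main() {
    // Repo memory: project-level facts.
    let mut repo = MemoryStore::new();
    repo.remember("build.command", "cargo build --release");
    // Agent memory: standing preferences for the LLM.
    let mut agent = MemoryStore::new();
    agent.remember("style.tests", "always add a unit test with each fix");
    // A later session restores the facts without re-asking the user.
    let restored = MemoryStore::load(&repo.dump());
    println!("{:?}", restored.recall("build.command"));
    println!("{:?}", agent.recall("style.tests"));
}
```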


Web Search — But Delegated Locally

Eventually I realized the LLM also needed external context.

So I added web search.

During setup, Docdex can configure optional web search providers, including any API keys.

Search results are scored locally before reaching a paid model.
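A sketch of what local pre-scoring could look like: rank raw results by query-term overlap and forward only the top few. The real scoring is presumably richer; the point is that the filtering happens locally, before any paid tokens are spent:

```rust
/// Local pre-scoring of web results: rank (title, snippet) pairs by how
/// many query terms they contain, then keep only the top `k` to forward
/// to the paid model.
fn top_results<'a>(
    results: &'a [(String, String)], // (title, snippet)
    query: &str,
    k: usize,
) -> Vec<&'a (String, String)> {
    let terms: Vec<String> = query
        .split_whitespace()
        .map(str::to_lowercase)
        .collect();
    let mut scored: Vec<(usize, &(String, String))> = results
        .iter()
        .map(|r| {
            let text = format!("{} {}", r.0, r.1).to_lowercase();
            let score = terms.iter().filter(|t| text.contains(t.as_str())).count();
            (score, r)
        })
        .collect();
    // Highest overlap first; ties keep their original (provider) order.
    scored.sort_by(|a, b| b.0.cmp(&a.0));
    scored.into_iter().take(k).map(|(_, r)| r).collect()
}

fn main() {
    let results = vec![
        ("Cooking pasta".to_string(), "boil water first".to_string()),
        ("Rust error handling".to_string(), "the ? operator explained".to_string()),
    ];
    for r in top_results(&results, "rust error handling", 1) {
        println!("{}", r.0);
    }
}
```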


Local Model Delegation (Reducing Paid Tokens)

Docdex installs Ollama automatically if it isn't already present, then pulls a small set of local models to handle delegated tasks.

The idea is simple:

Let expensive frontier models focus on complex reasoning. Delegate simple tasks to local models.

Typical examples are summarizing file contents and classifying or filtering search results.

This noticeably reduces token costs in real workflows.
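The routing decision can be sketched as a simple classification step; the task names and the two-tier split below are assumptions for illustration, not Docdex's actual routing table:

```rust
/// Cost-aware routing: send cheap, mechanical tasks to a local model
/// and reserve the paid frontier model for open-ended reasoning.
#[derive(Debug, PartialEq)]
enum ModelTier {
    Local,    // e.g. served by Ollama, free to run
    Frontier, // paid API model
}

fn route(task: &str) -> ModelTier {
    match task {
        // Mechanical work: summarization, classification, extraction.
        "summarize_file" | "classify_search_result" | "extract_outline" => ModelTier::Local,
        // Everything open-ended stays with the frontier model.
        _ => ModelTier::Frontier,
    }
}

fn main() {
    println!("{:?}", route("summarize_file"));
    println!("{:?}", route("design_refactor"));
}
```

Even a coarse split like this shifts a meaningful share of calls off the paid model, because mechanical tasks are frequent.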


It’s Also a Standalone Tool

Docdex is not glued to AI workflows.

It works as a standalone local search engine and code-navigation tool, straight from the command line.

Even without an LLM, it’s useful.


Language Support

Impact graph reasoning currently supports multiple languages.

The structural layer normalizes these into a unified dependency model.


Installation

npm i -g docdex
docdex setup

Website: https://docdex.org
Repository: https://github.com/bekirdag/docdex

It’s fully open source and free to use. Web search API keys are optional.


What Changed in My Workflow

After integrating Docdex into daily development, noticeably fewer tokens go to rediscovery and more go to actual reasoning.

The LLM feels “more aware” of the project because it’s no longer guessing structure.


Where This Is Going

I plan to continue developing Docdex, and I'm weighing several future directions.

But the core will remain open.


Open Question

I’m particularly interested in these questions: how much repository understanding should be precomputed by deterministic local tooling? And how much should be left to the model at inference time?

Docdex is my attempt at answering those questions in practice.

If you are building serious AI-assisted development workflows, I’d be interested in your thoughts.