
Quick update on anchors. I've been cooking.

The original idea was static analysis + LLM enhancement, outputting YAML files that the LLM could read before answering questions about a codebase. That worked, but it was clunky. The YAML files were big, slow to generate, and didn't scale well.

So I pivoted. Instead of YAML, anchors now builds a SQLite database with the full call graph already embedded: methods, call relationships, file locations, and so on. The LLM doesn't read files anymore; it queries the graph directly.
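To make "queries the graph directly" concrete, here's a minimal sketch of the shape I mean. The schema, table names, and the method name are made up for illustration (this is not anchors' actual schema): a table of methods, a table of call edges, and plain SQL on top, shown with better-sqlite3.

```typescript
// Hypothetical call-graph store in SQLite (illustrative schema, not anchors' own).
import Database from "better-sqlite3";

const db = new Database("callgraph.db");

db.exec(`
  CREATE TABLE IF NOT EXISTS methods (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    file TEXT NOT NULL,
    line INTEGER NOT NULL
  );
  CREATE TABLE IF NOT EXISTS calls (
    caller_id INTEGER NOT NULL REFERENCES methods(id),
    callee_id INTEGER NOT NULL REFERENCES methods(id)
  );
`);

// "Who calls loadDashboard?" becomes one query instead of a repo-wide grep.
const callers = db
  .prepare(
    `SELECT m.name, m.file, m.line
       FROM calls c
       JOIN methods m      ON m.id = c.caller_id
       JOIN methods callee ON callee.id = c.callee_id
      WHERE callee.name = ?`
  )
  .all("loadDashboard");

console.log(callers);
```

The point is that the relationships are precomputed at index time, so answering structural questions is a lookup rather than a search.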

This is preliminary but the results surprised me enough that I wanted to write about it.

The setup

I pointed anchors at a production TypeScript codebase I worked on: roughly 600 files and 180k lines of code.

I had a known bug as ground truth (a batching issue where defaulting to chunks of 1 was worsening performance).

The prompt I used was intentionally vague:

"We're getting reports that [feature x] loading is slow for power users with lots of [y]."

Just the kind of thing someone might actually say in a bug report.

What I expected

I expected anchors to be faster and cheaper. The call graph should let the LLM skip the grep-and-wander phase and go straight to the relevant code.

That's not what happened.

The results were encouraging and weird:

| Metric | Without Anchors | With Anchors |
|---|---|---|
| Found the correct bug? | No | Yes |
| What it found | Different bug (missing ORDER BY) | The batching bug |
| Cost | $0.09 | $0.36 |
| Wall time | 1m 19s | 3m 10s |
| Files read | 7 | 5 |
| Approach | grep → read files | query graph → trace → read files |

Without anchors, the LLM grep'd for the feature, found a different (valid!) performance issue (a missing ORDER BY clause in a SQL query) and called it a day.

With anchors, it queried the call graph, traced the path systematically, and found the actual batching bug. It even verified lodash's chunk() default behavior by running Node to confirm it produces chunks of size 1. Slower and more expensive, but it found the correct bug.
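That check is easy to reproduce; the snippet below is my own rerun of it, not anchors' output. lodash's chunk() really does default to a size of 1 when no size is passed:

```typescript
// lodash's chunk() defaults to size 1, so a "batching" loop that forgets
// to pass a size ends up processing one item at a time.
import chunk from "lodash/chunk";

const ids = [101, 102, 103, 104];

console.log(chunk(ids));     // [[101], [102], [103], [104]]  <- chunks of 1
console.log(chunk(ids, 2));  // [[101, 102], [103, 104]]
```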

What surprised me

I may have been wrong about speed and cost: anchors was 4x more expensive and took more than twice as long. I'm not sure how much of that comes down to MCP overhead and the extra context it pulls into the window. But it found the bug, not just a bug. That's a different kind of accuracy.

The LLM without anchors did what LLMs do: find something plausible quickly and stop. The one with anchors traced the actual call chain from the entry point to the storage layer.
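Tracing a chain like that is just a recursive walk over the same call edges. A sketch against the hypothetical schema from earlier (the entry-point name is made up, and the depth limit is an arbitrary safety bound):

```typescript
// Walk the hypothetical call graph downward from an entry point with a
// recursive CTE -- roughly what "trace the call chain" means here.
import Database from "better-sqlite3";

const db = new Database("callgraph.db");

const chain = db
  .prepare(
    `WITH RECURSIVE chain(id, depth) AS (
       SELECT id, 0 FROM methods WHERE name = ?
       UNION
       SELECT c.callee_id, chain.depth + 1
         FROM calls c JOIN chain ON chain.id = c.caller_id
        WHERE chain.depth < 10      -- bound the walk in case of cycles
     )
     SELECT m.name, m.file, m.line, chain.depth
       FROM chain JOIN methods m ON m.id = chain.id
      ORDER BY chain.depth`
  )
  .all("handleDashboardRequest");

console.log(chain);
```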

One test proves nothing, though. The prompt was vague enough that "finding a different bug" isn't really wrong...

What's next

Probably more tests, to see how it evolves. I might try to create an automated eval process and run it on a mid-sized open source project.
