I work on a medium-sized Java codebase. Multiple libraries, multiple modules, service boundaries that are fuzzy. Everything talks to everything through different mechanisms (gRPC, Kafka, etc.). The classic spaghetti western.
Like everyone else, I've been using AI tools to try to make my mundane tasks easier. And they're great for small, contained questions. But ask something that requires understanding the actual flow of the system (e.g. "why is this endpoint slow?" or "where does this field get populated?") and they fall apart. The AI starts reading files, lots of files, gets overwhelmed, and either hallucinates an answer or gives you something so generic it's useless. Often it's just plain wrong.
The things I tried
I thought maybe the problem was context. So I wrote better READMEs. Documented the architecture. Created onboarding guides specifically designed to be AI-readable.
It helped a little. But the hallucinations didn't stop, and the "context rot" barely improved. The fundamental problem remained: these tools are trying to parse raw code and figure out what matters on the fly. Every. Single. Time.
Where LLMs actually shine (and don't)
I'm no authority here, but my two cents:
LLMs are really good at semantic understanding. Give them a method signature and they can tell you what it probably does. Give them structured context and they compress it beautifully. Pattern recognition, summarization, explaining business logic... all great.
LLMs are terrible at exhaustive traversal: multi-hop searches through a codebase (vectorizing the code helps somewhat, but not enough), finding every caller of a function, building a complete picture of how data flows through the system. They miss things, they get lost, and then they guess.
You know what's good at exhaustive traversal? Static analysis. Compilers. Tools that have been doing this for decades.
The idea I'm exploring
Use static analysis for what it's good at: building complete call graphs, tracing execution paths, finding every method that touches a piece of data. Deterministic, exhaustive, boring.
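To make "deterministic, exhaustive" concrete, here's a toy sketch of the traversal half. The method names and edges are invented for illustration; a real tool would extract them with a parser or bytecode analysis rather than hard-coding them. The point is that a plain breadth-first walk visits every transitive callee exactly once, with no guessing:

```java
import java.util.*;

public class CallGraphSketch {
    // Hypothetical call graph: who calls whom, as an adjacency map.
    // In practice these edges would come from static analysis, not by hand.
    static final Map<String, List<String>> CALLS = Map.of(
        "UserController.getUser", List.of("UserService.findUser"),
        "UserService.findUser", List.of("UserRepo.byId", "Cache.get"),
        "UserRepo.byId", List.of("Db.query"),
        "Cache.get", List.of(),
        "Db.query", List.of()
    );

    // Exhaustive BFS: returns every method reachable from the entry point.
    static Set<String> reachableFrom(String entry) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(entry));
        while (!queue.isEmpty()) {
            String method = queue.poll();
            if (seen.add(method)) {               // visit each node once
                queue.addAll(CALLS.getOrDefault(method, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachableFrom("UserController.getUser"));
    }
}
```

An LLM asked "what does getUser touch?" might miss `Cache.get`; the traversal can't.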
Then use LLMs for what they're good at: adding semantic meaning, business context, human-readable descriptions. "This method handles premium user sorting" instead of just "this method takes a List and returns a List".
Preprocess the codebase into something an LLM can actually navigate. Not raw code, but a semantic map with the hard work already done.
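One way to picture the "semantic map": each node pairs the deterministic facts from static analysis with an LLM-written summary. This is just a sketch of the shape; all field names here are my own invention, not an existing format:

```java
import java.util.List;

public class SemanticMapSketch {
    // One node of the hypothetical semantic map.
    record MethodNode(
        String fqName,        // from static analysis: fully qualified name
        List<String> callees, // from static analysis: exhaustive call edges
        String llmSummary     // from an LLM pass: the semantic layer
    ) {}

    public static void main(String[] args) {
        MethodNode node = new MethodNode(
            "RankingService.sortUsers",
            List.of("PlanResolver.planOf", "Comparators.byTier"),
            "Sorts users so premium accounts appear first."
        );
        System.out.println(node.fqName() + " -> " + node.llmSummary());
    }
}
```

The split matters: the graph fields are regenerated cheaply and exactly on every build, while the summaries only need an LLM call when the underlying code changes.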
The raw idea is that you drop an annotation or comment on an entry-point method, something like

@anchor("/user/compute-something-complicated")
and the tool does the traversal while the LLM does the enriching. On the next prompt, the assistant looks at that generated map first instead of re-reading raw code.
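For what it's worth, that marker could be a real Java annotation rather than a comment, which would let the preprocessing tool discover entry points via reflection or classpath scanning. A minimal sketch (the annotation name and route are placeholders from the idea above, nothing more):

```java
import java.lang.annotation.*;

public class AnchorSketch {
    // Hypothetical marker annotation for traversal entry points.
    @Retention(RetentionPolicy.RUNTIME) // keep it visible at runtime
    @Target(ElementType.METHOD)
    @interface Anchor {
        String value(); // the route or identifier the tool anchors on
    }

    @Anchor("/user/compute-something-complicated")
    static void computeSomethingComplicated() {
        // entry point the tool would start traversing from
    }

    public static void main(String[] args) throws Exception {
        // How a preprocessing tool might find the anchor:
        var m = AnchorSketch.class.getDeclaredMethod("computeSomethingComplicated");
        System.out.println(m.getAnnotation(Anchor.class).value());
    }
}
```

A comment-based marker would work too and avoids touching the build, but an annotation survives refactors and is trivially machine-discoverable.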