8 min read

ASTRA — semantic code search MCP server

#astra#github#rust

hi guys

late February this year I was thinking of cool projects to pursue and had an idea: “why does anyone use regular RAG for code? AST would be so much better?”

so in my ignorance I decided to build an AST-driven RAG engine — a local-first MCP server being the goal. ASTRA is now public on github

deciding stack

as I’d never touched RAG before (aside from wanting to use it for a startup that’s currently on pause) I wanted to get expert opinions on the best way to do this. three aistudio tabs later I ended up with:

  • tree-sitter for language parsing (rust, python, js/ts)
  • petgraph for the call graph
  • fastembed + ONNX Runtime running BAAI/bge-base-en-v1.5 for local 768-dim embeddings (feature-gated, more on this later)
  • bidirectional A* for graph traversal, biased by cosine similarity
  • rayon for parallelized cold-start indexing
  • notify + tokio for filesystem watching and incremental updates
  • MCP server over stdio

theoretically this is the ideal stack, unless I wanted to start rolling everything myself, which would’ve made this project take much longer than a month or two and would probably run worse.

after defining the concept I started working on names, because of course that’s the professional way to do things. I thought of astra (Greek/Latin for stars/navigation), and shortly after realized it’s literally a perfect abbreviation for Abstract Syntax Tree Retrieval Augmentation, so that’s what we’re rolling with 🔥

concept

RAG pulls semantically matched data, while an AST is structured pointers into that data. as I learned after building this, people have already used ASTs as a RAG source, or at least the symbols and the paths to them.

for some reason this didn’t cross my mind, and instead I went for path traversal, opting to bias an A* search through the call graph to find a target. basically: you ask “where does auth token validation happen” and it finds the right functions even if nobody wrote the word “auth” or “token” in them. XY-problem resolution, but for code search
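the core trick can be sketched in a few lines of std-only rust (function names here are mine, not ASTRA’s actual API):

```rust
/// cosine similarity between two embedding vectors
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// cost of stepping onto a node during traversal: semantically close
/// nodes are cheap, so A* naturally drifts toward them
fn traversal_cost(query_emb: &[f32], node_emb: &[f32]) -> f32 {
    1.0 - cosine_similarity(query_emb, node_emb)
}
```

in the real pipeline a cost like this feeds the A* priority queue, so paths through semantically relevant nodes get expanded first even when the keywords never match.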

it supports rust, python, and javascript/typescript right now

prototype 1 — late February

4.5k loc in a single commit. courtesy of codex

this version of astra actually didn’t even have real embeddings. well, it kinda did? it had feature-hashing text extraction and that was about it. it could match keywords but it couldn’t actually do semantic search. this version sucked, but it did establish the full architecture: tree-sitter parsing, petgraph call graph, bidirectional A* search, MCP server, filesystem watching, and 542 lines of e2e tests. all in one shot 😭

two hours after the initial commit I realized codex has ZERO foresight (unlike opus), so I went back and added max_results params, Content-Length limits, HashMap indexing, MCP query/header validation hardening, NDJSON framing, and some other misc stuff
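the framing + size-cap idea looks roughly like this (constants and names are illustrative, not ASTRA’s actual code): one JSON message per line, with a cap that rejects oversized or newline-containing payloads before they can corrupt the stream.

```rust
/// cap on a single framed message, analogous to a Content-Length limit
const MAX_MESSAGE_BYTES: usize = 1 << 20; // 1 MiB

/// NDJSON framing: one JSON message per line over stdio
fn frame(msg: &str) -> Result<String, String> {
    if msg.len() > MAX_MESSAGE_BYTES {
        return Err(format!("message too large: {} bytes", msg.len()));
    }
    if msg.contains('\n') {
        // an embedded newline would split one message into two frames
        return Err("embedded newline would break framing".to_string());
    }
    Ok(format!("{msg}\n"))
}
```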

prototype 2 — early March

real embeddings were brought in: I ripped out the feature-hashing and replaced it with BAAI/bge-base-en-v1.5 via ONNX Runtime through fastembed. 768-dimensional dense embeddings with actual semantic understanding. 247 lines deleted, 74 added. less code, infinitely better

except now nothing worked. the old embedder was 384-dim, the new one was 768-dim, and I had hardcoded 384 in the VectorStore, the MCP unit tests, and the e2e tests. all 8 MCP tests failed. the e2e test failed. the docstring still said “all-MiniLM-L6-v2” when the code was using BGE-base. cross-file call edges were being blocked by a same_file filter that neutered the entire cross-file traversal
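a cheap way to guard against this whole class of bug (a sketch of the lesson, not ASTRA’s actual fix; the type and function names are mine) is persisting the embedding dim and model alongside the index and checking them on load:

```rust
/// metadata persisted next to the vector index
struct StoreMeta {
    dim: usize,    // dimensionality the index was built with
    model: String, // model name, so the docstring can never lie again
}

/// refuse to load an index whose dim doesn't match the live embedder
fn check_compat(meta: &StoreMeta, embedder_dim: usize) -> Result<(), String> {
    if meta.dim != embedder_dim {
        return Err(format!(
            "index built with {}-dim embeddings ({}) but embedder outputs {}; reindex required",
            meta.dim, meta.model, embedder_dim
        ));
    }
    Ok(())
}
```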

the commit messages from this period were mostly keyboard smashes and increasingly desperate pleas for things to compile

also added GPU indexing support in here somewhere. you know how it goes

persistent storage was brought in too (and tucked away in a dotfolder)

this version seemed to function properly, but there was no real confirmation of whether it held up under load, so I needed benchmarks

also around this time I did a massive refactor — the codebase was a mess after the embedding swap. 563-line parser.rs, 465-line mcp.rs, use super::* everywhere. decomposed everything into proper module directories with explicit imports. the codebase went from “chaotic prototype” to “something I wouldn’t be embarrassed to open-source”

benchmarking — XYbench

I went through a few benchmark attempts that didn’t pan out (SWE-bench lite hitrate checking, a zero-shot patch accuracy bench with gpt 5 mini) before landing on something that actually made sense

XYbench is a curated subset of SWE-bench with 50 cases that are genuine XY problems — issues where the reported symptom points to the wrong place in the codebase. the idea is that ASTRA can navigate from where the problem supposedly is to where it actually is. this is exactly the kind of thing graph-biased semantic search should excel at

the benchmark itself was also broken at first — the grep/ripgrep keyword extraction was wrong, bidirectional path matching was wrong, file deduplication was wrong, and the top_k default was wrong. so every comparison I’d been making against grep and ripgrep was meaningless

after fixing all that, there were many more rounds of “run benchmark, tweak something, run benchmark again” than I’d like to admit

ASTRA was actually miserable on XYbench at first, only barely beating standard RAG. the problem was that it would target massive functions and dump over 500k tokens to the model over MCP — some files (matplotlib) were inflating context to 136K+ tokens per case

so I (codex) added a 16K CONTEXT_TOKEN_BUDGET per case, fixed the token estimation from len/4 to len/3 (more accurate for code-heavy BPE), and most importantly took out the full object retrieval by default — added skeleton context instead, which returns just the signature + leading docstring/comments capped at ~20 lines. the terminal node still gets full source. you can still pass return_nodes: true and uncap_file_size: true if you actually want everything
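the budgeting logic, sketched (the constants match the post; the function names and the naive line-cap skeleton are mine, the real skeleton extraction pulls the signature and leading docstring from the AST):

```rust
/// total context handed to the model per case
const CONTEXT_TOKEN_BUDGET: usize = 16_000;
/// cap on the skeleton of each discovery-path node
const SKELETON_MAX_LINES: usize = 20;

/// rough token estimate for code-heavy text: ~3 chars per BPE token
/// (len/4 is the usual prose heuristic, but it undercounts for code)
fn estimate_tokens(text: &str) -> usize {
    text.len() / 3
}

/// skeleton context: the leading lines of a symbol, capped
fn skeleton(source: &str) -> String {
    source
        .lines()
        .take(SKELETON_MAX_LINES)
        .collect::<Vec<_>>()
        .join("\n")
}

/// can we append the next chunk without blowing the per-case budget?
fn within_budget(used: usize, next: &str) -> bool {
    used + estimate_tokens(next) <= CONTEXT_TOKEN_BUDGET
}
```

the terminal node skips the skeleton path entirely and ships full source, which is why accuracy holds up while the token count collapses.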

the skeleton overhead of the discovery path ends up averaging ~15 tokens per case (730 total skeleton tokens across 50 benchmark cases), putting us well into the green on accuracy per result token

xybench results

this is where ASTRA actually beat grep and ripgrep on the XY oracle file hit-rate for real, not on broken benchmarks. also found and fixed a bug in the top-k vector search around this time — the partition index was off by one. optimized it and then immediately had to fix it again 5 minutes later

openrouter support

one thing that kept bugging me was that the local embedding setup (fastembed + ort) makes the binary heavy and the compile slow. not everyone has a good GPU, and not everyone wants to wait for a 768-dim model to download and initialize on first run

so I added OpenRouter as an alternative embedding provider, behind a cargo feature flag. the whole embedding layer is now trait-based with a build_embedder() factory that picks the right backend at compile time:

  • cargo install --path . — default, compiles with local BGE-base via ONNX (same as before)
  • cargo install --path . --features cuda — local + CUDA GPU acceleration
  • cargo install --path . --no-default-features --features openrouter — lightweight cloud build, skips ONNX entirely

the openrouter path compiles almost instantly and the binary is way smaller. you just set ASTRA_EMBEDDING_PROVIDER=openrouter and OPENROUTER_API_KEY in your MCP config and it works

for the sake of convenience, instead of hardcoding a dim size, the OpenRouterEmbedder hits the /api/v1/embeddings/models endpoint on init to validate that your chosen model actually supports embeddings, then sends a single “ping” embedding to measure the output dimensionality. so it works with any embedding model on openrouter (openai/text-embedding-3-small, whatever else they add) without code changes

all the old hardcoded EMBEDDING_DIM constants got ripped out and replaced with a runtime dim() method on the Embedder trait. this was one of those refactors where you change one constant and then 8 files break
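the shape of that embedding layer, as a hedged sketch: the real backends wrap fastembed/ort and an OpenRouter HTTP client, and the real OpenRouter dim is probed at init rather than hardcoded. the stub bodies and dims here are placeholders for illustration.

```rust
/// the seam the rest of the pipeline talks to
trait Embedder {
    /// runtime dimensionality, replacing the old hardcoded EMBEDDING_DIM
    fn dim(&self) -> usize;
    fn embed(&self, text: &str) -> Vec<f32>;
}

struct LocalEmbedder { dim: usize }

impl Embedder for LocalEmbedder {
    fn dim(&self) -> usize { self.dim }
    // stub; the real impl runs BGE-base through ONNX Runtime
    fn embed(&self, _text: &str) -> Vec<f32> { vec![0.0; self.dim] }
}

#[cfg(feature = "openrouter")]
struct OpenRouterEmbedder { dim: usize }

#[cfg(feature = "openrouter")]
impl Embedder for OpenRouterEmbedder {
    fn dim(&self) -> usize { self.dim }
    // stub; the real impl calls the openrouter embeddings API
    fn embed(&self, _text: &str) -> Vec<f32> { vec![0.0; self.dim] }
}

/// pick the backend at compile time via cargo features
fn build_embedder() -> Box<dyn Embedder> {
    #[cfg(feature = "openrouter")]
    {
        // placeholder dim; the real one is measured by a "ping" embedding
        return Box::new(OpenRouterEmbedder { dim: 1536 });
    }
    #[cfg(not(feature = "openrouter"))]
    {
        Box::new(LocalEmbedder { dim: 768 }) // BGE-base is 768-dim
    }
}
```

keeping the dim behind a method instead of a constant is what makes the 384-vs-768 class of breakage impossible to reintroduce silently.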

how it actually works

the pipeline is:

  1. parse — tree-sitter walks your source files and extracts symbols (functions, methods, classes, structs, etc.) with their bodies, call sites, and parent scopes
  2. graph — petgraph builds a directed call graph. symbol A calls symbol B → edge. methods get ContainedIn edges to their parent class/struct
  3. embed — each symbol body gets embedded via the configured provider (BGE-base locally or any OpenRouter embedding model). the Embedder trait abstracts this so the rest of the pipeline doesn’t care where the vectors come from
  4. search — bidirectional A* walks the graph from top-k entry points. the heuristic uses (1 - cosine_similarity) as cost, so it naturally gravitates toward semantically relevant nodes. includes a “teleport” mechanism for when local graph traversal gets stuck
  5. serve — MCP server over stdio, three tools: astra_semantic_path_search, astra_structured_path_search, and astra_semantic_rag_search (regular chunk-based search without graph traversal for when you just want the most relevant symbols)
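the data model behind steps 1–2 can be sketched with a plain adjacency list in place of petgraph (type names are approximations of ASTRA’s, not its actual definitions):

```rust
#[derive(Debug, Clone, PartialEq)]
enum SymbolKind { Function, Method, Class, Struct }

#[derive(Debug, Clone, PartialEq)]
enum EdgeKind { Calls, ContainedIn }

/// one extracted symbol; `body` is what gets embedded in step 3
struct Symbol {
    name: String,
    kind: SymbolKind,
    body: String,
    embedding: Vec<f32>, // filled in by the configured Embedder
}

/// directed call graph: (from, to, kind) edges between symbol indices
struct CallGraph {
    symbols: Vec<Symbol>,
    edges: Vec<(usize, usize, EdgeKind)>,
}

impl CallGraph {
    /// symbol A calls symbol B → Calls edge
    fn add_call(&mut self, caller: usize, callee: usize) {
        self.edges.push((caller, callee, EdgeKind::Calls));
    }
    /// method → parent class/struct → ContainedIn edge
    fn add_containment(&mut self, method: usize, parent: usize) {
        self.edges.push((method, parent, EdgeKind::ContainedIn));
    }
    /// outgoing call targets, the frontier A* expands from a node
    fn callees(&self, from: usize) -> Vec<usize> {
        self.edges.iter()
            .filter(|(f, _, k)| *f == from && *k == EdgeKind::Calls)
            .map(|(_, t, _)| *t)
            .collect()
    }
}
```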

closing

all that said, ASTRA is now open source on github and gladly accepting PRs. if anyone wants to send over a patch to optimize it further and/or fix an oversight of mine, feel free! something I’ve been thinking about is adding BM25, but I think that’s wayyy over my skill level right now

it’s MIT licensed and works with claude desktop, cursor, opencode, or anything that speaks MCP over stdio

notes

openrouter embedding support was last minute, so no guarantee it’s stable rn, but it should function fine if you don’t want to wait for local embeddings to generate or don’t have a good GPU