How context selection works

CodeLedger selects files deterministically — no LLMs, no embeddings, no sampling. Every selection decision is traceable to a score.

The selection pipeline

When you run codeledger activate --task "...", CodeLedger runs four stages in sequence:

  1. Score: Every file in the repo is scored against the task using a weighted sum of six factors (below).
  2. Select: Files are ranked by score. The top-N files that fit within the token budget (default: 8,000 tokens, max 25 files) are selected.
  3. Shadow expand: Up to 3 additional shadow files are added from the co-commit graph. These are files that historically change alongside the selected set.
  4. Excerpt: Files under 200 lines are included in full. Larger files receive tiered excerpts: the highest-scoring spans are expanded first.

Scoring factors

The composite score is a weighted sum of six signals. Weights are configurable in .codeledger/config.json under selector.weights.
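
As a rough illustration of the weighted sum, the sketch below combines per-file signals using the default weights listed for each factor in this section. The key names, the composite_score helper, and the assumption that each signal is normalised to the range 0 to 1 are hypothetical; they are not taken from CodeLedger's source or config schema.

```python
# Default weights as documented for each factor.
# The key names are hypothetical, not CodeLedger's actual config keys.
DEFAULT_WEIGHTS = {
    "keyword_match": 0.30,
    "structural_centrality": 0.25,
    "churn_history": 0.20,
    "recent_touch": 0.15,
    "test_proximity": 0.07,
    "shadow_affinity": 0.03,
}


def composite_score(signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of per-file signals; a missing signal counts as 0.0.

    Assumes each signal has already been normalised to [0, 1].
    """
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


# A file with a full keyword match, middling centrality, and a recent touch.
example = {"keyword_match": 1.0, "structural_centrality": 0.5, "recent_touch": 1.0}
score = composite_score(example)  # 0.30 + 0.125 + 0.15
```

Because the default weights sum to 1.0, a file that maxes out every signal would score 1.0 under this sketch.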

Keyword match

weight 0.30

Compound phrases extracted from your task prompt are matched against file content. Exact phrases score higher than partial matches. The phrase index is built during scan and updated incrementally.

Structural centrality

weight 0.25

Files that import — or are imported by — many other files are scored higher. High-centrality files are more likely to be relevant to cross-cutting changes. Centrality is derived from the dependency graph.
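
One simple way to picture centrality is degree centrality over import edges. The sketch below is an assumption for illustration only: the text does not say which centrality measure CodeLedger uses, and the degree_centrality helper and the file names are hypothetical.

```python
from collections import defaultdict


def degree_centrality(import_edges):
    """Count imports in both directions, scaled so the busiest file scores 1.0.

    import_edges: iterable of (importer, imported) path pairs.
    This is plain degree centrality, chosen for illustration only.
    """
    degree = defaultdict(int)
    for importer, imported in import_edges:
        degree[importer] += 1  # outgoing import
        degree[imported] += 1  # incoming import
    peak = max(degree.values(), default=0)
    if peak == 0:
        return {}
    return {path: count / peak for path, count in degree.items()}


edges = [("api.py", "util.py"), ("cli.py", "util.py"), ("api.py", "db.py")]
centrality = degree_centrality(edges)
```

Here api.py and util.py each take part in two import edges and score 1.0, while cli.py and db.py take part in one and score 0.5.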

Churn history

weight 0.20

Files that change frequently tend to be relevant to active work. Churn is weighted by recency using a 60-day half-life — recent commits count more than old ones.
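
A 60-day half-life means a commit's contribution halves every 60 days. The sketch below shows that decay; summing the weights into a churn value is an assumption about how the per-commit weights are combined, and the function names are hypothetical.

```python
HALF_LIFE_DAYS = 60.0  # half-life stated in the text


def recency_weight(age_days):
    """Exponential decay: 1.0 today, 0.5 at 60 days, 0.25 at 120 days."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def churn_score(commit_ages_days):
    """Sum of decayed weights over a file's commits (assumed combination rule)."""
    return sum(recency_weight(age) for age in commit_ages_days)


# Three commits: today, 60 days ago, and 120 days ago.
total = churn_score([0, 60, 120])  # 1.0 + 0.5 + 0.25
```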

Recent touch

weight 0.15

Files you have modified recently in this session, or files touched in the last few commits on the current branch, receive a boost. This keeps the bundle anchored to your current working context.

Test proximity

weight 0.07

Files that have discovered test coverage, or that are co-located with test files, receive a small boost. The test map is built during scan from naming conventions and import analysis.

Shadow affinity

weight 0.03

Files that historically co-commit with selected files are eligible for shadow inclusion. Co-commit affinity decays over time and is penalised for large commits to reduce noise.

Token budget

The default budget is 8,000 tokens and 25 files. CodeLedger stops adding files when either limit would be exceeded by the next candidate.
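
The stopping rule can be sketched as a single pass over the ranked files. This is a minimal illustration that assumes token counts per file are already known; select_within_budget and the sample data are hypothetical.

```python
def select_within_budget(ranked, max_tokens=8000, max_files=25):
    """Take files in rank order until the next one would break either limit.

    ranked: list of (path, token_count) pairs, highest score first.
    Selection stops at the first candidate that does not fit, following the
    text; it does not skip ahead to smaller files further down the ranking.
    """
    selected, used = [], 0
    for path, tokens in ranked:
        if len(selected) + 1 > max_files or used + tokens > max_tokens:
            break
        selected.append(path)
        used += tokens
    return selected, used


ranked = [("a.py", 3000), ("b.py", 4000), ("c.py", 2000), ("d.py", 500)]
selected, used = select_within_budget(ranked)
```

In this example a.py and b.py are selected for 7,000 tokens. c.py would push the total to 9,000, so selection stops there, and d.py is not considered even though it would fit.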

Override either limit in .codeledger/config.json:

{
  "selector": {
    "default_budget": { "tokens": 12000, "max_files": 30 }
  }
}

To override the limits for a single activation, pass flags instead: --budget-tokens 12000 --budget-files 30.

Shadow files

Shadow files are additions beyond the scored top-N. They come from the temporal co-commit graph, which records pairs of files that have historically been committed together. If file A is in the bundle, and file B has changed alongside A at least 3 times with a co-commit affinity of at least 0.30, B is a shadow candidate.

Setting          Value   Description
Min affinity     0.30    Minimum co-commit affinity score to qualify
Min co-commits   3       Minimum historical co-commit count
Max shadows      3       Maximum shadow files added per bundle
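
The three thresholds can be read as a filter followed by a cap. The sketch below assumes affinity scores and co-commit counts are already computed, and that candidates are ranked by affinity; that ordering is an assumption, since the text does not say how ties between candidates are broken. The pick_shadows helper and the sample data are hypothetical.

```python
def pick_shadows(candidates, selected, min_affinity=0.30, min_co_commits=3, max_shadows=3):
    """Filter shadow candidates by both thresholds, then keep the strongest few.

    candidates: list of (path, affinity, co_commit_count) tuples.
    selected: set of paths already in the bundle; these are never re-added.
    """
    eligible = [
        (path, affinity)
        for path, affinity, count in candidates
        if path not in selected
        and affinity >= min_affinity
        and count >= min_co_commits
    ]
    # Assumed ordering: highest affinity first.
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [path for path, _ in eligible[:max_shadows]]


selected = {"a.py"}
candidates = [
    ("b.py", 0.90, 5),
    ("c.py", 0.20, 10),  # affinity below 0.30
    ("d.py", 0.50, 2),   # fewer than 3 co-commits
    ("e.py", 0.40, 3),
    ("f.py", 0.35, 4),
    ("g.py", 0.31, 3),   # qualifies, but the cap of 3 is already reached
    ("a.py", 0.95, 9),   # already in the bundle
]
shadows = pick_shadows(candidates, selected)
```

With these inputs the result is b.py, e.py, and f.py.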

Intent sufficiency check

Before writing the bundle, CodeLedger runs an Intent Sufficiency Check (ISC) on your task prompt. If the task is vague or contradictory, you'll see a prompt health warning before activation completes — so you can refine the task rather than getting a low-quality bundle.

⚠ Prompt health: contradiction in task detected.
  Consider clarifying scope before activating.