The Control Plane for Enterprise AI Coding
AI coding agents create velocity.
CodeLedger creates the verification, memory, and governance layer that makes that velocity enterprise-safe.
Logs are history. Ledger is intelligence.
Without the right context,
your AI agents improvise.
Give AI coding agents the context, memory, and governance they need to work safely inside real-world codebases.
Any coding task comes down to 7–10 files that actually matter. Without a control plane, your agents do their best against your entire codebase: introducing shadow changes your security team cannot see, breaking dependencies nobody catches, and making confident claims that are simply wrong. Every session. Every engineer. Compounding.
Built on a deep patent-pending portfolio in enterprise context engineering, ContextECF CodeLedger brings deterministic context selection, repo-native memory, policy-aware guardrails, and audit-ready intelligence to AI-assisted software development.
At 20,000 employees, this is not a developer quality problem. It is $1.38M in provable annual waste, growing audit exposure, and operational risk that has no owner.
No credit card required · 8 Design Partner slots · Founding-team attention
One control plane across every AI coding tool
The compounding problem
Five pains that get worse every quarter you wait
Each one is costly alone. Together they compound — because every new engineer, every new AI tool subscription, and every new quarter of unaudited AI output makes the next one harder to contain.
$1.38M
est. annual inference waste
You are paying for every wrong file your agent reads
28.7% of every token your agents process is context noise — wrong files, irrelevant history, dead code your codebase forgot about. For 2,000 engineers at $200/month in API costs, that is $1.38M a year in provable waste. The inference bill arrives monthly. The audit trail explaining it never does.
Benchmark-proven · 8 public repos · 40 tasks
0
signed records of what the agent changed
When compliance asks what the AI touched, you have no answer
Your security review, your board, your acquirer — each of them will ask what the AI agent changed last quarter. What files did it read? What did it skip? What evidence was checked? Without a signed audit trail, the current answer is: we do not know. That answer does not survive a procurement cycle.
Truth Audit Certificates address this directly
8–12 mo
avg new developer ramp time — unchanged by AI
AI tools do not shorten onboarding without shared context
AI coding agents start every session with a blank slate. They do not know your validated patterns, your naming conventions, or why you made that architectural decision three years ago. New hires get the same blank slate. The knowledge your senior engineers built over years lives nowhere the agent can use it.
Team Context Ledger → institutional memory that persists
3–4×
AI tools per engineer · zero shared governance
Three AI subscriptions. Three isolated contexts. No shared memory.
Your engineers are running Claude Code, Cursor, and Copilot in parallel. None of them share what your codebase has validated. Every tool maintains its own context in isolation. You are paying for three separate attempts at the same problem, with no way to govern, audit, or unify the output.
One control plane across all agents — agent-agnostic by design
47 days
avg enterprise detection time for shadow AI changes
Shadow changes bypass your entire review process
Without context control, agents touch files they should never see — infrastructure configs, security boundaries, shared utilities. No signal reaches your review process. The change merges, deploys, and surfaces as an incident weeks later. By the time it is caught, the blast radius is already wide.
Architecture guardrails in CI · 1,398 boundary violations caught in vscode/vscode alone
Compounding
the real cost
None of these problems stay contained
Shadow changes become technical debt. Bad context becomes bad patterns baked into your codebase. Knowledge loss accelerates with every senior departure. And every new AI agent subscription you add without governance multiplies the surface area for all of the above.
CodeLedger is the control plane that stops the compounding.
Four levers · one lifecycle
CodeLedger operates across every stage of how software gets built
Most tools optimize the agent. CodeLedger governs the system — at the moment of context selection, the moment of review, across every session, and for every engineer who comes after.
Immediate value
The agent starts informed. Not cold.
- Scores every file in the repo for the current task
- Selects the minimal, highest-signal context bundle
- Validates the task prompt against 101 quality signals
- Surfaces ⚠ Prompt health before the first line is written
28.7% token reduction · 100% top-5 stability
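As a rough illustration of the scoring and selection steps listed above, here is a minimal sketch. It assumes a simple weighted sum over per-file signals and a greedy token budget. The signal names echo the CLI output shown elsewhere on this page, but the weights, the `FileSignals` type, and the `select_bundle` function are hypothetical and are not CodeLedger's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; the real signals and weights are not published here.
WEIGHTS = {"keyword_match": 0.5, "centrality": 0.3, "recent_touch": 0.2}


@dataclass(frozen=True)
class FileSignals:
    path: str
    signals: dict  # signal name -> value in [0, 1]
    tokens: int    # estimated token cost of including the file


def score(f: FileSignals) -> float:
    """Weighted sum over the known signals present on the file."""
    return sum(WEIGHTS[name] * value for name, value in f.signals.items() if name in WEIGHTS)


def select_bundle(files, token_budget):
    """Greedy selection: highest score first, skipping files that would exceed the budget."""
    # Ties are broken by path so the same input always yields the same bundle.
    ranked = sorted(files, key=lambda f: (-score(f), f.path))
    bundle, used = [], 0
    for f in ranked:
        if used + f.tokens <= token_budget:
            bundle.append(f.path)
            used += f.tokens
    return bundle, used


files = [
    FileSignals("src/api/payments/router.ts", {"keyword_match": 1.0, "centrality": 0.8}, 900),
    FileSignals("src/middleware/auth.ts", {"centrality": 0.9, "recent_touch": 0.5}, 700),
    FileSignals("docs/CHANGELOG.md", {"recent_touch": 0.4}, 3000),
]
bundle, used = select_bundle(files, token_budget=2000)
print(bundle, used)
```

Because the ranking uses no randomness and breaks ties by path, rerunning it on the same repository state returns the same bundle, which is the property "deterministic context selection" refers to.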
Protective value
Risk signal before the merge.
- Risk · Drift · Evidence Gaps on every PR
- Deterministic additive model — no AI in the scoring path
- Exact file and line where conventions were bypassed
- Appears in the PR comment where engineers already look
45% catch rate · 0 hallucinations · 11 PRs verified
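A deterministic additive model can be pictured as a fixed lookup table plus a sum. The sketch below assumes integer weights and two thresholds. The driver names are the ones quoted in the field tests on this page, while the weights, thresholds, and function names are hypothetical.

```python
# Hypothetical weights and thresholds; only the driver names come from the field tests.
DRIVER_WEIGHTS = {
    "dependency_manifest_changed": 3,
    "cross_package_boundary": 2,
    "production_change_without_tests": 3,
    "uncovered_failure_branch": 2,
}
HIGH_THRESHOLD = 6
MEDIUM_THRESHOLD = 3


def risk_score(drivers):
    """Sum the weight of each distinct driver; unknown drivers contribute nothing."""
    return sum(DRIVER_WEIGHTS.get(d, 0) for d in set(drivers))


def risk_level(score):
    """Map a numeric score to a coarse level using fixed thresholds."""
    if score >= HIGH_THRESHOLD:
        return "High"
    if score >= MEDIUM_THRESHOLD:
        return "Medium"
    return "Low"


def should_run_heavy_checks(drivers):
    """Conditional gate: heavy CI checks run only when the level is High."""
    return risk_level(risk_score(drivers)) == "High"


pr_drivers = ["dependency_manifest_changed", "cross_package_boundary"]
print(risk_score(pr_drivers), risk_level(risk_score(pr_drivers)))
```

With no model call in the path, every score can be traced back to the exact drivers that produced it, and the same diff always yields the same level.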
Compounding value
Every session makes the next one better.
- Successful patterns promoted to institutional memory
- Feedback flywheel: deploy outcomes flow back to signal weights
- First-pass success rate climbs 62% → 80% over 8 weeks
- No cloud, no retraining — improves inside your environment
+18 pts first-pass rate · −13 pts rework · Week 1 → Week 8
Institutional value
When engineers leave, the knowledge stays.
- 5 persistent ledgers: truth, validation, ontology, structure, evidence
- New hires inherit proven patterns, not a blank slate
- Architectural invariants enforced across agents and sessions
- Evidence gates prevent low-confidence findings from becoming noise
8–12 mo ramp time · 25% improvement with shared context
CodeLedger is the only system that operates across all four stages of the engineering lifecycle — not just code generation.
The numbers for a 20,000-person company
What uncontrolled AI coding costs, line by line
Assumptions: 2,000 engineers actively using AI coding tools · $200/month average inference spend · 28.7% token reduction benchmark-proven across 8 public repos and 40 tasks.
Inference waste
2,000 engineers × $200/mo × 28.7% token reduction
$1,377,600 / yr
AI-driven rework cycles
15–25% of AI output requires rework without context control; 2,000 devs × conservative 2 hrs/mo × 12 mo (the total implies roughly $75 per hour)
$3.6M / yr
New developer ramp gap
200 new hires/yr · 25% ramp improvement with shared context · $75k avg onboarding cost
$3.75M / yr
Compliance exposure
One AI-related security incident requiring external audit
$500k – $2M
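The line items above can be reproduced with a few lines of arithmetic. This sketch uses only the figures stated on this page. The one derived value is the hourly rate for rework, which the page does not state, so it is back-solved from the $3.6M total and should be read as an inference rather than a published assumption.

```python
ENGINEERS = 2_000
MONTHLY_INFERENCE_SPEND = 200  # dollars per engineer per month
TOKEN_REDUCTION = 0.287        # benchmark token reduction
MONTHS = 12

# Inference waste: annual spend multiplied by the share of tokens that is noise.
inference_waste = round(ENGINEERS * MONTHLY_INFERENCE_SPEND * MONTHS * TOKEN_REDUCTION)

# Rework: the page gives hours and a total, so the hourly rate is inferred here.
rework_hours = ENGINEERS * 2 * MONTHS
rework_total = 3_600_000
implied_hourly_rate = rework_total / rework_hours

# Ramp gap: new hires x ramp improvement x average onboarding cost.
ramp_gap = round(200 * 0.25 * 75_000)

print(f"Inference waste: ${inference_waste:,} / yr")
print(f"Rework hours: {rework_hours:,} (implied ${implied_hourly_rate:.0f}/hr)")
print(f"Ramp gap: ${ramp_gap:,} / yr")
```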
How CodeLedger addresses each pain
Five capabilities. Five business outcomes.
Pain 1
Inference spend with no ROI line item
28.7% token reduction across 40 tasks on 8 public TypeScript repos. Every optimization is auditable — trim, hoist, retain, or skip — with a full trace. Zero omission incidents. 100% top-5 file stability.
See the benchmark →
Pain 2
No audit trail for AI-generated code
Signed, tamper-evident certificates that record what the agent read, what changed, and what was skipped. Ready for procurement, security review, board reporting, or acquisition diligence. No source code leaves your environment.
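To make "signed, tamper-evident" concrete, here is a minimal sketch that signs a record with HMAC-SHA256 over canonical JSON. The field names, the demo key, and the choice of HMAC are assumptions for illustration. The actual certificate format and signature scheme are not described on this page, and a production system might well use asymmetric signatures instead.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Serialize with sorted keys so equal records always produce equal bytes, then sign."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_record(record, key), signature)


key = b"demo-key-not-for-production"  # hypothetical key for the example only
record = {
    "files_read": ["src/api/payments/router.ts"],
    "files_changed": ["src/api/payments/router.ts"],
    "files_skipped": ["infra/terraform/main.tf"],
}
signature = sign_record(record, key)

# Any edit to the record after signing makes verification fail.
tampered = {**record, "files_changed": []}
print(verify_record(record, signature, key), verify_record(tampered, signature, key))
```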
See certificate tiers →
Pain 3
Institutional knowledge evaporating
Validated patterns, accepted changes, naming conventions, and architectural decisions are persisted across sessions. Every agent — and every new developer — starts with the context your senior engineers built, not a blank slate.
Enterprise overview →
Pain 4
Three AI tools with zero shared governance
One governance layer across Claude Code, Cursor, Codex, Kiro, Windsurf, and Copilot. Context selection, guardrails, and audit evidence work identically regardless of which tool any engineer is using.
How it works →
Pain 5
Shadow changes and architecture drift
Risk, drift, and evidence-gap signals on every PR. Conditional gates run heavy checks only when risk is High — keeping CI fast by default. VS Code: 1,398 boundary violations caught in a single pass.
Governance planes →
Where it lives
Local intelligence. GitHub governance. Executive visibility.
Zero cloud dependency by default. The intelligence lives on the developer machine. Governance runs in GitHub. Evidence surfaces in a self-hosted dashboard. Three zones. No external SaaS.
Zone 1
Developer Machine
.codeledger/
├── active-bundle.md          ← current task context
├── memory/
│   ├── recent-truth.json
│   ├── ontology.json
│   ├── structural-trust.json
│   ├── evidence-gates.json
│   └── validation-ledger.json
├── patterns/                 ← golden patterns
└── bin/
    └── codeledger-standalone.cjs
Local-first. Air-gap capable. Zero cloud dependency. Works offline. Works in regulated environments.
Zone 2
GitHub Enterprise
.github/workflows/
├── codeledger-verify.yml ← CI on every PR
├── codeledger-pr.yml ← Risk/Drift/Evidence
└── codeledger-guard.yml ← release gate
.codeledger/team-ledger/
├── patterns/ ← shared golden patterns
└── merge-memory-records.jsonl
# Two-line integration:
- uses: codeledgerECF/codeledger@v0.10.19
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
Governance runs here. Teams share patterns here. The PR signal appears where engineers already look.
Zone 3
Engineering Dashboard
[your-org].github.io/engineering
├── /overview       ← Architecture Health Score
├── /value          ← hours saved, dollar impact
├── /integrity      ← Architecture / Impl / Release
├── /engineering    ← agent scorecards
├── /fleet          ← cross-repo (Enterprise)
└── /evidence       ← drill-down event log
Self-hosted on GitHub Pages. Board-ready metrics with evidence behind every number. SIEM-ready audit export.
Local is the source of truth. GitHub is the governance layer. The dashboard is the evidence surface. One system, three zones, no cloud.
For enterprise rollout
One platform. Five governance planes. Every stakeholder covered.
The enterprise dashboard organizes CodeLedger into five outcome planes, each with access-controlled depth and exportable evidence for the teams that need it — engineering, security, compliance, finance, and leadership.
1
Context
Engineering
The right files, every time
2
Verify
Security
Risk caught before it merges
3
Govern
Compliance
Signed evidence on demand
4
Fleet
Leadership
Visibility across every agent
5
Learn
Platform
Improves with your codebase
Evidence, not marketing copy
By the numbers
Every metric below maps to a real test, a real repo, or a real session record. No projections dressed as benchmarks.
Context-Compiler token reduction
28.7% weighted avg
40-task benchmark · 8 public TS repos
Top-5 file stability
100%
Top files retained after optimization on every task
First-pass success rate
>70%
Tasks completed without agent retry
Token reduction vs. full repo
>90%
Bundle tokens vs. estimated full-repo context
Bundle recall
>60% (growing)
% of touched files that were in the bundle
Rework reduction
−13 pts over 8 wks
CIC failure rate trend, controlled environment
Release surfaces verified
7/7 (168/168)
release-verify propagation check
Cross-repo PR scoring
Nightly
Live PRs across vercel/next.js · facebook/react · postgres/postgres
PR-Review-Intel records captured
Every run
Per-PR ledger with stable schema, audit-grade
62% → 80%
First-pass success rate
Week 1 → Week 8
24% → 11%
Rework rate
Week 1 → Week 8
18% → 64%
Pattern reuse rate
Week 1 → Week 8
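Bundle recall, defined above as the share of touched files that were in the bundle, is a plain set ratio. A minimal sketch follows. The function name and the example paths are hypothetical, and raising on an empty touched set is one possible convention, not a documented one.

```python
def bundle_recall(bundle_files, touched_files):
    """Fraction of files the change actually touched that were present in the bundle."""
    touched = set(touched_files)
    if not touched:
        # Recall is undefined when nothing was touched; this sketch refuses to guess.
        raise ValueError("touched_files must not be empty")
    return len(touched & set(bundle_files)) / len(touched)


recall = bundle_recall(
    bundle_files=["src/router.ts", "src/service.ts", "tests/router.test.ts"],
    touched_files=["src/router.ts", "src/config.ts"],
)
print(f"{recall:.0%}")
```

In this example one of the two touched files was in the bundle, so recall is 50%. Files in the bundle that were never touched lower precision, not recall.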
Part of the ContextECF platform
Enterprise Context Fabric — the intelligence layer for every agent
CodeLedger is the engineering module of a broader platform: the ContextECF Enterprise Context Fabric. Where CodeLedger governs AI coding agents and repos, the Enterprise Context Fabric extends governed context across CRM, support, sales, ops, and every system your teams work in — powered by the same deterministic, audit-ready primitives.
Design Partner Program · 8 founding slots
Save over $1M in year one.
Shape the governance standard before it's set.
Design partners get founding-team attention, pricing locked at $45/seat for 12 months, early access to OVPI signal, and direct input into the roadmap while decisions still bend. At 2,000 engineers and $200/month in inference spend, CodeLedger's 28.7% token reduction alone saves $1.38M annually — before we account for rework, onboarding, or compliance overhead.
8
founding slots total
$45
per seat/month, locked 12 mo
$1.38M
yr-1 inference savings (2k engineers)
6
structured calls with founding team
Independent evidence
Run blind against public repos you can inspect yourself
Selection criteria were written before scoring. Results include catches, correct silences, and the one domain where we do not yet perform — because hiding misses is not how trust is built.
nestjs/nest
2 controlled tasks
Consistent 25–27% reduction on a large TS framework
Two tasks across a large TypeScript framework monorepo — rate limiting middleware and DI container refactor — both scored PASS with 25–27% token reduction and top-5 file stability. No crashes, no omission incidents.
Full study →
vercel/next.js
5 PRs blind-scored
Caught the revert before it happened
PR #93071 was flagged Medium WARN with dependency_manifest_changed and cross_package_boundary drivers. It was reverted 3 days later by #93226: "breaks lerna — cannot resolve catalog: references."
See the revert →
facebook/react
5 PRs blind-scored
Flagged the gap before the post-merge move
PR #36253 was flagged production_change_without_tests + uncovered_failure_branch. The entire implementation was relocated to the React Native repo days later "due to discovered bugs and iteration challenges."
See PR #36253 →
postgres/postgres
7 controlled tasks
Honest: 1.1% recall on cold-start C
Our model is tuned for JS/TS monorepos. On a 5,000-file C codebase with no prior ledger, average bundle recall is 1.1%. We publish the full study because calibration transparency is the product.
Full study →
Latest run · 3 repos · Docker verified
| Repo | Task | Token reduction | Quality |
|---|---|---|---|
| nestjs/nest | Add rate limiting middleware | 8,777 → 6,372 (-27.4%) | PASS |
| nestjs/nest | Refactor DI container async providers | 8,770 → 6,540 (-25.4%) | PASS |
| vercel/next.js | Fix hydration mismatch in app router | 8,247 → 6,021 (-27.0%) | PASS |
| prisma/prisma | Add field type to schema parser | 2,492 → 2,492 (+0.0%) | WATCH |
| prisma/prisma | Add PostgreSQL JSON query operators | 8,521 → 7,268 (-14.7%) | WATCH |
WATCH = no trim opportunities found or dense cross-package types — not a failure. Nest & Next.js: consistent 25–27% reduction, top-5 stable.
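The percentages in the table follow directly from the before and after token counts. The short check below recomputes them from the five rows shown. It covers this latest run only and does not attempt to reproduce the 28.7% weighted figure from the separate 40-task benchmark, whose weighting is not described here.

```python
# (repo, tokens before, tokens after), copied from the table above.
ROWS = [
    ("nestjs/nest", 8_777, 6_372),
    ("nestjs/nest", 8_770, 6_540),
    ("vercel/next.js", 8_247, 6_021),
    ("prisma/prisma", 2_492, 2_492),
    ("prisma/prisma", 8_521, 7_268),
]


def reduction_pct(before: int, after: int) -> float:
    """Token reduction as a percentage of the original bundle, to one decimal place."""
    return round((before - after) / before * 100, 1)


reductions = [reduction_pct(before, after) for _, before, after in ROWS]
for (repo, before, after), pct in zip(ROWS, reductions):
    print(f"{repo:15} {before:>6,} -> {after:>6,}  -{pct}%")
```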
11 PRs · 45% catch rate · 0 hallucinations · pre-registered selection criteria
→ Full field-tests report
Context-Compiler benchmark · 2026-05-05
28.7% weighted token reduction · 100% top-5 file stability · 0 omission incidents · 8 public TS repos · 40 tasks · merged CLI
What it looks like in practice
$ codeledger activate --task "add rate limiting to the payments API"
✔ Scanned 1,847 files in 2.3 s
✔ Context bundle ready (18 files · ~8,100 tokens · 28.3% below baseline)
Ranked files:
  0.91  src/api/payments/router.ts       keyword_match, centrality
  0.88  src/middleware/auth.ts           dependency_depth, churn
  0.85  src/api/payments/service.ts      keyword_match, recent_touch
  0.79  tests/payments/router.test.ts    test_relevant
  ...
Confidence: HIGH (0.94) · Top-5 stable · 0 omission flags
Ready to govern your AI coding program?
Start with the slot that controls the most.
Design partners get locked pricing, founding-team attention, and measurable inference savings from day one. Enterprise trial gives you 30 days to run the benchmark against your own codebase.
No credit card required · 8 Design Partner slots · Local-first, no source code uploaded