
Test Matrix

The complete grid of sandbox scenarios. Each row links to a full scenario page, structured per template.md, with steps, the expected outcome, and run records.

The grid

| # | Test ID | Stage | IDE | Language | Fixture | Status |
|---|---|---|---|---|---|---|
| 01 | install-cursor-node | install | Cursor | Node 20 + TS | sandbox-node-ts | |
| 02 | install-claude-python | install | Claude Code | Python 3.12 | sandbox-python | |
| 03 | install-codex-go | install | Codex CLI | Go 1.22 | sandbox-go | |
| 04 | install-aider-rust | install | Aider | Rust 1.78 | sandbox-rust | |
| 05 | boot-cursor-node | boot (first demand) | Cursor | Node 20 + TS | sandbox-node-ts (after 01) | |
| 06 | boot-claude-python | boot (first demand) | Claude Code | Python 3.12 | sandbox-python (after 02) | |
| 07 | update-cursor-node | update (v0.1.0 → v0.1.1) | Cursor | Node 20 + TS | sandbox-node-ts (after 01) | |
| 08 | update-cli-without-cli | update with `--without=cli` | Cursor | Node 20 + TS | sandbox-node-ts (after 01) | |
| 09 | sync-clean | sync (no drift expected) | Cursor | Node 20 + TS | sandbox-node-ts (after 01) | |
| 10 | sync-modified | sync (drift detected) | Cursor | Node 20 + TS | sandbox-node-ts (after 01, hand-edit injected) | |
| 11 | uninstall-preserve | uninstall (preserve ledgers) | Claude Code | Python 3.12 | sandbox-python (after 02) | |
| 12 | uninstall-archive | uninstall (archive ledgers) | Cursor | Node 20 + TS | sandbox-node-ts (after 01) | |

Status legend: ✅ passing · ❌ failing · ⏳ pending. Status is updated per release: each scenario must pass against the candidate manifest version before the framework is tagged.
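The release gate described above can be sketched as a check over per-scenario results. This is a minimal illustration, not part of the framework: the `Status` enum and the `release_blockers` and `can_tag` helpers are hypothetical names, and the sketch assumes that a scenario with no recorded result counts as pending and therefore blocks the tag.

```python
from enum import Enum


# Hypothetical model of the status legend; not an API from the framework.
class Status(Enum):
    PASSING = "✅"
    FAILING = "❌"
    PENDING = "⏳"


# The twelve test IDs in the grid (01 through 12).
ALL_TEST_IDS = [f"{n:02d}" for n in range(1, 13)]


def release_blockers(results: dict[str, Status]) -> list[str]:
    """Return the test IDs that keep the candidate manifest from being tagged.

    A scenario missing from `results` is treated as pending, so it blocks too.
    """
    return [
        tid
        for tid in ALL_TEST_IDS
        if results.get(tid, Status.PENDING) is not Status.PASSING
    ]


def can_tag(results: dict[str, Status]) -> bool:
    """The framework may be tagged only when every scenario is passing."""
    return not release_blockers(results)


if __name__ == "__main__":
    run = {tid: Status.PASSING for tid in ALL_TEST_IDS}
    run["10"] = Status.FAILING  # sync-modified regressed
    del run["12"]               # uninstall-archive not yet run
    print(release_blockers(run))  # ['10', '12']
    print(can_tag(run))           # False
```

Treating a missing result as pending keeps the gate strict: a scenario that was skipped cannot silently let a release through.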

Coverage view

By stage

| Stage | # of scenarios | Test IDs |
|---|---|---|
| install | 4 | 01–04 |
| boot | 2 | 05, 06 |
| update | 2 | 07, 08 |
| sync | 2 | 09, 10 |
| uninstall | 2 | 11, 12 |
| **Total** | **12** | |

By IDE

| IDE | # of scenarios | Test IDs |
|---|---|---|
| Cursor | 7 | 01, 05, 07, 08, 09, 10, 12 |
| Claude Code | 3 | 02, 06, 11 |
| OpenAI Codex CLI | 1 | 03 |
| Aider | 1 | 04 |
| Continue / Windsurf | 0 | (deferred; see KNOWN-ISSUES.md) |

By language

| Language | # of scenarios | Test IDs |
|---|---|---|
| Node + TypeScript | 7 | 01, 05, 07, 08, 09, 10, 12 |
| Python | 3 | 02, 06, 11 |
| Go | 1 | 03 |
| Rust | 1 | 04 |
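The three coverage views are group-bys over the grid. The sketch below recomputes them from a hand-transcribed copy of the rows; the `GRID` list and the `coverage` helper are hypothetical, stage names are reduced to the base stage (for example, "boot (first demand)" becomes "boot"), and IDE and language names follow the coverage tables rather than the grid.

```python
from collections import defaultdict

# (test ID, stage, IDE, language), transcribed by hand from the grid.
# Stage is the base stage only; IDE/language names match the coverage tables.
GRID = [
    ("01", "install", "Cursor", "Node + TypeScript"),
    ("02", "install", "Claude Code", "Python"),
    ("03", "install", "OpenAI Codex CLI", "Go"),
    ("04", "install", "Aider", "Rust"),
    ("05", "boot", "Cursor", "Node + TypeScript"),
    ("06", "boot", "Claude Code", "Python"),
    ("07", "update", "Cursor", "Node + TypeScript"),
    ("08", "update", "Cursor", "Node + TypeScript"),
    ("09", "sync", "Cursor", "Node + TypeScript"),
    ("10", "sync", "Cursor", "Node + TypeScript"),
    ("11", "uninstall", "Claude Code", "Python"),
    ("12", "uninstall", "Cursor", "Node + TypeScript"),
]


def coverage(column: int) -> dict[str, list[str]]:
    """Group test IDs by one column: 1 = stage, 2 = IDE, 3 = language."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in GRID:
        groups[row[column]].append(row[0])
    return dict(groups)


if __name__ == "__main__":
    for name, col in (("stage", 1), ("IDE", 2), ("language", 3)):
        print(f"By {name}")
        for key, ids in coverage(col).items():
            print(f"  {key}: {len(ids)} ({', '.join(ids)})")
```

Deriving the views from one list means a new row cannot update the grid while leaving a coverage count stale.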

Dependency graph

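Some scenarios reuse the post-install state of an earlier scenario. If you want to chain them, run each dependent scenario after the install it builds on (otherwise each scenario sets itself up from its fixture). Per the Fixture column of the grid:

- 01 (install-cursor-node) → 05, 07, 08, 09, 10, 12
- 02 (install-claude-python) → 06, 11
- 03 and 04 have no dependents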
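The "(after NN)" notes in the grid's Fixture column form a small dependency graph, and any topological order of that graph is a valid chaining order. A minimal sketch using Python's standard `graphlib` module; the `DEPENDS_ON` mapping and the `chain_order` helper are hypothetical. It enforces only the recorded "after" notes and does not model whether one scenario's side effects (an upgrade, an injected hand-edit, an uninstall) would disturb the shared state for the next one.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the Fixture column: each test ID maps to the
# test IDs whose post-install state it reuses ("after 01", "after 02").
DEPENDS_ON: dict[str, set[str]] = {
    "01": set(), "02": set(), "03": set(), "04": set(),
    "05": {"01"}, "06": {"02"},
    "07": {"01"}, "08": {"01"},
    "09": {"01"}, "10": {"01"},
    "11": {"02"}, "12": {"01"},
}


def chain_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order scenarios so each runs after every scenario it reuses.

    Ready scenarios are sorted by test ID, so the result is deterministic.
    prepare() raises graphlib.CycleError if the dependencies form a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    order: list[str] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        order.extend(ready)
        sorter.done(*ready)
    return order


if __name__ == "__main__":
    print(chain_order(DEPENDS_ON))
```

With the dependencies as recorded, this yields plain numeric order, 01 through 12, because every dependency points at install scenario 01 or 02, both numbered lower than their dependents.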

What this matrix does not cover (yet)

| Gap | Why deferred | Trigger to add |
|---|---|---|
| Continue / Windsurf IDE coverage | No active adopter; would be a synthetic test | First adopter on either IDE |
| Java / Kotlin / Swift / C++ fixtures | See fixtures/README.md | First adopter |
| `archon doctor` deep-dive scenario | Doctor is a wrapper over check + structural; covered transitively in 09 | Behavior diverges from sync |
| Multi-agent / parallel-write race | Single-agent invariant is part of soul.md; race testing is out of scope | If the invariant is relaxed |
| Cross-OS matrix (Linux × macOS × Windows × WSL) | First-party CI runs on Linux; macOS/Windows runs are spot-checked manually | Before declaring 1.0.0 |

These gaps are recorded so the matrix is honest about what "sandbox-tested" means today.

Released under the Apache-2.0 License.