Prediction-versus-Reality

Repository Intervention Simulator

Observe a repository world, predict an intervention's consequences before acting, then hold that prediction against an observed reality.

This is a simulation-first, advisory-only workflow over deterministic mock data. CodeWorld is not a trained world model and does not perform real machine learning here. The simulator observes a mock repository world, predicts intervention consequences before action, and compares prediction with a mock observed reality. It does not write to a repository, run shell commands, contact a Git API, or trigger a deployment.

Workflow

Simulate before write

Five steps turn the repository-as-world thesis into a concrete, reviewable workflow. None of them mutates a real system.

01

Observe

Read a mock repository world: build status, fragile zones, dependency graph, history, and risk.

02

Propose

Select a predefined intervention with intent, scope, benefit, failure modes, and rollback posture.

03

Predict

Show a deterministic prediction before action: files, zones, tests, build, lint, runtime, confidence.

04

Observe Reality

Reveal a mock observed outcome: actual files, tests, build, lint, runtime, and console status.

05

Reconcile

Compare prediction with reality, surface mismatches, preserve evidence, and record the human review state.
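The ordering of the five steps can be sketched as a small state helper. This is a minimal illustration, assuming hypothetical names (`WorkflowStep`, `WORKFLOW_ORDER`, `nextStep`, `canReveal`) that do not come from CodeWorld itself. It only shows how a UI might refuse to reveal a later step before the earlier ones are complete.

```typescript
// Hypothetical names throughout: an illustration of step ordering, not CodeWorld's actual code.
type WorkflowStep = "observe" | "propose" | "predict" | "observe-reality" | "reconcile";

const WORKFLOW_ORDER: readonly WorkflowStep[] = [
  "observe",
  "propose",
  "predict",
  "observe-reality",
  "reconcile",
];

// Returns the step that follows `current`, or null once the workflow is complete.
function nextStep(current: WorkflowStep): WorkflowStep | null {
  const index = WORKFLOW_ORDER.indexOf(current);
  return WORKFLOW_ORDER[index + 1] ?? null;
}

// A step may be revealed only when every earlier step has been completed,
// which keeps the prediction on screen before the observed reality.
function canReveal(step: WorkflowStep, completed: ReadonlySet<WorkflowStep>): boolean {
  const index = WORKFLOW_ORDER.indexOf(step);
  return WORKFLOW_ORDER.slice(0, index).every((earlier) => completed.has(earlier));
}
```

Under these assumptions, `canReveal("observe-reality", ...)` stays false until `"predict"` is in the completed set, which is the simulate-before-write ordering in miniature.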

Step 01 — Observe

The repository world

A static, typed snapshot of a repository-like environment. Mock data only — not a live capture.

Repository World (Mock State)

codeworld-observatory

main

Build Status: passing, 48 verification surfaces (mock)

Dependency Graph Summary

  • ui-primitives (6 modules, tight): ObsCommon + SectionHeader consumed by nearly every panel.
  • research-export (9 modules, moderate): Manifest builder, types, and API route move together.
  • observatory-panels (21 modules, moderate): Render persisted research artifacts; sensitive to schema drift.
  • routing · ui-shell (4 modules, loose): Routes and navigation; additive changes are low-risk.
  • ci · docs (5 modules, loose): Verification workflow and governance documentation.

Fragile Zones

  • research-export manifest schema (legacy artifacts can drift)
  • observatory panels reading persisted JSON
  • shared UI primitives consumed across many surfaces
  • client/server component boundary in the App Router

Risk Zones

  • research-export (elevated): Persisted manifests predate the current schema; renderers must defend.
  • ui-primitives (moderate): A change here has a wide blast radius across panels.
  • routing · ui-shell (low): New routes are additive and easily reverted.

Recent Change History

  • f45dfc4: Add release governance, CI, verification docs, and favicon. (ci · docs · app/icon.svg)
  • 4f4210d: Harden evidence status rendering against undefined replay summary. (observatory-panels · research-export)
  • 977d536: Publish live deployment link. (docs)
  • 517df6e: Introduce the Quantum Research Annex. (routing · research-annex)

Open Assumptions

  • Verification is type-check, lint, and build — there is no live unit-test runner in the web app.
  • Persisted JSON artifacts may lag the current TypeScript types.
  • Vercel deploys are immutable; file ordering by mtime is unreliable there.
  • Human review is the authority layer above every proposed intervention.
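A static, typed snapshot of this kind might look like the sketch below. The field names and the `zonesAtOrAbove` helper are assumptions made for illustration. Only the zone names, risk levels, notes, repository name, branch, and build status are taken from the mock state above.

```typescript
// Hypothetical shape for the mock repository world; not CodeWorld's actual types.
type RiskLevel = "low" | "moderate" | "elevated";

interface RiskZone {
  readonly zone: string;
  readonly level: RiskLevel;
  readonly note: string;
}

interface RepositoryWorld {
  readonly repository: string;
  readonly branch: string;
  readonly buildStatus: "passing" | "failing";
  readonly riskZones: readonly RiskZone[];
}

// Values copied from the mock state shown above.
const mockWorld: RepositoryWorld = {
  repository: "codeworld-observatory",
  branch: "main",
  buildStatus: "passing",
  riskZones: [
    { zone: "research-export", level: "elevated", note: "Persisted manifests predate the current schema." },
    { zone: "ui-primitives", level: "moderate", note: "Wide blast radius across panels." },
    { zone: "routing · ui-shell", level: "low", note: "New routes are additive and easily reverted." },
  ],
};

// Assumed ordering of risk levels, used only for comparison.
const RISK_ORDER: Record<RiskLevel, number> = { low: 0, moderate: 1, elevated: 2 };

// Lists the zones whose risk level is at or above the given threshold.
function zonesAtOrAbove(world: RepositoryWorld, threshold: RiskLevel): string[] {
  return world.riskZones
    .filter((entry) => RISK_ORDER[entry.level] >= RISK_ORDER[threshold])
    .map((entry) => entry.zone);
}
```

Because the snapshot is a plain constant, reading it has no side effects, which matches the "mock data only, not a live capture" posture.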

Steps 02–05 — Propose · Predict · Reconcile

Intervention simulator

Select a predefined intervention. The prediction is shown before action; the observed reality and the prediction-versus-reality ledger follow.

Intervention Proposal (additive · schema)

Add a typed evidence ledger export

Introduce a typed JSON export manifest (builder + API route + types) so research evidence can be reproduced and reviewed externally.

Rollback Posture: straightforward
Human Approval Required

Affected Files

  • lib/services/export-manifest-builder.ts
  • app/api/research/export/route.ts
  • lib/types/research-export.ts

Affected Dependency Zones

  • research-export
  • api-routes
  • type-system

Expected Benefit

Reproducible, typed evidence artifacts that external reviewers can inspect without running the app.

Possible Failure Modes

  • Schema drift between persisted manifests and current types.
  • Serialization gaps for optional fields.

Prediction Before Action

  • Confidence: 82%
  • Files Touched: 3
  • Dep Zones: 2
  • Tests Likely Fail: 0
  • Build: rebuild-required
  • Lint: none
  • Runtime Risk: negligible

Uncertainty

Low–medium — additive surface with a well-understood type boundary.

Recommended Verification

  • npm run type-check
  • npm run build

Human Approval: required

Prediction Before Action — a deterministic mock estimate shown prior to any change. CodeWorld is not a trained world model; this illustrates simulate-before-write discipline, not a learned prediction.
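One way to keep a prediction deterministic is to serve it from a fixed lookup table rather than compute it. The sketch below assumes a hypothetical intervention id (`typed-evidence-export`) and hypothetical type and function names. The numeric and categorical values are the ones shown in the prediction above, and the extra enum members are placeholders.

```typescript
// Hypothetical types; enum members other than those shown in the prediction are placeholders.
interface SimPrediction {
  readonly confidencePct: number;
  readonly filesTouched: number;
  readonly depZones: number;
  readonly testsLikelyFail: number;
  readonly build: "none" | "rebuild-required";
  readonly lint: "none" | "warnings-likely";
  readonly runtimeRisk: "negligible" | "elevated";
  readonly humanApproval: "required";
}

// Static table: the same id always yields the same prediction (no learning, no randomness).
const PREDICTIONS: ReadonlyMap<string, SimPrediction> = new Map<string, SimPrediction>([
  [
    "typed-evidence-export", // assumed id for "Add a typed evidence ledger export"
    {
      confidencePct: 82,
      filesTouched: 3,
      depZones: 2,
      testsLikelyFail: 0,
      build: "rebuild-required",
      lint: "none",
      runtimeRisk: "negligible",
      humanApproval: "required",
    },
  ],
]);

// Looks up the predefined prediction; unknown interventions are rejected rather than guessed.
function predictIntervention(id: string): SimPrediction {
  const prediction = PREDICTIONS.get(id);
  if (prediction === undefined) {
    throw new Error(`No predefined intervention: ${id}`);
  }
  return prediction;
}
```

Rejecting unknown ids instead of extrapolating keeps the sketch honest about what it is: a mock estimate, not a learned model.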

Observed Reality

  • Files Touched: 3
  • Tests Failed: 0
  • Console: no console errors
  • Build: pass
  • Lint: pass
  • Runtime: clean

Deployment

not deployed (simulation only)

Notes

Prediction matched observation. Additive change, no surprises.

Prediction-vs-Reality Ledger (Evidence Preserved)

Add a typed evidence ledger export

evidence · sim-ev-typed-evidence-export

Prediction Quality: accurate

Dimension | Prediction Before Action | Observed Reality | Mismatch Surface
Files touched | 3 | 3 | aligned: Scope predicted exactly.
Test impact | 0 likely to fail | 0 failed | aligned: Test impact predicted exactly.
Risk / runtime | negligible risk | clean runtime | aligned: Runtime behaviour matched the risk posture.
Build impact | rebuild-required | pass | aligned: Build outcome consistent with prediction.
Lint impact | none | pass | aligned: Lint outcome consistent with prediction.

Unresolved Uncertainty

Low–medium — additive surface with a well-understood type boundary.

Human Review State: reviewed · approved · not executed
Posture: Advisory Only
Repository Mutation: None

Evidence preserved · advisory only · no repository mutation. This ledger records a simulated prediction against a mock observed outcome. It does not write to any repository, run shell commands, or trigger a deployment.
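The reconciliation step can be sketched as a pure function that compares a prediction with an observation, one dimension at a time. All names here are hypothetical, and the alignment rules are assumptions chosen to reproduce the ledger above (for example, a predicted `rebuild-required` is treated as consistent with an observed `pass`). The `"divergent"` quality label is also a placeholder.

```typescript
// Hypothetical shapes; the alignment rules below are assumptions, not CodeWorld's documented logic.
interface PredictedOutcome {
  filesTouched: number;
  testsLikelyFail: number;
  build: "none" | "rebuild-required";
  lint: "none" | "warnings-likely";
  runtimeRisk: "negligible" | "elevated";
}

interface ObservedOutcome {
  filesTouched: number;
  testsFailed: number;
  build: "pass" | "fail";
  lint: "pass" | "fail";
  runtime: "clean" | "errors";
}

interface LedgerRow {
  dimension: string;
  predicted: string;
  observed: string;
  status: "aligned" | "mismatch";
}

// Builds one ledger row from the two displayed values and an alignment flag.
const makeRow = (dimension: string, predicted: string, observed: string, aligned: boolean): LedgerRow => ({
  dimension,
  predicted,
  observed,
  status: aligned ? "aligned" : "mismatch",
});

// Compares prediction with observation per dimension; nothing is written anywhere.
function reconcile(p: PredictedOutcome, o: ObservedOutcome): LedgerRow[] {
  return [
    makeRow("Files touched", String(p.filesTouched), String(o.filesTouched), p.filesTouched === o.filesTouched),
    makeRow("Test impact", `${p.testsLikelyFail} likely to fail`, `${o.testsFailed} failed`, p.testsLikelyFail === o.testsFailed),
    // Assumed rule: negligible risk should pair with a clean runtime, elevated risk with errors.
    makeRow("Risk / runtime", `${p.runtimeRisk} risk`, `${o.runtime} runtime`, (p.runtimeRisk === "negligible") === (o.runtime === "clean")),
    // Assumed rule: neither predicted build value forecasts a failure, so only a pass is aligned.
    makeRow("Build impact", p.build, o.build, o.build === "pass"),
    makeRow("Lint impact", p.lint, o.lint, (p.lint === "none") === (o.lint === "pass")),
  ];
}

// Overall quality is "accurate" only when every dimension is aligned.
function predictionQuality(rows: readonly LedgerRow[]): "accurate" | "divergent" {
  return rows.every((row) => row.status === "aligned") ? "accurate" : "divergent";
}
```

Fed the values from the ledger above, all five rows come out aligned and the quality is `accurate`. Changing the observed file count to a different number flips only the first row to `mismatch`.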

Governance

Human authority layer

The simulator models consequences; it never bypasses review. Human approval sits above any real intervention.

Human Authority Layer (Advisory Only)

Human review sits above automation

No Repository Mutation

  • The simulator does not write to a repository.
  • No shell commands are executed.
  • No production deployment is triggered.
  • No real Git, broker, or financial API is contacted.
  • Human approval is required before any real intervention.

The simulator exists to model consequences, not to bypass review. Human approval is the authority layer above every proposed intervention.
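The advisory-only posture can be expressed as a gate whose return type makes mutation impossible to represent. This is a sketch under stated assumptions: `ReviewState`, `AdvisoryDecision`, and `adviseOnIntervention` are hypothetical names, and the function only returns advice. It never executes anything.

```typescript
// Hypothetical governance gate; it returns advice only and performs no I/O.
type ReviewState = "pending" | "approved" | "rejected";

interface AdvisoryDecision {
  // True only when a human has approved; execution itself remains a human action.
  readonly mayProceedToHumanExecution: boolean;
  // The literal type "none" means a mutating result cannot even be constructed.
  readonly repositoryMutation: "none";
  readonly reason: string;
}

function adviseOnIntervention(review: ReviewState): AdvisoryDecision {
  switch (review) {
    case "approved":
      return {
        mayProceedToHumanExecution: true,
        repositoryMutation: "none",
        reason: "Reviewed and approved; not executed by the simulator.",
      };
    case "rejected":
      return {
        mayProceedToHumanExecution: false,
        repositoryMutation: "none",
        reason: "Rejected in human review.",
      };
    case "pending":
      return {
        mayProceedToHumanExecution: false,
        repositoryMutation: "none",
        reason: "Human approval is required before any real intervention.",
      };
  }
}
```

Encoding "no mutation" in the type rather than in a runtime check is a design choice for this illustration: a reviewer can confirm the guarantee by reading the interface alone.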

Direction

Repository-as-world thinking

Why a prediction-versus-reality ledger belongs at the centre of agentic software engineering.

This direction is inspired by broader world-model research: before acting in a complex environment, a system should represent the world, imagine future states, estimate the cost of intervention, compare prediction with reality, and remain accountable when its prediction fails. In CodeWorld terms, the environment is the repository, future-state prediction is intervention-consequence prediction, and cost is measured in files, dependency zones, tests, rollback risk, and uncertainty.