Prediction-versus-Reality

Repository Intervention Simulator

Observe a repository world, predict an intervention's consequences before acting, then hold that prediction against an observed reality.

This is a simulation-first, advisory-only workflow over deterministic mock data. CodeWorld is not a trained world model and does not perform real machine learning here. The simulator observes a mock repository world, predicts intervention consequences before action, and compares prediction with a mock observed reality. It does not write to a repository, run shell commands, contact a Git API, or trigger a deployment.

Workflow

Simulate before write

Five steps turn the repository-as-world thesis into a concrete, reviewable workflow — none of which mutate a real system.

Observe

Read a mock repository world: build status, fragile zones, dependency graph, history, and risk.

Propose

Select a predefined intervention with intent, scope, benefit, failure modes, and rollback posture.

Predict

Show a deterministic prediction before action: files, zones, tests, build, lint, runtime, confidence.

Observe Reality

Reveal a mock observed outcome: actual files, tests, build, lint, runtime, and console status.

Reconcile

Compare prediction with reality, surface mismatches, preserve evidence, and record the human review state.
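
The five steps above can be sketched as a small ordered pipeline. This is an illustrative sketch only: `WorkflowStep`, `STEPS`, `nextStep`, and `walk` are hypothetical names, not identifiers from the CodeWorld source, and nothing in it touches a real system.

```typescript
// Hypothetical names throughout; a sketch of the step order, not CodeWorld's code.
type WorkflowStep =
  | "observe"
  | "propose"
  | "predict"
  | "observe-reality"
  | "reconcile";

const STEPS: readonly WorkflowStep[] = [
  "observe",
  "propose",
  "predict",
  "observe-reality",
  "reconcile",
];

// Returns the step that follows `current`, or null after the final step.
function nextStep(current: WorkflowStep): WorkflowStep | null {
  const index = STEPS.indexOf(current);
  return STEPS[index + 1] ?? null;
}

// Walks the whole workflow in order and returns the visited steps.
function walk(): WorkflowStep[] {
  const visited: WorkflowStep[] = [];
  let step: WorkflowStep | null = STEPS[0] ?? null;
  while (step !== null) {
    visited.push(step);
    step = nextStep(step);
  }
  return visited;
}
```

Keeping the order in one array means "predict" can never be reached after "observe-reality", which is the simulate-before-write constraint in its simplest form.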

Step 01 — Observe

The repository world

A static, typed snapshot of a repository-like environment. Mock data only — not a live capture.

Repository World (Mock State)

codeworld-observatory

main

Build Status: passing · 48 verification surfaces (mock)

Dependency Graph Summary

ui-primitives (6 modules, tight): ObsCommon + SectionHeader consumed by nearly every panel.
research-export (9 modules, moderate): Manifest builder, types, and API route move together.
observatory-panels (21 modules, moderate): Render persisted research artifacts; sensitive to schema drift.
routing · ui-shell (4 modules, loose): Routes and navigation; additive changes are low-risk.
ci · docs (5 modules, loose): Verification workflow and governance documentation.
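
A typed version of this summary might look like the sketch below. The type and field names (`Coupling`, `DependencyZone`, `modules`, `coupling`) are assumptions made for illustration, and reading "tight / moderate / loose" as a coupling label is an interpretation; the values are copied from the mock summary above.

```typescript
// Sketch only: type and field names are assumptions; values mirror the mock summary.
type Coupling = "tight" | "moderate" | "loose";

interface DependencyZone {
  readonly name: string;
  readonly modules: number;
  readonly coupling: Coupling;
}

const DEPENDENCY_ZONES: readonly DependencyZone[] = [
  { name: "ui-primitives", modules: 6, coupling: "tight" },
  { name: "research-export", modules: 9, coupling: "moderate" },
  { name: "observatory-panels", modules: 21, coupling: "moderate" },
  { name: "routing · ui-shell", modules: 4, coupling: "loose" },
  { name: "ci · docs", modules: 5, coupling: "loose" },
];

// Names of every zone carrying the given coupling label.
function zonesByCoupling(coupling: Coupling): string[] {
  return DEPENDENCY_ZONES.filter((z) => z.coupling === coupling).map((z) => z.name);
}

// Total module count across all zones in the snapshot.
function totalModules(): number {
  return DEPENDENCY_ZONES.reduce((sum, z) => sum + z.modules, 0);
}
```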

Fragile Zones

research-export manifest schema (legacy artifacts can drift)
observatory panels reading persisted JSON
shared UI primitives consumed across many surfaces
client/server component boundary in the App Router

Risk Zones

research-export (elevated): Persisted manifests predate the current schema, so renderers must guard against missing or outdated fields.
ui-primitives (moderate): A change here has a wide blast radius across panels.
routing · ui-shell (low): New routes are additive and easily reverted.

Recent Change History

f45dfc4: Add release governance, CI, verification docs, and favicon. (ci · docs · app/icon.svg)
4f4210d: Harden evidence status rendering against undefined replay summary. (observatory-panels · research-export)
977d536: Publish live deployment link. (docs)
517df6e: Introduce the Quantum Research Annex. (routing · research-annex)

Open Assumptions

Verification is type-check, lint, and build — there is no live unit-test runner in the web app.
Persisted JSON artifacts may lag the current TypeScript types.
Vercel deploys are immutable; file ordering by mtime is unreliable there.
Human review is the authority layer above every proposed intervention.
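
Because persisted JSON artifacts may lag the current types, a renderer can read them defensively instead of trusting their shape. The sketch below shows one way to do that. The field names `schemaVersion` and `replaySummary.status`, the `EvidenceView` type, and the fallback values are all hypothetical, chosen only to illustrate the pattern.

```typescript
// Hypothetical manifest fields; the real persisted schema is not shown in this document.
interface EvidenceView {
  readonly schemaVersion: number;
  readonly replayStatus: string;
}

// Narrows an unknown value to a plain object with string keys.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parses a persisted manifest and falls back to safe defaults for anything
// that is missing, malformed, or written by an older schema.
function readEvidence(raw: string): EvidenceView {
  let parsed: unknown = null;
  try {
    parsed = JSON.parse(raw);
  } catch {
    parsed = null; // unreadable artifact: treat as empty
  }

  const manifest: Record<string, unknown> = isRecord(parsed) ? parsed : {};
  const version = manifest["schemaVersion"];
  const summaryValue = manifest["replaySummary"];
  const summary: Record<string, unknown> = isRecord(summaryValue) ? summaryValue : {};
  const status = summary["status"];

  return {
    schemaVersion: typeof version === "number" ? version : 0,
    replayStatus: typeof status === "string" ? status : "unknown",
  };
}
```

A legacy artifact without a replay summary then renders as "unknown" rather than throwing on an undefined property.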

Steps 02–05 — Propose · Predict · Observe Reality · Reconcile

Intervention simulator

Select a predefined intervention. The prediction is shown before action; the observed reality and the prediction-versus-reality ledger follow.

Intervention Proposal (additive · schema)

Add a typed evidence ledger export

Introduce a typed JSON export manifest (builder + API route + types) so research evidence can be reproduced and reviewed externally.

Rollback Posture: straightforward
Human Approval Required

Affected Files

lib/services/export-manifest-builder.ts
app/api/research/export/route.ts
lib/types/research-export.ts

Affected Dependency Zones

research-export · api-routes · type-system

Expected Benefit

Reproducible, typed evidence artifacts that external reviewers can inspect without running the app.

Possible Failure Modes

Schema drift between persisted manifests and current types.
Serialization gaps for optional fields.
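
A predefined intervention like this one could be held as a typed, read-only record. The sketch below is one possible shape: the interface name, its field names, and the `id` value are assumptions for illustration, while the titles, paths, zones, and failure modes are copied from the proposal above.

```typescript
// Sketch only: interface and field names are assumptions, not CodeWorld's types.
interface InterventionProposal {
  readonly id: string; // hypothetical identifier
  readonly title: string;
  readonly tags: readonly string[];
  readonly rollbackPosture: string;
  readonly humanApprovalRequired: boolean;
  readonly affectedFiles: readonly string[];
  readonly affectedZones: readonly string[];
  readonly expectedBenefit: string;
  readonly failureModes: readonly string[];
}

const typedEvidenceExport: InterventionProposal = {
  id: "typed-evidence-export",
  title: "Add a typed evidence ledger export",
  tags: ["additive", "schema"],
  rollbackPosture: "straightforward",
  humanApprovalRequired: true,
  affectedFiles: [
    "lib/services/export-manifest-builder.ts",
    "app/api/research/export/route.ts",
    "lib/types/research-export.ts",
  ],
  affectedZones: ["research-export", "api-routes", "type-system"],
  expectedBenefit:
    "Reproducible, typed evidence artifacts that external reviewers can inspect without running the app.",
  failureModes: [
    "Schema drift between persisted manifests and current types.",
    "Serialization gaps for optional fields.",
  ],
};

// One-line scope summary for a review screen.
function scopeSummary(p: InterventionProposal): string {
  return `${p.affectedFiles.length} files across ${p.affectedZones.length} zones`;
}
```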

Prediction Before Action

Confidence: 82%
Files Touched: 3
Dep Zones: 2
Tests Likely Fail: 0
Build: rebuild-required
Lint: none
Runtime Risk: negligible

Uncertainty

Low–medium — additive surface with a well-understood type boundary.

Recommended Verification

npm run type-check
npm run build

Human Approval: required

Prediction Before Action — a deterministic mock estimate shown prior to any change. CodeWorld is not a trained world model; this illustrates simulate-before-write discipline, not a learned prediction.
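
Since the prediction is a deterministic mock rather than a learned estimate, it can be modelled as a plain lookup: the same intervention always returns the same record. The sketch below assumes a hypothetical `Prediction` type, a hypothetical `predict` function, and a hypothetical key `"typed-evidence-export"`; the values are those shown in the panel above.

```typescript
// Sketch only: names are assumptions; the values mirror the mock prediction panel.
interface Prediction {
  readonly confidencePercent: number;
  readonly filesTouched: number;
  readonly depZones: number;
  readonly testsLikelyFail: number;
  readonly build: string;
  readonly lint: string;
  readonly runtimeRisk: string;
  readonly uncertainty: string;
  readonly recommendedVerification: readonly string[];
}

// Fixed table of predictions keyed by a hypothetical intervention id.
const PREDICTIONS: ReadonlyMap<string, Prediction> = new Map<string, Prediction>([
  [
    "typed-evidence-export",
    {
      confidencePercent: 82,
      filesTouched: 3,
      depZones: 2,
      testsLikelyFail: 0,
      build: "rebuild-required",
      lint: "none",
      runtimeRisk: "negligible",
      uncertainty: "Low–medium",
      recommendedVerification: ["npm run type-check", "npm run build"],
    },
  ],
]);

// Pure lookup: no randomness, no model, no I/O. Unknown ids yield undefined.
function predict(interventionId: string): Prediction | undefined {
  return PREDICTIONS.get(interventionId);
}
```

A lookup keeps the "prediction" honest about what it is: a fixture that demonstrates the workflow, not an inference.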

Observed Reality

Files Touched: 3
Tests Failed: 0
Console: no console errors
Build: pass
Lint: pass
Runtime: clean

Deployment

not deployed (simulation only)

Notes

Prediction matched observation. Additive change, no surprises.

Prediction-vs-Reality Ledger (Evidence Preserved)

Add a typed evidence ledger export

evidence · sim-ev-typed-evidence-export

Prediction Quality: accurate

Dimension	Prediction Before Action	Observed Reality	Mismatch Surface
Files touched	3	3	aligned: Scope predicted exactly.
Test impact	0 likely to fail	0 failed	aligned: Test impact predicted exactly.
Risk / runtime	negligible risk	clean runtime	aligned: Runtime behaviour matched the risk posture.
Build impact	rebuild-required	pass	aligned: Build outcome consistent with prediction.
Lint impact	none	pass	aligned: Lint outcome consistent with prediction.
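
The ledger rows above can be derived by comparing a prediction with an observation one dimension at a time. The sketch below is a minimal version under stated assumptions: all type and function names are hypothetical, the values `"elevated"`, `"likely-fail"`, `"violations-likely"`, `"errors"`, `"fail"`, and `"mismatch"` are invented to give the comparison a failing case, and the alignment rules are one plausible reading of the table rather than the simulator's actual logic.

```typescript
// Sketch only: names, extra enum values, and alignment rules are assumptions.
interface PredictedOutcome {
  readonly filesTouched: number;
  readonly testsLikelyFail: number;
  readonly runtimeRisk: "negligible" | "elevated";
  readonly build: "rebuild-required" | "likely-fail";
  readonly lint: "none" | "violations-likely";
}

interface ObservedOutcome {
  readonly filesTouched: number;
  readonly testsFailed: number;
  readonly runtime: "clean" | "errors";
  readonly build: "pass" | "fail";
  readonly lint: "pass" | "fail";
}

interface LedgerRow {
  readonly dimension: string;
  readonly predicted: string;
  readonly observed: string;
  readonly aligned: boolean;
}

// Builds one row per dimension; a row is aligned when the prediction and the
// observation agree on whether something went wrong.
function reconcile(p: PredictedOutcome, o: ObservedOutcome): LedgerRow[] {
  return [
    { dimension: "Files touched", predicted: String(p.filesTouched),
      observed: String(o.filesTouched), aligned: p.filesTouched === o.filesTouched },
    { dimension: "Test impact", predicted: `${p.testsLikelyFail} likely to fail`,
      observed: `${o.testsFailed} failed`, aligned: p.testsLikelyFail === o.testsFailed },
    { dimension: "Risk / runtime", predicted: `${p.runtimeRisk} risk`,
      observed: `${o.runtime} runtime`,
      aligned: (p.runtimeRisk === "negligible") === (o.runtime === "clean") },
    { dimension: "Build impact", predicted: p.build, observed: o.build,
      aligned: (p.build === "likely-fail") === (o.build === "fail") },
    { dimension: "Lint impact", predicted: p.lint, observed: o.lint,
      aligned: (p.lint === "none") === (o.lint === "pass") },
  ];
}

// "accurate" only when every dimension aligned.
function predictionQuality(rows: readonly LedgerRow[]): "accurate" | "mismatch" {
  return rows.every((r) => r.aligned) ? "accurate" : "mismatch";
}
```

Returning rows rather than a single verdict keeps each mismatch visible, so a reviewer can see which dimension the prediction got wrong.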

Unresolved Uncertainty

Low–medium — additive surface with a well-understood type boundary.

Human Review State: reviewed · approved · not executed

Posture: Advisory Only

Repository Mutation: None

Evidence preserved · advisory only · no repository mutation. This ledger records a simulated prediction against a mock observed outcome. It does not write to any repository, run shell commands, or trigger a deployment.

Governance

Human authority layer

The simulator models consequences; it never bypasses review. Human approval sits above any real intervention.

Human Authority Layer (Advisory Only)

Human review sits above automation

No Repository Mutation

The simulator does not write to a repository.

No shell commands are executed.

No production deployment is triggered.

No real Git, broker, or financial API is contacted.

Human approval is required before any real intervention.

The simulator exists to model consequences, not to bypass review. Human approval is the authority layer above every proposed intervention.
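
The advisory-only posture can be made explicit in types: the simulator's result records the human review state and never carries an "executed" outcome. The sketch below is illustrative only; `HumanReview`, `AdvisoryResult`, and `advise` are hypothetical names, and the rule that both review and approval are needed before a human may act is an assumption drawn from the statements above.

```typescript
// Sketch only: names are assumptions. The result type has no way to express
// that the simulator itself executed or mutated anything.
interface HumanReview {
  readonly reviewed: boolean;
  readonly approved: boolean;
}

interface AdvisoryResult {
  readonly posture: "advisory-only";
  readonly repositoryMutation: "none";
  readonly clearedForHumanAction: boolean;
  readonly reviewState: string;
}

// Summarises the review state. Approval clears a human to act outside the
// simulator; it never causes the simulator to act.
function advise(review: HumanReview): AdvisoryResult {
  const parts = [
    review.reviewed ? "reviewed" : "not reviewed",
    review.approved ? "approved" : "not approved",
    "not executed",
  ];
  return {
    posture: "advisory-only",
    repositoryMutation: "none",
    clearedForHumanAction: review.reviewed && review.approved,
    reviewState: parts.join(" · "),
  };
}
```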

Direction

Repository-as-world thinking

Why a prediction-versus-reality ledger belongs at the centre of agentic software engineering.

Inspired by broader world-model research: before acting in a complex environment, a system should represent the world, imagine future states, estimate the cost of intervention, compare prediction with reality, and remain accountable when its prediction fails. In CodeWorld terms, the environment is the repository, future-state prediction is intervention-consequence prediction, and cost is files, dependency zones, tests, rollback risk, and uncertainty.