From First Slice to Full Pressure Test: Raising the Readiness Bar Across Real Repos
A narrow readiness slice can validate structure, but full trust comes from pressure-testing execution claims, mode symmetry, and agent safety across real repositories.
Overview
Most readiness rollouts start with a narrow, useful slice. Ours did too:
- model one honest contributor path
- validate the contract shape
- add a CI matrix lane
- prove one green run
That is the right start. It is not maturity.
A contract can be structurally valid and still fail where real users and agents operate: task mode branches, cross-OS behavior, runtime probes, transitive task effects, and workflow-level proof semantics.
So we made a hard call: we stopped treating “one green workflow” as proof and moved to full pressure testing across real repositories.
Why The Initial Slice Was Not Enough
The first slice answers a useful question:
Can this repo express one runnable path?
But pressure testing answers the one that actually matters:
Are the declared paths truthful, executable, and safe under real execution conditions?
In practice, the real gaps only surfaced when we executed broader lanes:
- non-internal task dry-runs by mode
- native vs container path symmetry
- cross-OS workflow behavior
- safe-task boundary enforcement through transitive dependencies
- runtime proof for long-running workflows
Without that expansion, we would have shipped false confidence.
What We Expanded
We raised the bar in five concrete ways, and we enforced them as merge criteria.
1) From workflow-only checks to task-surface checks
We stopped treating workflow success as full proof. If the contract declares runnable tasks, those tasks must be exercised directly too.
    ota tasks --use
    ota run <task> --dry-run
2) From single-mode assumptions to explicit mode truth
If a task claims native and container support, both paths must be validated explicitly. No exceptions.
    ota run <task> --mode native --dry-run
    ota run <task> --mode container --dry-run
3) From Linux-only confidence to cross-OS confidence
A path that works on one host can still fail on another due to runtime/tooling surfaces, shell behavior, or unsupported-host semantics. Matrix coverage needs explicit host intent, including unsupported-host proof when that is the correct contract behavior.
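One way to make mode truth and host intent explicit is to expand each task into one lane per declared mode and host, and to record the expected outcome for each lane. The following is a minimal sketch, not Ota's own matrix format: the `Lane` type, the `build_lanes` helper, the host names, and the task name `build` are all hypothetical. Only the `ota run <task> --mode <mode> --dry-run` command shape comes from this post.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Lane:
    task: str
    mode: str
    host: str
    command: str
    expect: str  # "pass", or "unsupported" when the host is intentionally excluded


def build_lanes(task, modes, hosts, unsupported=frozenset()):
    """Expand one task into a lane for every declared (mode, host) pair."""
    lanes = []
    for mode, host in product(modes, hosts):
        # An intentionally unsupported pair still gets a lane: the lane proves
        # that the unsupported-host behavior is what the contract says it is.
        expect = "unsupported" if (mode, host) in unsupported else "pass"
        command = f"ota run {task} --mode {mode} --dry-run"
        lanes.append(Lane(task, mode, host, command, expect))
    return lanes


# Hypothetical task and hosts, for illustration only.
lanes = build_lanes(
    "build",
    modes=["native", "container"],
    hosts=["linux", "macos", "windows"],
    unsupported={("container", "windows")},
)
for lane in lanes:
    print(f"{lane.host:8} {lane.expect:12} {lane.command}")
```

Keeping the unsupported pair in the matrix, rather than filtering it out, is the design choice that matters here: a skipped lane proves nothing, while an "expected unsupported" lane proves the contract's host constraint.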
4) From direct task safety to transitive task safety
Agent-safe declarations are only reliable if reachable dependencies are also safe with respect to writable/protected boundaries and declared task effects. We now treat transitive safety as mandatory, not optional.
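The transitive rule can be sketched as a graph walk: start from a task declared agent-safe, follow every dependency edge, and report anything reachable that is not itself safe. This is an illustrative simplification, assuming safety can be reduced to a single yes/no flag per task; the real check described above also considers writable/protected boundaries and declared task effects. The function name, the dependency map, and the task names are hypothetical.

```python
def unsafe_reachable(task, deps, safe):
    """Return the sorted list of non-safe tasks reachable from `task`."""
    seen, stack, unsafe = set(), [task], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against dependency cycles
        seen.add(current)
        if current not in safe:
            unsafe.add(current)
        stack.extend(deps.get(current, ()))
    return sorted(unsafe)


# Hypothetical dependency graph: task name -> direct dependencies.
deps = {
    "test": ["build"],
    "build": ["fetch-deps"],
    "fetch-deps": [],
    "lint": [],
}
safe = {"test", "build", "lint"}

print(unsafe_reachable("test", deps, safe))  # ['fetch-deps']
print(unsafe_reachable("lint", deps, safe))  # []
```

Here `test` is declared safe and so is its direct dependency `build`, but the walk still flags `fetch-deps` two hops away, which is exactly the case a direct-only check would miss.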
5) From “green enough” to classification discipline
Every failure should be classified as one of:
- Ota bug
- Ota maturity gap
- contract issue
- repo baseline issue
- CI environment limitation
- intentional unsupported path
That prevents pressure tests from devolving into local workarounds and keeps each failure pointed at the place where it should be fixed: the product, the contract, the repo, or the CI environment.
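The classification list can be enforced mechanically by refusing to summarize any failure that has no class attached. The sketch below is one possible shape for that, under the assumption that failures are tracked as simple (lane, class) pairs; the `FailureClass` enum, the `summarize` helper, and the lane labels are hypothetical and not part of Ota.

```python
from collections import Counter
from enum import Enum


class FailureClass(Enum):
    OTA_BUG = "Ota bug"
    OTA_MATURITY_GAP = "Ota maturity gap"
    CONTRACT_ISSUE = "contract issue"
    REPO_BASELINE_ISSUE = "repo baseline issue"
    CI_ENVIRONMENT_LIMITATION = "CI environment limitation"
    INTENTIONAL_UNSUPPORTED_PATH = "intentional unsupported path"


def summarize(failures):
    """Count failures per class; raise if any failure is unclassified."""
    unclassified = [lane for lane, cls in failures
                    if not isinstance(cls, FailureClass)]
    if unclassified:
        raise ValueError(f"unclassified failures: {unclassified}")
    return Counter(cls for _, cls in failures)


# Hypothetical failures, for illustration only.
failures = [
    ("container/linux", FailureClass.CONTRACT_ISSUE),
    ("native/windows", FailureClass.INTENTIONAL_UNSUPPORTED_PATH),
    ("native/macos", FailureClass.CONTRACT_ISSUE),
]
for cls, count in summarize(failures).items():
    print(f"{cls.value}: {count}")
```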
What Changed In Practice
Extending from the first slice to full pressure testing changed outcomes immediately:
- contracts became more scope-honest
- mode claims became explicit and testable
- CI lanes became more informative, less noisy
- agent boundaries became stricter and less ambiguous
- Ota product gaps surfaced early enough to fix before PR review
The key lesson is simple and opinionated:
A valid contract is a starting condition. A pressure-tested contract is an operational asset.
Suggested Pressure-Test Baseline
For teams adopting Ota now, this is the baseline I recommend for strong signal without unnecessary complexity:
    ota validate
    ota doctor
    ota tasks --use
    ota tasks --safe --use
    ota execution topology --json
Then add matrix lanes that prove:
- native dry-run coverage for declared runnable tasks
- container dry-run coverage where mode support is declared
- runtime proof for workflows that claim live readiness surfaces
- explicit unsupported-host checks where host constraints are intentional
This is the minimum bar that turned “works in one path” into “trustworthy across real usage” for us.
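For a local or CI wrapper, the baseline commands can be run in order with a stop at the first failure. This is a rough sketch rather than a recommended integration: the command strings are copied from the baseline in this post, while the `run_baseline` helper, its injectable `runner` parameter, and the stand-in runner are assumptions introduced so the example can run without Ota installed.

```python
import shlex
import subprocess

# Command strings as listed in the baseline in this post.
BASELINE = [
    "ota validate",
    "ota doctor",
    "ota tasks --use",
    "ota tasks --safe --use",
    "ota execution topology --json",
]


def run_baseline(commands=BASELINE, runner=None):
    """Run commands in order; return (command, exit_code) pairs.

    Stops after the first non-zero exit code so later output is not
    mistaken for signal.
    """
    if runner is None:
        # Default: actually execute the command and use its exit code.
        def runner(argv):
            return subprocess.run(argv).returncode
    results = []
    for command in commands:
        code = runner(shlex.split(command))
        results.append((command, code))
        if code != 0:
            break
    return results


def fake_runner(argv):
    """Stand-in runner that pretends `ota doctor` fails."""
    return 1 if argv[1] == "doctor" else 0


print(run_baseline(runner=fake_runner))
# [('ota validate', 0), ('ota doctor', 1)]
```

Calling `run_baseline()` with no arguments would invoke the real commands through `subprocess.run`, which requires `ota` to be on the `PATH`.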