01 · The premise
If an agent in production can't be debugged like a microservice, it isn't in production. It's a demo that lives on a server.
Generative AI has become a system primitive. It's no longer a separate section of the architecture; it's a dependency you manage the way you manage Postgres: with monitoring, with SLOs, with rollback, with governance.
Our approach applies Extreme Programming (XP) to the AI domain: small prompt releases, eval suites as tests, continuous refactoring of the prompt catalog, and evaluations that run in CI on every change.
And we apply Extreme Contracts: every AI capability has declared pre-conditions (input shape, data safety), verifiable post-conditions (eval gates, latency budget, accuracy floor) and explicit fallbacks for when the model fails.
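To make the idea concrete, here is a minimal sketch of what such a contract could look like in code. Everything in it is an illustrative assumption rather than our actual tooling: the `Contract` dataclass, the `call_with_contract` helper, the stub model and the thresholds are hypothetical names and values. The latency budget is checked after the call returns; a real system would more likely enforce it with a timeout.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Contract:
    """Hypothetical contract for one AI capability (illustrative only)."""
    pre: Callable[[str], bool]       # pre-condition on the input
    post: Callable[[str], bool]      # post-condition on the model output
    latency_budget_s: float          # maximum acceptable call duration
    fallback: Callable[[str], str]   # used when the model or a check fails


def call_with_contract(model: Callable[[str], str],
                       contract: Contract,
                       text: str) -> str:
    # Pre-condition: reject inputs the capability was never declared to handle.
    if not contract.pre(text):
        raise ValueError("pre-condition violated: input rejected")

    start = time.monotonic()
    try:
        output = model(text)
    except Exception:
        # The model call itself failed: take the explicit fallback.
        return contract.fallback(text)
    elapsed = time.monotonic() - start

    # Post-conditions: the latency budget (measured after the fact)
    # and the declared check on the output.
    if elapsed > contract.latency_budget_s or not contract.post(output):
        return contract.fallback(text)
    return output


# Example with a stub standing in for a real model (assumption).
def stub_model(text: str) -> str:
    return "positive" if "good" in text else "unsure"


sentiment_contract = Contract(
    pre=lambda s: 0 < len(s) <= 2000,
    post=lambda out: out in {"positive", "negative"},
    latency_budget_s=2.0,
    fallback=lambda s: "needs_human_review",
)

print(call_with_contract(stub_model, sentiment_contract, "good service"))
print(call_with_contract(stub_model, sentiment_contract, "meh"))
```

The second call returns the fallback because the stub's answer fails the post-condition. The point of the sketch is that failure is a declared path, not an exception that surfaces at 3 a.m.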
04 · The contract
Pre-conditions, post-conditions, invariants.
Every engagement has explicit pre-conditions, measurable post-conditions, and invariants we never violate. You know what we need at the start, what comes out at the end, and what we don't negotiate in the middle.
Pre-conditions / what we need from you
- Validated use case: a real end user who will use the system, not a CMO experiment.
- Access to domain data (with privacy/legal clearance) or a representative dataset.
- Declared inference budget: needed to size the architecture.
- Agreed error tolerance: what happens when the model is wrong? How much error is acceptable?
Post-conditions / what we guarantee
- AI system in production with eval gate in CI: no deploy without eval pass.
- Live dashboards for accuracy, latency, cost, safety.
- Operational runbook: how to handle performance drift, cost escalation, safety incidents.
- Prompt + eval + tooling stack versioned in the client's repository.
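As an illustration of the "no deploy without eval pass" guarantee, an eval gate can be as small as the sketch below. It assumes exact-match scoring on labeled cases and a single accuracy floor; the function names, the cases and the thresholds are hypothetical, and a real suite would also cover latency, cost and safety.

```python
from typing import Callable, Sequence, Tuple

Case = Tuple[str, str]  # (input, expected output)


def accuracy(model: Callable[[str], str], cases: Sequence[Case]) -> float:
    """Fraction of cases where the model output equals the expected output."""
    if not cases:
        raise ValueError("eval suite is empty")
    correct = sum(1 for inp, expected in cases if model(inp) == expected)
    return correct / len(cases)


def eval_gate(model: Callable[[str], str],
              cases: Sequence[Case],
              floor: float) -> int:
    """Return a process exit code: 0 if accuracy meets the floor, 1 otherwise."""
    score = accuracy(model, cases)
    print(f"accuracy={score:.3f} floor={floor:.3f}")
    return 0 if score >= floor else 1


# Stub model and toy cases standing in for a real eval suite (assumptions).
cases = [("a", "A"), ("b", "B"), ("c", "C"), ("d", "D")]


def degraded_model(text: str) -> str:
    return "?" if text == "d" else text.upper()


exit_code = eval_gate(degraded_model, cases, floor=0.9)
```

In a CI job, the script would end by passing the result to `sys.exit`, so a score below the floor fails the pipeline and blocks the deploy.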