The architecture for running AI agents in production.
A vendor-neutral catalog of nine named layers. Plug in any outside application through governed boundaries, and run the agents you build yourself on a runtime you own.
Open at the edges, sovereign at the core
less token usage when tools are discovered progressively instead of dumped into context
tokens a naive MCP server burns before the agent does any real work
jailbreak success rate against prompt-only defenses: the case for deterministic governance
improvement from a scoped tool interface over a raw shell, with the same model and task
Sources: Speakeasy and Kruczek (token reduction), JailbreakBench and Andriushchenko et al. (jailbreak rates), SWE-agent at Princeton NLP (tool interface). All cited in the specs.
What the modern data platform did for analytics, the Spine does for agents.
Big data got serious when it got an architecture. The warehouse, the data lake, and the lakehouse gave the enterprise one governed place for ingestion, cataloging, lineage, and access control, so analytics could run at petabyte scale without chaos. AI agents are at that same moment now, and the Spine is the enterprise-grade answer: nine versioned, citable specifications that give agents the same rigor. Discovery, coordination, governance, grounded data, an audited registry, and a runtime you own.
And it runs on top of the data estate you already have. The Spine's grounded-data layer plugs agents directly into your existing platforms, from the cloud warehouse to the Spark lakehouse, governed by your canonical definitions and your row and column entitlements. Your big-data investment becomes the trusted foundation your agents reason over, not a surface they can leak data through.
The data platforms the Spine grounds your agents in.
You are not the only one who needs this. The whole industry is building it.
Every layer of the Spine already exists, piecemeal, inside closed commercial platforms. UiPath, Palantir, AWS, Microsoft, Salesforce, Bloomberg, and others each independently shipped their own answer to these concerns, because the concerns are real. Here is the same map again, but this time each station shows who has already built that layer in their own closed model. The Spine names the pattern they all converged on, vendor-neutral and yours to own.
Two very different vendors, UiPath (a single RPA platform) and Microsoft (a whole cloud stack), each independently built pieces of roughly six of the nine layers. That is powerful proof that the layers are real, and a clear picture of what locking into any single closed platform would cost you.
Surfaces the 5 to 8 tools an agent needs on demand, instead of dumping a thousand into the context window. Semantic entities, a gateway, SLA-aware routing.
Planner, generator, and evaluator are structurally separated, so the checker cannot simply agree with the maker. Coordination that catches its own mistakes.
Every external signal (markets, logistics, geopolitics, supplier health) arrives typed and provenance-stamped, so the reasoning that used it is auditable.
Composite scoring with confidence bands, tenant-conditioned weights, and signal-version provenance. Not one mystery number.
Every action passes a deterministic policy check before it reaches the wire. Actions the policy denies are structurally impossible, not merely unlikely. Identity per agent, audit by construction.
The temporal substrate. Project state, memory, and a verification-gated record of completed work that survive the context-window boundary, so the next session picks up the thread without loss.
The grounding substrate. A canonical semantic model (text to metric, not text to SQL) plus data-level entitlements, so answers are consistent and an agent sees only what its user may see.
The system-of-record layer. One continuously reconciled catalog of every agentic asset, which discovery reads from and governance enforces against. Shadow assets become detectable, not invisible.
The execution substrate. A sovereign, first-party runtime that composes the whole catalog and in which agents are identity-bound, isolated, ephemeral, and bounded by construction. A specification you own, portable across any substrate.
Five of the nine are public open source; four are private.
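The discovery layer's core move, surfacing a handful of relevant tools instead of loading the whole catalog, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the spec's mechanism: plain keyword overlap stands in for semantic matching, and the `Tool` type, `discover_tools` function, stopword list, and `k` parameter are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list so filler words do not count as relevance.
STOPWORDS = {"a", "an", "the", "for", "by", "from", "of", "to"}


@dataclass(frozen=True)
class Tool:
    # Hypothetical tool record: a name plus a short natural-language description.
    name: str
    description: str


def _terms(text: str) -> set[str]:
    # Lowercase word tokens minus stopwords; a stand-in for real semantic embeddings.
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def discover_tools(task: str, catalog: list[Tool], k: int = 8) -> list[Tool]:
    """Return at most k tools whose name or description overlaps the task, best first.

    Tools with zero overlap are never surfaced, so the context window receives
    a handful of candidates instead of the whole catalog.
    """
    task_terms = _terms(task)
    scored = []
    for tool in catalog:
        overlap = len(task_terms & _terms(tool.name + " " + tool.description))
        if overlap > 0:
            scored.append((overlap, tool.name, tool))
    # Highest overlap first; ties broken by name so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:k]]


if __name__ == "__main__":
    catalog = [
        Tool("get_invoice", "fetch an invoice by id from billing"),
        Tool("refund_invoice", "issue a refund for an invoice"),
        Tool("create_ticket", "open a support ticket"),
        Tool("list_shipments", "list shipments for a supplier"),
    ]
    print([t.name for t in discover_tools("refund the invoice for order 42", catalog, k=2)])
```

The design choice worth noting is that relevance filtering happens before anything reaches the model, so the token cost scales with `k`, not with the size of the catalog.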
Not a vibe. A bar you can measure against.
Every layer ships with target SLAs. These are the production thresholds the spec holds you to, the line between "we have agents" and "we run agents in production." Below is one headline target per layer; the full tables live in each spec.
actions ever executed without passing policy first
agent queries that hit raw data tables directly
unregistered shadow assets reachable in production
unbounded or unattributed execution incidents
false 'task complete' declarations across sessions
decisions made on expired external signals
tools loaded into context by default, out of 200+ available
scores that declare their method, no mystery numbers
first-pass rejection rate of a real adversarial evaluator
Notice how many targets are zero. In production these are not aspirations; they are invariants the architecture has to make structurally true.
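The "scores that declare their method" target can be illustrated with a composite score that carries its own provenance: the weights applied, the signal versions consumed, and a confidence band. Everything here is an assumption for illustration only. The `Signal` and `CompositeScore` types, the normalized weighted mean, and a band built by propagating per-signal standard errors under an independence assumption are hypothetical choices, not the scoring method of any Spine spec.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    # Hypothetical typed input: a value, its standard error, and a version tag.
    name: str
    value: float
    std_err: float
    version: str


@dataclass(frozen=True)
class CompositeScore:
    value: float
    low: float
    high: float
    method: str                      # how the number was computed, stated explicitly
    weights: dict[str, float]        # normalized, tenant-conditioned weights actually applied
    signal_versions: dict[str, str]  # provenance of every input signal


def composite(signals: list[Signal], weights: dict[str, float], z: float = 1.96) -> CompositeScore:
    """Weighted mean with a +/- z * standard-error band, assuming independent signal errors.

    z = 1.96 gives roughly a 95% band under a normal approximation.
    """
    total = sum(weights[s.name] for s in signals)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = {s.name: weights[s.name] / total for s in signals}
    value = sum(norm[s.name] * s.value for s in signals)
    # Variance of a weighted sum of independent terms is sum(w_i^2 * se_i^2).
    se = math.sqrt(sum((norm[s.name] * s.std_err) ** 2 for s in signals))
    return CompositeScore(
        value=value,
        low=value - z * se,
        high=value + z * se,
        method=f"normalized weighted mean, band = value +/- {z} * propagated standard error",
        weights=norm,
        signal_versions={s.name: s.version for s in signals},
    )


if __name__ == "__main__":
    signals = [
        Signal("supplier_health", 0.8, 0.05, "v3"),
        Signal("logistics", 0.6, 0.1, "v7"),
    ]
    # Hypothetical tenant weights: this tenant cares three times more about supplier health.
    score = composite(signals, {"supplier_health": 3.0, "logistics": 1.0})
    print(score.method, score.weights, score.signal_versions)
```

Returning the method, weights, and versions alongside the number is what lets a reviewer re-derive the score later instead of trusting one mystery value.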
When something breaks, you know who dropped the ball.
The catalog turns "the AI broke" into a specific, ownable layer.
Two doors.
There are exactly two ways anything reaches your agent estate, and the Spine governs both.
Outside applications plug into the Spine
Any third-party or closed-source agent, tool, or AI application connects through governed boundaries: it is discovered through one curated surface, every action it takes is policy-gated and audited, the data it touches is grounded and entitlement-scoped, and it is tracked in one system of record. Best of breed, no lock-in.
Your own agents run on the Spine
For the agents you build yourself, the Sovereign Runtime Spine is the execution model: agents are identity-bound, isolated, and bounded by construction, and the runtime composes the whole catalog. A runtime you own and run on infrastructure you control, portable across any substrate.
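The policy-gated, audited boundary that both doors share can be sketched as a default-deny gate in front of an executor. This is a minimal illustration under assumptions, not the governance spec's interface: `PolicyGate`, `Action`, `PolicyDenied`, the allowlist keyed by agent identity, and action names like `invoice.read` are all hypothetical. In a real deployment the structural guarantee comes from making the executor unreachable except through the gate; this sketch only shows the decision and audit flow.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Action:
    agent_id: str  # identity of the calling agent
    name: str      # hypothetical action name, e.g. "invoice.read"
    target: str    # resource the action touches


class PolicyDenied(Exception):
    """Raised when the policy rejects an action; the executor is never called."""


@dataclass
class PolicyGate:
    # Default deny: an agent may perform only the actions explicitly listed for it.
    allow: dict[str, set[str]]
    audit_log: list[tuple[str, str, str, str]] = field(default_factory=list)

    def decide(self, action: Action) -> bool:
        # Pure, deterministic check: same input, same decision, no model in the loop.
        return action.name in self.allow.get(action.agent_id, set())

    def execute(self, action: Action, executor: Callable[[Action], str]) -> str:
        allowed = self.decide(action)
        # Record the decision first, so allows and denies are both audited.
        verdict = "allow" if allowed else "deny"
        self.audit_log.append((action.agent_id, action.name, action.target, verdict))
        if not allowed:
            raise PolicyDenied(f"{action.agent_id} may not {action.name}")
        return executor(action)


if __name__ == "__main__":
    gate = PolicyGate(allow={"billing-agent": {"invoice.read"}})
    print(gate.execute(Action("billing-agent", "invoice.read", "inv-42"), lambda a: f"read {a.target}"))
    try:
        gate.execute(Action("billing-agent", "invoice.refund", "inv-42"), lambda a: "refunded")
    except PolicyDenied as err:
        print("denied:", err)
    print(gate.audit_log)
```

Keeping `decide` a pure function of the action and a static allowlist is the point: the outcome does not depend on how persuasive a prompt is.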
The catalog lives on GitHub.
PDS, ACS, ESF, AGS, and DCS are public open source under CC BY 4.0 and MIT. CRI, GDS, ARS, and SRS are held private.