ARA — Agent Ready Armor
Part of the Ready Armor Suite, alongside Battle Ready Armor (BRA).
ARA follows the same Control in Depth philosophy as the rest of the suite.
Containment for what agents do. Provenance for what they produce.
A runtime substrate for AI agents. ARA sits between an agent and the world it operates in. It governs what the agent is permitted to do, narrows that surface for each task, and makes the agent's work auditable end to end. Operators stay in control without looking over the agent's shoulder.
Built for the demands of professional offensive security work — where the consequences of an unconstrained or unverifiable agent are immediate and external — and applicable, by design, to any agentic deployment that needs the same guarantees.
The two axes
Agents fail in two distinct ways. ARA defends against each on its own track, and neither defense is bolted onto a model, a prompt, or a wrapper script.
| | What the agent does | What the agent claims |
|---|---|---|
| The risk | Unauthorized action against systems, data, identity, or third parties | Output that looks correct but isn't grounded in real evidence |
| The control | Containment of capability | Provenance for every claim |
| The promise | The agent only does what policy allows | Every claim ties back to verifiable source |
The three-tier model
Most policy systems for agents conflate three different things: identity, engagement context, and the procedure being executed. ARA keeps them separate and composes them.
Armor
Who the agent is.
The operator's posture. What the agent will never do, regardless of task. Persists across every engagement.
Trim
Where, when, against whom.
The engagement overlay. Generated from intake metadata, applied live, expires with the engagement.
Weapon
What this run is doing.
The procedure. Forged once from source materials, sealed, replayable. Citations end-to-end — no fabrication, by construction.
The three layers compose deterministically. The agent's effective policy at any moment is the product of all three; no layer can soften another's denial.
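The composition rule above can be sketched in a few lines. This is an illustration under stated assumptions, not ARA's implementation: the `Layer` class, the three `Verdict` values, and the "strictest verdict wins" rule are hypothetical names chosen to show how a layer can tighten, but never loosen, another layer's decision.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    PROMPT = "prompt"  # hold the action for operator approval (assumed verdict)
    DENY = "deny"


# Higher number means more restrictive. Composition keeps the strictest verdict.
_STRICTNESS = {Verdict.ALLOW: 0, Verdict.PROMPT: 1, Verdict.DENY: 2}


@dataclass(frozen=True)
class Layer:
    """One policy layer (armor, trim, or weapon) reduced to two rule sets."""
    name: str
    denied: frozenset = frozenset()
    prompted: frozenset = frozenset()

    def evaluate(self, action: str) -> Verdict:
        if action in self.denied:
            return Verdict.DENY
        if action in self.prompted:
            return Verdict.PROMPT
        return Verdict.ALLOW


def effective_verdict(action: str, armor: Layer, trim: Layer, weapon: Layer) -> Verdict:
    """Evaluate every layer and return the most restrictive result.

    Because the result is a maximum over strictness, no layer can soften
    a denial issued by another layer.
    """
    verdicts = [layer.evaluate(action) for layer in (armor, trim, weapon)]
    return max(verdicts, key=_STRICTNESS.__getitem__)


# Hypothetical action names, for illustration only.
armor = Layer("armor", denied=frozenset({"delete_backups"}))
trim = Layer("trim", prompted=frozenset({"port_scan"}))
weapon = Layer("weapon")

print(effective_verdict("delete_backups", armor, trim, weapon))
print(effective_verdict("port_scan", armor, trim, weapon))
print(effective_verdict("read_docs", armor, trim, weapon))
```

Taking a maximum over a fixed ordering is what makes the result independent of the order in which layers are evaluated.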
Inside ARA — the named pieces
ARA is a small constellation of named components. Operators learn them in roughly this order.
Armory
The web console where every engagement is monitored, approved, and exported. Live timelines, decision queues, drift signals, session diffs, signed exports, and a read-only viewer token for stakeholders who need visibility without capability. Everything updates live — no refresh, no polling.
Courtyard
Every contained agent's live state on one page — calls in flight, decisions resolving, approvals queued, drift signals lit. The operator's situational awareness surface. Drill into any session for its full audit chain.
Engagement
Canonical intake record + mats pool + applied trim + equipped weapon, all under one engagement id. Multi-trim, multi-weapon engagements share a single mats pool — the operator doesn't re-upload anything between runs. Preflight-gated; attestable on export.
Armor
Named loadouts the operator curates and switches between — each carries identity-deny rules, multiple stackable system-prompt fragments, posture defaults, even its own sigil and metadata. One active at a time. Says who the agent is.
Trim
Engagement-scoped overlay derived from intake. Scope, time window, time-of-day windows, posture, rate limits, stop conditions, declared authorizations. Marker-wrapped in policy so release restores the surrounding rules exactly as they were. Single-trim-per-armor enforced.
Bulwark
One canonical policy source every layer compiles down to. Hot-reloaded on change. Stackable scope expressions (global, per-armor, intersected, excluded), with timeout, hit-count, and exhaust-behavior modifiers, and a deterministic precedence walk no two operators will resolve differently.
Rhema
Mediation layer for the prompt and reply streams. Ask + reply rules attach blocks, rewrites, approval prompts, or shadow-test variants to anything model-bound or model-returned. Where automatic first-use tool classification happens — without a human in the loop unless the policy demands one.
Tools
Auditable record of every tool the agent has used. Keyed on tool plus flag pattern so the same utility shares a single entry across operating systems. Each carries a classification, confidence, the engagement that first saw it, status, and a review window. Operators override, reclassify, or sweep expired entries from the console.
TOD
Trims carry weekly windows in the operator's timezone. When the agent fires outside one, ARA holds the call and surfaces a four-way decision: extend, pivot to read-only, defer until window, cancel. Decisions are TTL'd, audited, and tag the findings produced under them.
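A time-of-day gate like the one described here might look like the sketch below. The window representation `(weekday, start, end)`, the function names, and the returned dictionary are assumptions made for illustration; TTLs, auditing, and tagging of findings are left out.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

# Labels for the four-way decision named in the text.
HOLD_OPTIONS = ("extend", "pivot to read-only", "defer until window", "cancel")


def in_window(now: datetime, windows, operator_tz: tzinfo) -> bool:
    """Return True if `now` falls inside any (weekday, start, end) window.

    `now` must be timezone-aware. Weekdays follow datetime.weekday(), so
    Monday is 0. Windows are in the operator's local time and, in this
    sketch, do not span midnight.
    """
    local = now.astimezone(operator_tz)
    return any(
        day == local.weekday() and start <= local.time() < end
        for day, start, end in windows
    )


def gate_call(now: datetime, windows, operator_tz: tzinfo) -> dict:
    """Let the call through inside a window; otherwise hold it for a decision."""
    if in_window(now, windows, operator_tz):
        return {"action": "proceed", "options": ()}
    return {"action": "hold", "options": HOLD_OPTIONS}


operator_tz = timezone(timedelta(hours=-5))        # fixed offset, for the example only
windows = [(0, time(9, 0), time(17, 0))]           # Mondays, 09:00 to 17:00 local
inside = datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc)    # Monday 10:00 local
outside = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)   # Monday 18:00 local

print(gate_call(inside, windows, operator_tz)["action"])
print(gate_call(outside, windows, operator_tz)["action"])
```

Converting to the operator's timezone before comparing is the important step: the same instant can be inside the window for one operator and outside it for another.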
Forge
The agent forges its own weapons, with the blacksmith armor and the blacksmith-hammer weapon hot-swapped in. Output is a typed, cross-citing, multi-layer weapon record designed against ten cognitive-science approaches. The substrate is its own first proof.
Weapons
The Forge's output. Hash-stamped, citation-bound, with a full-text search index built and calibration-tested per weapon. Equipped onto an engagement reversibly. Re-forge produces a typed delta the operator approves before the new version supersedes the old.
Mats
Class-partitioned source pool — separate spaces for the operator's source code, architecture documents, API specs, runbooks, build artifacts, and test credentials. Manifest tracks identity and origin for every file. Credentials get tighter access controls. Visibility to the agent is gated by the trim's posture; zero-copy linked into forge sessions.
Shadows
Run a policy variant alongside the live one without touching the agent's actual run. Side-by-side diffs of request and response, per-rule promotion when the operator wants a one-off rewrite to become standing policy, and an auto-promote readiness score on every variant rule.
Baseline
Snapshot one clean session — every path read, every command spawned, every host connected. Later runs diff against it; everything new shows up. Promote a baseline straight to an armor that allows what was seen and prompts on the rest.
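The baseline comparison reduces to a set difference per observation class. The sketch below assumes a hypothetical `SessionTrace` record with three sets; how ARA actually captures and stores a session is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionTrace:
    """What one session touched (hypothetical shape, for illustration)."""
    paths_read: frozenset = frozenset()
    commands: frozenset = frozenset()
    hosts: frozenset = frozenset()


def diff_against_baseline(baseline: SessionTrace, run: SessionTrace) -> dict:
    """Return everything the run did that the baseline session did not."""
    return {
        "paths_read": sorted(run.paths_read - baseline.paths_read),
        "commands": sorted(run.commands - baseline.commands),
        "hosts": sorted(run.hosts - baseline.hosts),
    }


baseline = SessionTrace(
    paths_read=frozenset({"/work/notes.md"}),
    commands=frozenset({"grep"}),
    hosts=frozenset({"10.0.0.5"}),
)
later_run = SessionTrace(
    paths_read=frozenset({"/work/notes.md", "/etc/passwd"}),
    commands=frozenset({"grep", "curl"}),
    hosts=frozenset({"10.0.0.5"}),
)

print(diff_against_baseline(baseline, later_run))
```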
Canaries
Realistic-looking but fake credentials seeded into paths an agent might read. The moment the agent echoes one back through an LLM reply — the strongest possible signal of exfiltration intent — a reply-layer rule fires and blocks the response. Templates ship for AWS, Stripe, GitHub, generic.
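A reply-layer canary rule can be as simple as a substring check over the seeded tokens. The token values, template names, and return shape below are invented for the example and are not ARA's shipped templates.

```python
def check_reply(reply: str, canaries: dict) -> dict:
    """Block a model reply that echoes any seeded canary token.

    `canaries` maps a template name to the fake credential that was
    planted in the agent's workspace.
    """
    hits = sorted(name for name, token in canaries.items() if token in reply)
    return {"action": "block" if hits else "pass", "canaries": hits}


# Fake tokens, made up for this sketch.
canaries = {
    "aws-template": "AKIAFAKECANARY0000001",
    "generic-template": "canary-7f3c9a-not-a-real-secret",
}

clean = "The config file references an access key, which I did not copy."
leaky = "Found credentials: AKIAFAKECANARY0000001"

print(check_reply(clean, canaries))
print(check_reply(leaky, canaries))
```

An exact-match check only catches verbatim echoes; encoded or split tokens would need additional normalization, which this sketch does not attempt.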
Audit
Every decision, every turn, every operator action lands in a tamper-evident ledger. Cryptographically chained so a verifier can detect any after-the-fact edit. Durably appended so a power loss can't strand a write. Sessions export with a detached signature and embedded attestation.
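The tamper-evidence property comes from chaining: each entry's hash covers the previous entry's hash, so editing any past entry breaks every hash after it. The sketch below shows that idea with SHA-256 over canonical JSON. The entry layout and function names are assumptions, and durable appends and detached signatures are out of scope.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> dict:
    """Append an entry whose hash commits to everything before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"prev": prev, "payload": payload, "hash": _digest(prev, payload)}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute the chain from the start and report whether it is intact."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger = []
append(ledger, {"event": "decision", "verdict": "deny", "tool": "curl"})
append(ledger, {"event": "operator_action", "action": "approve"})
print(verify(ledger))

ledger[0]["payload"]["verdict"] = "allow"  # after-the-fact edit
print(verify(ledger))
```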
Forge — where weapons come from
Most agent frameworks let the model decide its own playbook on the fly. The agent reads the docs, draws inferences, and picks its next step. That works until it doesn't — until the agent invents a step that wasn't actually documented, cites a procedure that doesn't exist, or confidently extrapolates past what the source material supports.
ARA inverts this. The agent doesn't compose its playbook. It executes one that was authored, validated, and sealed before the engagement started. The Forge is what produces those playbooks — operationally distinctive and intellectually unique.
The agent forges the agent's own weapons
This is the proof of the substrate. When the operator opens a forge session, ARA hot-swaps in a special posture — the blacksmith armor + the blacksmith-hammer weapon — and the agent itself runs the authoring, under full containment. Tool surface narrowed to a small allowlist of text-extraction utilities; outputs constrained to a typed set of extraction channels. No shells, no spawning, dangerous filesystem paths denied, credential env vars scrubbed. Source materials never leave the contained workspace. The blacksmith is itself an ARA armor; the hammer is itself a sealed weapon.
ARA's substrate is its own first proof — the same primitives that govern an engagement also govern the work that produced the engagement's procedure.
A multi-level schema as output
What emerges from a forge session isn't a flat playbook. It's a structured weapon record with several typed layers, each derived from operator-deposited materials and cross-citing each other. Some layers carry the vocabulary the weapon operates over; some carry the procedure the agent will traverse; some carry the ground truth the engagement starts with; some carry the resources the agent draws on; some carry the guardrails the agent never crosses; one carries the calibrated voice in which findings are written.
Every layer cites the source it was derived from. Every finding the agent eventually ships traces back through this lattice to a specific location in a specific source document. There is no "emergent" claim — only retrieved-and-explained.
Designed against ten cognitive-science approaches
The Forge's schema isn't an arbitrary engineering choice. It's the output of a structured review where every published agent-memory mechanism we could find was mapped onto an ARA primitive and either approved, deferred, or skipped on its merits. Ten of them ship as foundational in v1; several more have their infrastructure prepared for post-v1 promotion.
Each one produces a structural property of the sealed weapon that no model could fabricate at runtime — long-horizon retrieval, anchored ground truth, encoded vocabularies, retrievability decay, predictive expectation, cross-modal channel discipline, use-history priors, selective retention across versions.
What ships with the Forge itself
Multi-format ingestion, no pre-processing
Source materials go in as the operator already has them — text files, logs, PDFs, CSVs, spreadsheets (XLSX / XLS), Word documents, HTML, even screenshots and scanned images that get OCR'd inline. No reformatting, no flattening — the Forge does the work.
Multi-stage authoring
Whatever lands gets run through a structured ingestion and authoring pipeline and emerges as a sealed, typed procedure graph: every step has a name, a precondition, a tool surface, exit conditions, and a citation back to a specific location in the original source — including the page and region of an image, not a flattened transcript of it.
End-to-end provenance
A finding the agent ships traces to a step in the procedure, which traces to a paragraph in the source. There is no claim without a chain. Provenance isn't a logging feature — it's an authoring-time invariant the Forge enforces structurally.
Calibration-gated seal
The Forge can't seal a weapon whose retrieval doesn't pass an operator-authored test. The operator writes a query set with expected results; before seal, the weapon's recall is measured against it. If recall falls below the operator's threshold, the seal blocks. Most knowledge bases hope retrieval works; the Forge proves it.
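A calibration gate of this kind can be sketched as a recall measurement over the operator's query set. The function names, the use of mean per-query recall as the score, and the toy `search` function are assumptions; the Forge's actual metric and index are not specified in this document.

```python
def recall(expected: set, retrieved: list) -> float:
    """Fraction of the expected results that the search actually returned."""
    if not expected:
        return 1.0
    return len(expected & set(retrieved)) / len(expected)


def calibration_gate(search, query_set: dict, threshold: float):
    """Return (may_seal, mean_recall) for an operator-authored query set.

    `search` is any callable mapping a query string to a list of result ids.
    `query_set` maps each query to the set of ids the operator expects.
    """
    if not query_set:
        raise ValueError("calibration needs at least one query")
    scores = [recall(expected, search(query)) for query, expected in query_set.items()]
    mean_recall = sum(scores) / len(scores)
    return mean_recall >= threshold, mean_recall


# Toy index standing in for the weapon's full-text search.
toy_index = {
    "reset procedure": ["step-4", "step-9"],
    "auth endpoints": ["step-2"],
}


def search(query: str) -> list:
    return toy_index.get(query, [])


query_set = {
    "reset procedure": {"step-4", "step-9"},
    "auth endpoints": {"step-2", "step-3"},
}

print(calibration_gate(search, query_set, threshold=0.8))
print(calibration_gate(search, query_set, threshold=0.7))
```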
Conflict-aware merging
When two source documents disagree on the same data point, the Forge surfaces the conflict for operator resolution rather than silently picking one. No precedence rule quietly deciding what an engagement believes.
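The "surface, don't pick" behaviour can be illustrated with a merge that separates agreed facts from disputed ones. The input shape (document name mapped to key/value facts) and the function name are assumptions for this sketch.

```python
from collections import defaultdict


def merge_facts(sources: dict):
    """Merge per-document facts, returning (merged, conflicts).

    `merged` holds keys on which every document agrees, with the citing
    documents. `conflicts` holds keys with more than one distinct value,
    left for the operator to resolve. No value is chosen automatically.
    """
    seen = defaultdict(dict)  # key -> {value: [documents asserting it]}
    for doc, facts in sources.items():
        for key, value in facts.items():
            seen[key].setdefault(value, []).append(doc)

    merged, conflicts = {}, {}
    for key, by_value in seen.items():
        if len(by_value) == 1:
            ((value, docs),) = by_value.items()
            merged[key] = (value, docs)
        else:
            conflicts[key] = dict(by_value)
    return merged, conflicts


# Hypothetical source documents and facts.
sources = {
    "runbook.md": {"api_port": 8443, "db_host": "db.internal"},
    "arch.pdf": {"api_port": 443, "db_host": "db.internal"},
}

merged, conflicts = merge_facts(sources)
print(merged)
print(conflicts)
```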
Replicable, swappable extraction
The default extractors run deterministically — the same materials yield the same output on every run. No run-to-run drift, no vendor lock-in, no surprise behavior when a model version changes overnight. Operators who want model-assisted extraction can swap in an alternate under the same interface; results stay comparable across both. The substrate isn't hostage to any one model.
Knowledge-gap awareness
The Forge knows what facts an engagement will need to populate before each step is meaningful. Steps whose preconditions are already known auto-skip; steps whose preconditions arrive mid-engagement fire automatically when their facts land. Wired in at forge time, not bolted on at runtime.
Determinism by construction
A sealed weapon is replay-safe. Same source materials plus same engagement produce an equivalent procedure trace, every time. No "the model was creative today." If something changed in the output, something changed in the inputs — and the Forge can tell you which.
Reforge as typed delta
Re-forging from a predecessor weapon produces a structured, typed proposal of every addition, removal, and modification — which the operator reviews and approves before the new weapon is sealed. Every change has a name, a citation, and an approver. Weapon evolution is auditable.
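A typed delta between two weapon versions can be pictured as three named change sets. The sketch assumes each weapon is reduced to a mapping of step name to step definition; the real weapon record has more layers, and the approval workflow is not modelled here.

```python
def typed_delta(old: dict, new: dict) -> dict:
    """Classify every difference between two step mappings."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(name for name in shared if old[name] != new[name]),
    }


# Hypothetical step definitions, for illustration only.
predecessor = {
    "enumerate-hosts": {"tool": "scanner", "cites": "runbook.md#p3"},
    "check-tls": {"tool": "tls-probe", "cites": "runbook.md#p7"},
}
reforged = {
    "enumerate-hosts": {"tool": "scanner", "cites": "runbook.md#p4"},
    "review-headers": {"tool": "http-client", "cites": "api-spec.pdf#p2"},
}

print(typed_delta(predecessor, reforged))
```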
Hypothesis ledger
Open questions and proposed tests carry computed priority and a decaying retrievability score. A closure gate prevents the engagement from closing while critical hypotheses are still stale — the agent cannot pretend the work is done when it isn't.
The agent gets a procedure that's known to be exhaustive against the source material, known to be cited end to end, and known to be the same thing it was the last time. The operator gets output they can hand to a client with confidence.
How operators use ARA
A typical engagement walks through four operator touchpoints.
- Curate an armor. Once. The armor encodes the operator's posture — what's never permitted regardless of task. Most operators maintain one or two and reuse them.
- Open an engagement. Intake metadata in, target / scope / time window / rules of engagement out. The Trim Wizard derives the engagement-scoped overlay; the operator reviews and applies.
- Forge or equip a weapon. If the engagement needs a procedure that doesn't exist yet, drop the source materials in and open a forge session — ARA hot-swaps in the blacksmith armor + blacksmith-hammer and the agent itself does the authoring, under containment. If a sealed weapon already fits, equip it.
- Run preflight, then run the agent. Preflight validates the engagement state across roughly a dozen axes. If preflight is green, the agent runs. The Armory shows everything live. At the end, the operator exports a signed session package.
Why offensive security is the flagship
Pen testing is the hardest case for agentic AI today, which makes it the right place to prove the substrate works:
- The cost of an agent doing the wrong thing is immediate and external — disruption to a target, a contract violation, an unauthorized probe of an out-of-scope asset.
- Authorization isn't binary; it's scoped, time-bounded, and conditional. Real engagements have rules of engagement. Most agent frameworks don't model them.
- Findings ship to clients. Hallucinated findings ship to clients too, unless something stops them. The cost of a fabricated vulnerability report is reputational damage that no amount of "the model's getting better" can recover.
- Engagements are ephemeral, parameterized, and compositional — exactly the surface a policy substrate has to express to be useful.
If ARA can make agents safe and accountable in offensive security, the same substrate covers softer use cases without modification.
Beyond offensive security
The same primitives generalize cleanly:
Customer-facing assistants
Armor encodes brand and safety posture. Trims encode per-customer scope and SLA. Weapons encode approved task playbooks. Audit ships to compliance.
Internal data platforms
Armor restricts the agent to permitted data domains. Trims represent per-query authorization. Weapons enforce the analytic playbook the agent must follow.
Regulated research
Provenance for every cited claim. Rate limits to satisfy upstream API agreements. Preflight to assert authorization is in place before any external request.
Long-running operations
Time-of-day enforcement for change windows. Stop conditions for budget caps. A first-use registry that lets operators sleep while the agent operates within rails they pre-approved.
ARA is not a wrapper for one runtime, one model, or one provider — it is a layer the runtime runs inside.
ARA is self-hosted, single-operator-friendly, and licensed for internal use under selective preview. We are open to investment and partnership opportunities.