ARA — Agent Ready Armor

Ready Armor Suite · Battle Ready Armor (BRA) · Agent Ready Armor (ARA)

ARA follows the same Control in Depth philosophy as the rest of the Ready Armor Suite.

Containment for what agents do. Provenance for what they produce.

A runtime substrate for AI agents. ARA sits between an agent and the world it operates in. It governs what the agent is permitted to do, narrows that surface for each task, and makes the agent's work auditable end to end. Operators stay in control without standing over the agent's shoulder.

Built for the demands of professional offensive security work — where the consequences of an unconstrained or unverifiable agent are immediate and external — and applicable, by design, to any agentic deployment that needs the same guarantees.

The two axes

Agents fail in two distinct ways. ARA defends each on its own track — neither is bolted on to a model, a prompt, or a wrapper script.

What the agent does
  The risk: Unauthorized action against systems, data, identity, or third parties
  The control: Containment of capability
  The promise: The agent only does what policy allows

What the agent claims
  The risk: Output that looks correct but isn't grounded in real evidence
  The control: Provenance for every claim
  The promise: Every claim ties back to a verifiable source

The three-tier model

Most policy systems for agents conflate three different things: identity, engagement context, and the procedure being executed. ARA keeps them separate and composes them.

Persistent · operator-curated

Armor

Who the agent is.

The operator's posture. What the agent will never do, regardless of task. Persists across every engagement.

hotswappable · exportable · tradable · mechanically enforced
Engagement · TTL'd

Trim

Where, when, against whom.

The engagement overlay. Generated from intake metadata, applied live, expires with the engagement.

hotswappable · exportable · tradable · mechanically enforced
Forged · sealed

Weapon

What this run is doing.

The procedure. Forged once from source materials, sealed, replayable. Citations end-to-end — no fabrication, by construction.

hotswappable · exportable · tradable · mechanically enforced

The three layers compose deterministically. The agent's effective policy at any moment is the product of all three; no layer can soften another's denial.
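One way to picture the "no layer can soften another's denial" rule is a strictest-verdict-wins merge. The sketch below is illustrative only: the `Decision` enum, the `compose` function, and the three-verdict model are assumptions, not ARA's actual policy engine.

```python
from enum import IntEnum


class Decision(IntEnum):
    # Ordered by strictness so that max() picks the most restrictive verdict.
    ALLOW = 0
    PROMPT = 1
    DENY = 2


def compose(armor: Decision, trim: Decision, weapon: Decision) -> Decision:
    """Return the effective verdict: the strictest of the three layers wins."""
    return max(armor, trim, weapon)


# A trim or weapon that allows an action cannot override an armor-level denial.
effective = compose(Decision.DENY, Decision.ALLOW, Decision.ALLOW)
print(effective.name)  # DENY
```

Because the merge is a pure function of the three inputs, the same armor, trim, and weapon always resolve to the same verdict.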

Inside ARA — the named pieces

ARA is a small constellation of named components. Operators learn them in roughly this order.

Live
Armory
Where operators steer everything

The web console where every engagement is monitored, approved, and exported. Live timelines, decision queues, drift signals, session diffs, signed exports, and a read-only viewer token for stakeholders who need visibility without capability. Everything updates live — no refresh, no polling.

two-token auth · always-live
Live
Courtyard
What's happening right now

Every contained agent's live state on one page — calls in flight, decisions resolving, approvals queued, drift signals lit. The operator's situational awareness surface. Drill into any session for its full audit chain.

activity · approvals · drift
Live
Engagement
The bounded unit of work

Canonical intake record + mats pool + applied trim + equipped weapon, all under one engagement id. Multi-trim, multi-weapon engagements share a single mats pool — the operator doesn't re-upload anything between runs. Preflight-gated; attestable on export.

canonical record · preflight-gated
Policy
Armor
Persistent identity

Named loadouts the operator curates and switches between — each carries identity-deny rules, multiple stackable system-prompt fragments, posture defaults, even its own sigil and metadata. One active at a time. Says who the agent is.

multi-absolute · cross-armor copy
Policy
Trim
Engagement overlay

Engagement-scoped overlay derived from intake. Scope, time window, time-of-day windows, posture, rate limits, stop conditions, declared authorizations. Marker-wrapped in policy so release restores the surrounding rules exactly as they were. Single-trim-per-armor enforced.

trim wizard · twelve-axis preflight
Policy
Bulwark
The unified substrate

One canonical policy source every layer compiles down to. Hot-reloaded on change. Stackable scope expressions (global, per-armor, intersected, excluded), with timeout, hit-count, and exhaust-behavior modifiers, and a deterministic precedence walk no two operators will resolve differently.

scoped · timed · counted · deterministic
Policy
Rhema
LLM mediation

Mediation layer for the prompt and reply streams. Ask + reply rules attach blocks, rewrites, approval prompts, or shadow-test variants to anything model-bound or model-returned. Where automatic first-use tool classification happens — without a human in the loop unless the policy demands one.

block · rewrite · approve · shadow-test
Policy
Tools
Classification registry

Auditable record of every tool the agent has used. Keyed on tool plus flag pattern so the same utility shares a single entry across operating systems. Each carries a classification, confidence, the engagement that first saw it, status, and a review window. Operators override, reclassify, or sweep expired entries from the console.

cross-OS · status lifecycle · TTL'd
Policy
TOD
Time-of-day enforcement

Trims carry weekly windows in the operator's timezone. When the agent fires outside one, ARA holds the call and surfaces a four-way decision: extend, pivot to read-only, defer until window, cancel. Decisions are TTL'd, audited, and tag the findings produced under them.

four-way modal · tagged findings
Forge
Forge
Authoring pipeline

The agent forges its own weapons, with the blacksmith armor and blacksmith-hammer weapon hot-swapped in. Output is a typed, cross-citing, multi-layer weapon record derived against ten cognitive-science approaches. The substrate is its own first proof.

10 cogsci approaches · agent-run · sealed
Forge
Weapons
Sealed procedures

The Forge's output. Hash-stamped, citation-bound, with a full-text search index built and calibration-tested per weapon. Equipped onto an engagement reversibly. Re-forge produces a typed delta the operator approves before the new version supersedes the old.

per-weapon FTS · typed delta on reforge
Forge
Mats
Source materials

Class-partitioned source pool — separate spaces for the operator's source code, architecture documents, API specs, runbooks, build artifacts, and test credentials. Manifest tracks identity and origin for every file. Credentials get tighter access controls. Visibility to the agent is gated by the trim's posture; zero-copy linked into forge sessions.

manifested · posture-gated · zero-copy
Evidence
Shadows
Policy A/B testing

Run a policy variant alongside the live one without touching the agent's actual run. Side-by-side diffs of request and response, per-rule promotion when the operator wants a one-off rewrite to become standing policy, and an auto-promote readiness score on every variant rule.

≥85% similarity · ≥5 pairs to promote
Evidence
Baseline
"What good looked like"

Snapshot one clean session — every path read, every command spawned, every host connected. Later runs diff against it; everything new shows up. Promote a baseline straight to an armor that allows what was seen and prompts on the rest.

snapshot → diff → armor
Evidence
Canaries
Honeytokens for exfil

Realistic-looking but fake credentials seeded into paths an agent might read. The moment the agent echoes one back through an LLM reply — the strongest possible signal of exfiltration intent — a reply-layer rule fires and blocks the response. Templates ship for AWS, Stripe, GitHub, generic.

high-entropy · per-kind tagged
Evidence
Audit
Hash-chained ledger

Every decision, every turn, every operator action lands in a tamper-evident ledger. Cryptographically chained so a verifier can detect any after-the-fact edit. Durably appended so a power loss can't strand a write. Sessions export with a detached signature and embedded attestation.

tamper-evident · durable · signed export
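
The tamper-evidence property described for Audit can be illustrated with a plain hash chain. This is a minimal sketch under stated assumptions: the entry layout, the `GENESIS` value, and the `append_entry` and `verify` helpers are hypothetical, and the durable append and signed export are not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def append_entry(ledger: list[dict], event: dict) -> None:
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    ledger.append({"event": event, "prev": prev_hash, "hash": digest})


def verify(ledger: list[dict]) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


ledger: list[dict] = []
append_entry(ledger, {"kind": "decision", "verdict": "deny"})
append_entry(ledger, {"kind": "operator_action", "action": "approve"})
print(verify(ledger))  # True

ledger[0]["event"]["verdict"] = "allow"  # simulate an after-the-fact edit
print(verify(ledger))  # False
```

Each entry commits to everything before it, so a verifier only needs the ledger itself to detect an edit anywhere in the history.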


Forge — where weapons come from

Most agent frameworks let the model decide its own playbook on the fly. The agent reads the docs, draws inferences, and picks its next step. That works until it doesn't — until the agent invents a step that wasn't actually documented, cites a procedure that doesn't exist, or confidently extrapolates past what the source material supports.

ARA inverts this. The agent doesn't compose its playbook. It executes one that was authored, validated, and sealed before the engagement started. The Forge is what produces those playbooks — operationally distinctive and intellectually unique.

The agent forges the agent's own weapons

This is the proof of the substrate. When the operator opens a forge session, ARA hot-swaps in a special posture — the blacksmith armor + the blacksmith-hammer weapon — and the agent itself runs the authoring, under full containment. Tool surface narrowed to a small allowlist of text-extraction utilities; outputs constrained to a typed set of extraction channels. No shells, no spawning, dangerous filesystem paths denied, credential env vars scrubbed. Source materials never leave the contained workspace. The blacksmith is itself an ARA armor; the hammer is itself a sealed weapon.
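
Two of the containment measures named above, the tool allowlist and the scrubbed credential variables, can be sketched in a few lines. The tool names and the `SECRET_MARKERS` patterns are placeholders chosen for illustration; ARA's real allowlist and scrub rules are not given in this document.

```python
# Hypothetical allowlist of text-extraction utilities.
ALLOWED_TOOLS = frozenset({"pdftotext", "strings", "cat"})

# Hypothetical substrings that mark an environment variable as a credential.
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def is_tool_allowed(tool: str) -> bool:
    """Permit a tool only if it is explicitly on the allowlist."""
    return tool in ALLOWED_TOOLS


def scrub_env(env: dict[str, str]) -> dict[str, str]:
    """Drop every variable whose name looks like it holds a credential."""
    return {
        name: value
        for name, value in env.items()
        if not any(marker in name.upper() for marker in SECRET_MARKERS)
    }


print(is_tool_allowed("pdftotext"))  # True
print(is_tool_allowed("bash"))       # False
print(scrub_env({"PATH": "/usr/bin", "AWS_SECRET_ACCESS_KEY": "example"}))
# {'PATH': '/usr/bin'}
```

The default is denial: anything not named on the allowlist, shells included, is refused without needing a rule of its own.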

ARA's substrate is its own first proof — the same primitives that govern an engagement also govern the work that produced the engagement's procedure.

A multi-level schema as output

What emerges from a forge session isn't a flat playbook. It's a structured weapon record with several typed layers, each derived from operator-deposited materials and cross-citing each other. Some layers carry the vocabulary the weapon operates over; some carry the procedure the agent will traverse; some carry the ground truth the engagement starts with; some carry the resources the agent draws on; some carry the guardrails the agent never crosses; one carries the calibrated voice in which findings are written.

Every layer cites the source it was derived from. Every finding the agent eventually ships traces back through this lattice to a specific location in a specific source document. There is no "emergent" claim — only retrieved-and-explained.

Designed against ten cognitive-science approaches

The Forge's schema isn't an arbitrary engineering choice. It's the output of a structured review where every published agent-memory mechanism we could find was mapped onto an ARA primitive and either approved, deferred, or skipped on its merits. Ten of them ship as foundational in v1; several more have their infrastructure prepared for post-v1 promotion.

Each one produces a structural property of the sealed weapon that no model could fabricate at runtime — long-horizon retrieval, anchored ground truth, encoded vocabularies, retrievability decay, predictive expectation, cross-modal channel discipline, use-history priors, selective retention across versions.

What ships with the Forge itself

Multi-format ingestion, no pre-processing

Source materials go in as the operator already has them — text files, logs, PDFs, CSVs, spreadsheets (XLSX / XLS), Word documents, HTML, even screenshots and scanned images that get OCR'd inline. No reformatting, no flattening — the Forge does the work.

Multi-stage authoring

Whatever lands gets run through a structured ingestion and authoring pipeline and emerges as a sealed, typed procedure graph: every step has a name, a precondition, a tool surface, exit conditions, and a citation back to a specific location in the original source — including the page and region of an image, not a flattened transcript of it.
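
The step fields listed above map naturally onto a small immutable record. The class and field names below are assumptions made for illustration, and the sample values are invented; the Forge's real schema is not published here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    document: str                 # source file the step was derived from
    page: int                     # page within that document
    region: Optional[str] = None  # e.g. an image region; format is an assumption


@dataclass(frozen=True)
class Step:
    name: str
    precondition: str
    tool_surface: tuple[str, ...]
    exit_conditions: tuple[str, ...]
    citation: Citation  # required: a step cannot be built without one


step = Step(
    name="enumerate-endpoints",
    precondition="target base URL is known",
    tool_surface=("http-get",),
    exit_conditions=("endpoint list recorded",),
    citation=Citation(document="api-spec.pdf", page=12, region="table 3"),
)
print(f"{step.name} <- {step.citation.document} p.{step.citation.page}")
```

Making `citation` a required field of a frozen record is one way to turn "every step is cited" into something the type enforces rather than something a reviewer checks.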

End-to-end provenance

A finding the agent ships traces to a step in the procedure, which traces to a paragraph in the source. There is no claim without a chain. Provenance isn't a logging feature — it's an authoring-time invariant the Forge enforces structurally.

Calibration-gated seal

The Forge can't seal a weapon whose retrieval doesn't pass an operator-authored test. The operator writes a query set with expected results; before seal, the weapon's recall is measured against it. If recall falls below the operator's threshold, the seal blocks. Most knowledge bases hope retrieval works; the Forge proves it.
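
A calibration gate of this kind reduces to a recall measurement and a threshold comparison. The sketch below assumes mean per-query recall as the metric and uses a toy dictionary in place of the weapon's search index; `recall`, `can_seal`, and the sample queries are all hypothetical.

```python
from typing import Callable


def recall(expected: set[str], retrieved: set[str]) -> float:
    """Fraction of the expected results that were actually retrieved."""
    if not expected:
        return 1.0
    return len(expected & retrieved) / len(expected)


def can_seal(
    query_set: dict[str, set[str]],
    search: Callable[[str], list[str]],
    threshold: float,
) -> bool:
    """Allow the seal only if mean recall over the query set meets the threshold."""
    if not query_set:
        return False  # nothing was tested, so nothing was proven
    scores = [recall(expected, set(search(q))) for q, expected in query_set.items()]
    return sum(scores) / len(scores) >= threshold


# Toy index standing in for the weapon's full-text search.
index = {"default port": ["spec#p3"], "admin endpoint": ["runbook#p1"]}
queries = {
    "default port": {"spec#p3"},
    "admin endpoint": {"runbook#p1", "runbook#p4"},
}
print(can_seal(queries, lambda q: index.get(q, []), threshold=0.9))  # False
print(can_seal(queries, lambda q: index.get(q, []), threshold=0.7))  # True
```

Here the second query finds only one of its two expected results, so mean recall is 0.75 and the stricter threshold blocks the seal.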

Conflict-aware merging

When two source documents disagree on the same data point, the Forge surfaces the conflict for operator resolution rather than silently picking one. No precedence rule quietly deciding what an engagement believes.
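
Surfacing a conflict rather than resolving it amounts to grouping extracted values by data point and flagging any point with more than one distinct value. The fact tuples and the `find_conflicts` helper below are illustrative assumptions, not the Forge's data model.

```python
from collections import defaultdict


def find_conflicts(facts: list[tuple[str, str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group (key, value, source) facts; return keys that have disagreeing values."""
    seen: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for key, value, source in facts:
        seen[key].setdefault(value, []).append(source)
    # Keep only data points where the sources disagree; no winner is picked.
    return {key: values for key, values in seen.items() if len(values) > 1}


facts = [
    ("api.port", "8443", "runbook.pdf"),
    ("api.port", "443", "spec.docx"),
    ("api.host", "internal", "runbook.pdf"),
]
print(find_conflicts(facts))
# {'api.port': {'8443': ['runbook.pdf'], '443': ['spec.docx']}}
```

The result keeps every competing value alongside the documents that assert it, which is what an operator needs to make the call.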

Replicable, swappable extraction

The default extractors run deterministically — the same materials yield the same output on every run. No run-to-run drift, no vendor lock-in, no surprise behavior when a model version changes overnight. Operators who want model-assisted extraction can swap in an alternate under the same interface; results stay comparable across both. The substrate isn't hostage to any one model.

Knowledge-gap awareness

The Forge knows what facts an engagement will need to populate before each step is meaningful. Steps whose preconditions are already known auto-skip; steps whose preconditions arrive mid-engagement fire automatically when their facts land. Wired in at forge time, not bolted on at runtime.

Determinism by construction

A sealed weapon is replay-safe. Same source materials plus same engagement produce an equivalent procedure trace, every time. No "the model was creative today." If something changed in the output, something changed in the inputs — and the Forge can tell you which.
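
Telling the operator which input changed is straightforward if each source file is fingerprinted at forge time. This sketch assumes a SHA-256 digest per file; the `manifest` and `changed_inputs` helpers and the file names are hypothetical.

```python
import hashlib


def manifest(materials: dict[str, bytes]) -> dict[str, str]:
    """Map each source file name to the SHA-256 digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in materials.items()}


def changed_inputs(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List every file that was added, removed, or altered between two manifests."""
    return sorted(name for name in set(old) | set(new) if old.get(name) != new.get(name))


before = manifest({"runbook.txt": b"step one", "spec.txt": b"v1"})
after = manifest({"runbook.txt": b"step one", "spec.txt": b"v2"})
print(changed_inputs(before, after))  # ['spec.txt']
```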

Reforge as typed delta

Re-forging from a predecessor weapon produces a structured, typed proposal of every addition, removal, and modification — which the operator reviews and approves before the new weapon is sealed. Every change has a name, a citation, and an approver. Weapon evolution is auditable.
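
A typed delta can be sketched as a list of change records, each of which must carry an approver before the new version seals. The `Change` record, the step-name-to-body mapping, and the sample versions are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    kind: str                       # "added", "removed", or "modified"
    step: str                       # name of the affected step
    citation: Optional[str] = None  # source location backing the change
    approver: Optional[str] = None  # set when the operator signs off


def typed_delta(old: dict[str, str], new: dict[str, str]) -> list[Change]:
    """Compare two versions, each mapping step name -> step body."""
    changes = []
    for name in sorted(set(old) | set(new)):
        if name not in old:
            changes.append(Change("added", name))
        elif name not in new:
            changes.append(Change("removed", name))
        elif old[name] != new[name]:
            changes.append(Change("modified", name))
    return changes


def ready_to_seal(changes: list[Change]) -> bool:
    """Every proposed change needs an approver before the new version seals."""
    return all(change.approver is not None for change in changes)


v1 = {"recon": "list hosts", "report": "write summary"}
v2 = {"recon": "list hosts and services", "verify": "re-test findings"}
delta = typed_delta(v1, v2)
print([(c.kind, c.step) for c in delta])
# [('modified', 'recon'), ('removed', 'report'), ('added', 'verify')]
print(ready_to_seal(delta))  # False
```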

Hypothesis ledger

Open questions and proposed tests carry computed priority and a decaying retrievability score. A closure gate prevents the engagement from closing while critical hypotheses are still stale — the agent cannot pretend the work is done when it isn't.
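
The document does not say how retrievability decays, so the sketch below assumes a simple half-life model purely for illustration. The `Hypothesis` fields, the 24-hour half-life, and the 0.5 staleness cutoff are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    critical: bool
    hours_since_review: float


def retrievability(h: Hypothesis, half_life_hours: float = 24.0) -> float:
    """Assumed decay model: the score halves every `half_life_hours`."""
    return 0.5 ** (h.hours_since_review / half_life_hours)


def can_close(hypotheses: list[Hypothesis], stale_below: float = 0.5) -> bool:
    """Block closure while any critical hypothesis has decayed past the cutoff."""
    return not any(
        h.critical and retrievability(h) < stale_below for h in hypotheses
    )


ledger = [
    Hypothesis("token reuse across tenants", critical=True, hours_since_review=48.0),
    Hypothesis("verbose error pages", critical=False, hours_since_review=200.0),
]
print(can_close(ledger))  # False

ledger[0].hours_since_review = 0.0  # the operator revisits the critical item
print(can_close(ledger))  # True
```

Only critical hypotheses gate closure in this sketch; a stale but non-critical one is left for the operator to prioritize.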

The agent gets a procedure that's known to be exhaustive against the source material, known to be cited end to end, and known to be the same thing it was the last time. The operator gets output they can hand to a client with confidence.

How operators use ARA

A typical engagement walks through four operator touchpoints.

  1. Curate an armor. Once. The armor encodes the operator's posture — what's never permitted regardless of task. Most operators maintain one or two and reuse them.
  2. Open an engagement. Intake metadata in, target / scope / time window / rules of engagement out. The Trim Wizard derives the engagement-scoped overlay; the operator reviews and applies.
  3. Forge or equip a weapon. If the engagement needs a procedure that doesn't exist yet, drop the source materials in and open a forge session — ARA hot-swaps in the blacksmith armor + blacksmith-hammer and the agent itself does the authoring, under containment. If a sealed weapon already fits, equip it.
  4. Run preflight, then run the agent. Preflight validates the engagement state across twelve axes. If preflight is green, the agent runs. The Armory shows everything live. At the end, the operator exports a signed session package.
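
The preflight gate in the last step can be pictured as a set of named checks that must all pass before the agent runs. The four axes below are invented placeholders, since this document does not enumerate the real ones, and the engagement dictionary is a stand-in for the canonical record.

```python
from typing import Callable

Check = Callable[[dict], bool]

# Hypothetical axes; the real preflight checks are not listed in this document.
CHECKS: dict[str, Check] = {
    "armor_active": lambda e: bool(e.get("armor")),
    "trim_applied": lambda e: bool(e.get("trim")),
    "weapon_equipped": lambda e: bool(e.get("weapon")),
    "scope_declared": lambda e: bool(e.get("scope")),
}


def run_preflight(engagement: dict, checks: dict[str, Check] = CHECKS) -> list[str]:
    """Return the names of failed checks; an empty list means preflight is green."""
    return [name for name, check in checks.items() if not check(engagement)]


engagement = {
    "armor": "default",
    "trim": "q3-overlay",
    "weapon": None,  # nothing equipped yet
    "scope": ["10.0.0.0/24"],
}
failures = run_preflight(engagement)
print(failures)  # ['weapon_equipped']
```

Returning the failed axis names, rather than a bare pass or fail, tells the operator exactly what to fix before trying again.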

Why offensive security is the flagship

Pen testing is the hardest case for agentic AI today, which makes it the right place to prove the substrate works.

If ARA can make agents safe and accountable in offensive security, the same substrate covers softer use cases without modification.

Beyond offensive security

The same primitives generalize cleanly:

Customer-facing assistants

Armor encodes brand and safety posture. Trims encode per-customer scope and SLA. Weapons encode approved task playbooks. Audit ships to compliance.

Internal data platforms

Armor restricts the agent to permitted data domains. Trims represent per-query authorization. Weapons enforce the analytic playbook the agent must follow.

Regulated research

Provenance for every cited claim. Rate limits to satisfy upstream API agreements. Preflight to assert authorization is in place before any external request.

Long-running operations

Time-of-day enforcement for change windows. Stop conditions for budget caps. A first-use registry that lets operators sleep while the agent operates within rails they pre-approved.

ARA is not a wrapper for one runtime, one model, or one provider — it is a layer the runtime runs inside.

ARA is self-hosted, single-operator-friendly, and licensed for internal use under selective preview. We are open to investment and partnership opportunities.