Policy Binding — pull, enforcement, and per-run refresh for every agent surface
Status: phase 1 implemented (with the revisions in the addendum below)
Scope: HexgateAgent (create_agent / loaders) + the four adapters (openai, google, pydantic_ai, langchain BYO-graph)
Date: 2026-06-04
Addendum: post-review simplifications (2026-06-04). Phase 1 landed a deliberately smaller PolicyBinding than §3/§4 describe. Where this addendum and the body disagree, the addendum wins:
- No auto-register in the binding. AutoRegisterSpec, manifest_payload, and the 404→register→retry flow are gone (SRP: registration is not policy resolution, and hexgate.cli.register already builds better manifests from the real agent object). A 404 propagates as HexgateError with .status == 404; callers that want register-on-miss catch it, call register_agent(agent), and resolve again. Adapter phases add this 4-line pattern at the wrap sites.
- No fallback param. The plain constructor is the explicit static path: PolicyBinding(PolicyEnforcer(engine, agent_name=...)).
- No prefetched param. The loader composes directly: build the pre-seeded PlatformPolicySource, decode the bundle (as it already does), and call the constructor. Same single round trip, no parameter.
- No client/agent_name fields. The binding holds exactly enforcer + source; agent_name is read from the enforcer for logs, and the HexgateClient stays a caller concern (agent.hexgate_client, as today).
- No _warn_uncovered_tools. YAGNI; revisit in the adapter phases if real usage shows the need.
- The HEXGATE_LOCAL_POLICY helpers live in security/source.py (env var → PolicySource is source-construction), not in binding.py. The loader re-imports them from there.
- The refresh lock lives in PlatformPolicySource, which owns the mutable cache it protects; the binding has no lock.

Net: binding.py is ~200 lines (resolve → local override → platform → raise; fail-soft refresh/refresh_async), and the client kept only HexgateError.status from its planned changes.
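The register-on-miss pattern the addendum leaves to callers can be sketched as follows. This is a minimal illustration, not the adapter code: HexgateError here is a local stand-in that only models the .status attribute, and the resolve and register callables are hypothetical placeholders for PolicyBinding.resolve and register_agent(agent).

```python
from typing import Callable, List, Optional


class HexgateError(Exception):
    """Local stand-in for the SDK error; only .status matters in this sketch."""

    def __init__(self, message: str, status: Optional[int] = None) -> None:
        super().__init__(message)
        self.status = status


def resolve_or_register(resolve: Callable[[], object], register: Callable[[], None]) -> object:
    """Resolve a binding; on a 404, register the agent once and resolve again."""
    try:
        return resolve()
    except HexgateError as exc:
        if exc.status != 404:
            raise  # every other failure stays loud
        register()
        return resolve()  # a second failure propagates to the caller


# Demo with fakes: the first resolve misses, registration runs, the retry succeeds.
calls: List[str] = []


def fake_resolve() -> object:
    calls.append("resolve")
    if calls.count("resolve") == 1:
        raise HexgateError("agent not found", status=404)
    return "binding"


def fake_register() -> None:
    calls.append("register")


result = resolve_or_register(fake_resolve, fake_register)
```

Only the 404 is intercepted, so network and signature failures keep the fail-loud behavior described for resolve.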
1. Problem statement
Today only the loader path (load_hexgate_agent / load_agent with
HEXGATE_KEY, hexgate/agents/loader.py:528) delivers the full governance
loop: pull the signed policy bundle from the platform, verify it, enforce it
on every tool call, and re-pull (ETag/304) at the top of every run.
Every other construction path falls short:
| Surface | Pull | Enforce | Per-run refresh |
|---|---|---|---|
| load_hexgate_agent / load_agent + HEXGATE_KEY | ✅ verified bundle | ✅ GuardedTool | ✅ ETag/304 |
| create_agent(...) (programmatic) | ❌ | ❌ | no-op |
| create_agent(...) + enforce_policy(p) | ❌ (static p) | ✅ | no-op (no source) |
| wrap_openai_agent / HexgateRunner (openai) | ❌ placeholder allow-all | ✅ (allow-all) | ❌ |
| wrap_google_agent / HexgateRunner (google) | ❌ placeholder allow-all | ✅ (allow-all) | ❌ |
| wrap_langchain_agent (BYO CompiledStateGraph) | ❌ placeholder allow-all | ✅ (allow-all) | ❌ |
| wrap_pydantic_agent | ❌ placeholder allow-all | ✅ (allow-all) | ❌ |

Each adapter builds its policy with a placeholder build_policy_set that allows every tool (adapters/openai/wrapper.py:19, adapters/google/wrapper.py:17, adapters/pydantic_ai/wrapper.py:22, adapters/langchain/wrapper.py:22): api_key is collected and then ignored. This spec replaces all four placeholders and the create_agent gap with one framework-agnostic primitive.
1.1 The structural fact this design exploits
Every enforcement mechanism in the codebase closes over the PolicyEnforcer, never over the policy itself: GuardedTool (adapters/langchain/tools.py:61), install_enforcer_on_tool (adapters/langchain/tools.py:178), the OpenAI FunctionTool.on_invoke_tool copy (adapters/openai/tools.py:30), the ADK tool wrappers, and the pydantic_ai toolset clone. The enforcer is a one-field indirection (hexgate/security/enforcer.py:18), so rebinding enforcer.policy changes what every already-wrapped tool enforces.
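The indirection can be illustrated with a small self-contained sketch. The Enforcer and guard names below are hypothetical stand-ins, not the PolicyEnforcer or GuardedTool implementations, and the policy is reduced to a callable from tool name to allow/deny.

```python
from typing import Callable

Policy = Callable[[str], bool]  # hypothetical shape: tool name -> allowed?


class Enforcer:
    """One mutable field; tools hold the enforcer, never the policy."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy

    def decide(self, tool_name: str) -> bool:
        policy = self.policy  # read once so a concurrent swap cannot split one decision
        return policy(tool_name)


def guard(tool: Callable[[], str], name: str, enforcer: Enforcer) -> Callable[[], str]:
    """Wrap a tool so each call consults whatever policy the enforcer holds now."""

    def guarded() -> str:
        if not enforcer.decide(name):
            raise PermissionError(f"{name} denied by policy")
        return tool()

    return guarded


enforcer = Enforcer(policy=lambda name: True)
search = guard(lambda: "results", "search", enforcer)
first = search()                      # allowed under the initial policy
enforcer.policy = lambda name: False  # swap the policy; the wrapper is untouched
```

After the swap, the same search wrapper raises PermissionError on its next call, which is the property a per-run refresh relies on.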
2. Goals / non-goals
Goals
- One implementation of policy resolution (precedence, verification, fallback) and refresh (ETag/304, fail-soft) shared by all six surfaces.
- create_agent can bind to the platform at creation; refresh already fires per stream_agent/invoke_agent, so there are no streaming changes.
- All four adapters pull a real policy at wrap time and refresh at every run entry point; the allow-all placeholder is deleted.
- Verification semantics are byte-identical to today’s loader: a tampered or unverifiable bundle is never silently downgraded.
- load_hexgate_agent dedupes onto the same primitive (net code deletion).
Non-goals
- No new platform endpoints. GET /v1/agents/{name} (ETag/304, platform/api/main.py:857) and POST /v1/agents (auto-register with default role-aware policy + signed bundle, platform/api/main.py:829) already exist and suffice.
- No per-tool-call refresh. Refresh granularity stays per run/turn (matching stream_agent today). Mid-turn policy edits land next turn.
- No change to the enforcement decision pipeline (PolicyEnforcer.decide → engine evaluate → Decision), nor to the WASM evaluation path (hexgate/security/wasm_engine.py).
- No change to approval-handler semantics per adapter (LangChain resolves inline; OpenAI/Google/pydantic_ai render markered errors, unchanged).
3. Core design: PolicyBinding
New module: hexgate/security/binding.py — framework-agnostic, imports
nothing from hexgate.adapters.* or hexgate.agents.factory (avoids the
factory↔loader↔adapters import cycles).
3.1 PolicyBinding.resolve(...) — the pull
Resolution follows the precedence already implemented in load_hexgate_agent (loader.py:597-636); the loader’s helpers _local_policy_override, _verify_local_source_signature_policy, _resolve_pubkey_for_verification, and _local_sign_callable move into this module (the loader re-imports them, so its behavior and its tests stay identical):
- HEXGATE_LOCAL_POLICY override (dev loop): wins outright. BundleDirPolicySource (mtime-refreshed pre-built bundle) or YamlPolicySource (auto-recompile on save), exactly as today (hexgate/security/source.py:165,256). The platform is not contacted.
- Platform, when an api_key / HEXGATE_KEY / client is available:
  - Build or reuse the HexgateClient (Biscuit signature verified lazily on first use, cloud/client.py:177).
  - payload, etag = client.get_agent(agent_name), or use prefetched when the caller already fetched (loader path; avoids a double fetch).
  - decode_and_verify_platform_bundle(payload, client.public_key_bytes()) (source.py:117): Ed25519 signature over the exact manifest bytes and sha256(wasm) == manifest.wasm_hash. Any failure raises, identical to today.
  - Bundle present → it is the policy. Bundle absent (platform couldn’t compile, e.g. no opa) → fall back to load_policy_set_from_dict(payload["policy_yaml"]) (pydantic engine), unless HEXGATE_BUNDLE_REQUIRE_SIGNATURE is set → raise (today’s rule, loader.py:615-621).
  - Attach PlatformPolicySource(client, agent_name, initial_bundle=..., initial_etag=...) pre-seeded so the first refresh is a 304 (source.py:78-93).
  - 404 (agent unknown to the platform):
    - if auto_register is provided → POST /v1/agents with a minimal manifest (name + tool names), which the platform answers with a default role-aware policy and a signed bundle (platform/api/main.py:829, services.py:1290); then re-fetch and proceed. Registration failure → raise.
    - else → raise PolicyBindingError with a message pointing at hexgate agents register / auto_register=.
- fallback engine: used only when neither a local override nor any API key is in play. None (the default) → raise. This is the explicit opt-out that replaces the adapters’ silent allow-all; callers who truly want ungoverned behavior must write it down: fallback=PolicySet.allow_all(tool_names).
The first fetch is eager: it happens inside resolve(). Rationale: failures are loud at construction, and the enforcer always holds a real, verified policy before any run. (The lazy alternative, an unseeded source plus a “pending” policy, interacts badly with refresh’s deliberate fail-soft: a network blip on first run would leave the agent either unguarded or hard-bricked. Rejected.)
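AutoRegisterSpec carries the minimal manifest (name + tool names) used for that registration. Re-registering an existing agent never touches its policy_yaml (platform guarantee, main.py:836-841), so auto-register is idempotent and dashboard edits survive.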
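The precedence described in this section can be sketched as a pure decision function. This is an illustration under stated assumptions: choose_policy_source is a hypothetical name, it only reports which branch applies rather than building a source, and PolicyBindingError is redefined locally so the block is self-contained.

```python
from typing import Mapping, Optional


class PolicyBindingError(Exception):
    """Local stand-in for the error named in the text above."""


def choose_policy_source(
    env: Mapping[str, str],
    api_key: Optional[str] = None,
    fallback: Optional[object] = None,
) -> str:
    """Return which branch of the precedence applies: 'local', 'platform', or 'fallback'."""
    if env.get("HEXGATE_LOCAL_POLICY"):
        return "local"      # dev-loop override wins outright; platform is not contacted
    if api_key or env.get("HEXGATE_KEY"):
        return "platform"   # fetch, verify, and pre-seed the source
    if fallback is not None:
        return "fallback"   # explicit opt-out written down by the caller
    raise PolicyBindingError("no local override, no API key, and no fallback")


both = choose_policy_source({"HEXGATE_LOCAL_POLICY": "policy.yaml", "HEXGATE_KEY": "k"})
keyed = choose_policy_source({"HEXGATE_KEY": "k"})
explicit = choose_policy_source({}, fallback=object())
```

Passing the environment as a mapping keeps the function testable without touching os.environ; a real caller would pass os.environ.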
3.2 refresh() / refresh_async() — the per-run pull
This is the logic of HexgateAgent.refresh_policy + _refresh_policy_safely (factory.py:434-460,550-566) with two deltas:
- fail-soft moves into the binding (one place instead of per-caller), and
- a threading.Lock serializes concurrent refreshes. Adapters’ proxies are documented multi-user objects; without the lock, two concurrent runs both miss the ETag cache and double-fetch. The enforcer.policy = assignment is itself an atomic rebind and WasmPolicy already serializes evaluation (wasm_engine.py:130-133), so the lock is purely an efficiency/ETag-coherence measure, not a correctness one.
PlatformPolicySource.fetch() sends
If-None-Match: "<wasm_hash>"; the platform compares against
sha256(compiled_wasm) and answers 304 with no body
(platform/api/main.py:881-893); the source returns the same cached
PolicyBundle object; identity check short-circuits the swap. Cost per
unchanged run: one small HTTP round trip — no signature re-verify, no
wasmtime work.
On a 200, the source re-runs the full decode + signature + integrity
verification before caching (source.py:103-114). A bundle that fails
verification at refresh time raises inside source.fetch() → caught by
refresh() → logged → previous verified policy stays in force. A
tampered refresh can therefore deny service to new policy but can never
install itself.
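The fail-soft refresh with an identity short-circuit can be sketched as below. This is a minimal illustration, not the binding itself: RefreshingBinding and its fetch callable are hypothetical stand-ins for the binding and source.fetch(), and the lock is placed in the binding as this section describes (the addendum moves it into PlatformPolicySource).

```python
import logging
import threading
from typing import Callable, List

logger = logging.getLogger("policy_refresh_sketch")


class Enforcer:
    def __init__(self, policy: object) -> None:
        self.policy = policy


class RefreshingBinding:
    """Fail-soft refresh: fetch errors are logged and the previous policy stays."""

    def __init__(self, enforcer: Enforcer, fetch: Callable[[], object]) -> None:
        self.enforcer = enforcer
        self._fetch = fetch            # hypothetical stand-in for source.fetch()
        self._lock = threading.Lock()  # collapses duplicate concurrent fetches

    def refresh(self) -> bool:
        """Return True only when a new policy object was installed."""
        with self._lock:
            try:
                fetched = self._fetch()
            except Exception as exc:  # network error or failed verification
                logger.warning("policy refresh failed, keeping previous: %s", exc)
                return False
            if fetched is self.enforcer.policy:
                return False  # 304 path: same cached object, nothing to swap
            self.enforcer.policy = fetched
            return True


# Demo: an unchanged fetch, then a new policy, then a platform outage.
old, new = object(), object()
responses: List[object] = [old, new, RuntimeError("platform unreachable")]


def fake_fetch() -> object:
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


binding = RefreshingBinding(Enforcer(old), fake_fetch)
results = [binding.refresh(), binding.refresh(), binding.refresh()]
```

The third call fails inside the fetch, so the policy installed by the second call stays in force.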
3.3 Refresh granularity and timing contract
- Refresh fires once at the top of every run (turn), before any model or tool execution for that run. All surfaces below uphold this.
- Tool calls within a single run normally see one consistent policy, because a run only swaps at its own boundary. A concurrent runner’s swap can still land mid-run (shared enforcer), which is acceptable: both policies involved are platform-verified, and per-tool-call atomicity is guaranteed by decide() reading self.policy once.
- Sync entry points call refresh() directly (blocking HTTP, ~ms, on the caller’s thread, same as any sync tool I/O). Async entry points call await refresh_async().
4. Surface integration
4.1 HexgateAgent — create_agent (hexgate/agents/factory.py)
New keyword params on create_agent:
bind_policy:
| value | behavior |
|---|---|
| None (default, auto) | bind iff (HEXGATE_KEY set or HEXGATE_LOCAL_POLICY set) and name is provided; otherwise return the bare agent exactly as today |
| True | bind or raise (name required; no key and no local override → raise) |
| False | today’s behavior, unconditionally (escape hatch for unit tests of bare graphs) |
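The three rows of the table can be read as a small decision function. The sketch below is illustrative only: should_bind is a hypothetical helper, and the choice of ValueError for the no-key case is an assumption (the text only fixes ValueError for a missing name).

```python
from typing import Mapping, Optional


def should_bind(bind_policy: Optional[bool], name: Optional[str], env: Mapping[str, str]) -> bool:
    """Decide whether create_agent binds, following the three-row table."""
    has_source = bool(env.get("HEXGATE_KEY") or env.get("HEXGATE_LOCAL_POLICY"))
    if bind_policy is False:
        return False                      # escape hatch: bare agent, unconditionally
    if bind_policy is None:
        return has_source and bool(name)  # auto mode: bind only when both are present
    if not name:
        raise ValueError("bind_policy=True requires name")
    if not has_source:
        # Assumption: the spec says "raise" without naming the exception type.
        raise ValueError("bind_policy=True requires HEXGATE_KEY or HEXGATE_LOCAL_POLICY")
    return True


auto_keyed = should_bind(None, "support-bot", {"HEXGATE_KEY": "k"})
auto_bare = should_bind(None, "support-bot", {})
forced_off = should_bind(False, "support-bot", {"HEXGATE_KEY": "k"})
```

Auto mode never raises, which keeps keyless environments on the bare-agent path exactly as today.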

Binding happens before the HexgateAgent is constructed. No streaming changes are needed: _refresh_policy_safely already fires at the top of stream_agent / stream_agent_raw / invoke_agent (factory.py:579,600), and with_tools already propagates the seam across rebuilds (factory.py:365-379). HexgateAgent.refresh_policy becomes a delegation to self._binding.refresh() when a binding is present, keeping the legacy _enforcer/_policy_source attribute path working for code that attached them by hand.
Auto-register default: on for the create_agent binding path (the agent
was authored in code; registering it is what makes the dashboard useful) —
mirrors the hexgate serve UX introduced in bd925a9/71586c2.
4.2 Loader dedupe (hexgate/agents/loader.py)
load_hexgate_agent (loader.py:528) keeps its public contract, but its policy-precedence block (loader.py:597-636) becomes a single PolicyBinding.resolve(...) call that receives the already-fetched payload as prefetched.
prefetched matters: the loader needs the payload for agent_yaml /
system_md before the agent exists; passing it through keeps the load a
single round trip. ~40 duplicated lines deleted; the two paths can no longer
drift. load_builtin_agent / load_local_agent keep their current shape
(static policy from disk + optional local-override source) but route the
override resolution through the moved helpers.
Back-compat invariants: _enforcer and _policy_source remain readable
attributes (tests and with_tools propagation rely on them). They become
views over the binding (_enforcer ≡ binding.enforcer,
_policy_source ≡ binding.source).
4.3 LangChain BYO-graph (adapters/langchain/wrapper.py, agent.py)
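wrap_langchain_agent (wrapper.py:34) resolves the binding, wraps the graph’s tools with binding.enforcer, and passes the binding to the proxy; build_policy_set is deleted. HexgateLangchainAgent (adapters/langchain/agent.py:15) stores the binding and refreshes at every run boundary: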
| method | refresh call |
|---|---|
| ainvoke, astream, astream_events | await self._binding.refresh_async() (first line, before entering the User scope) |
| invoke, stream | self._binding.refresh() (first line) |

The wrapped graph and its guarded tools are built once; between runs only enforcer.policy swaps.
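The refresh-at-run-boundary pattern can be sketched with a small proxy. Everything here is a hypothetical stand-in: FakeBinding records calls instead of doing HTTP, FakeGraph replaces the compiled graph, and RefreshingProxy is not the HexgateLangchainAgent implementation.

```python
import asyncio
from typing import Any, List


class FakeBinding:
    """Hypothetical stand-in that records refresh calls instead of doing HTTP."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def refresh(self) -> None:
        self.calls.append("refresh")

    async def refresh_async(self) -> None:
        self.calls.append("refresh_async")


class FakeGraph:
    def invoke(self, payload: Any) -> Any:
        return {"echo": payload}

    async def ainvoke(self, payload: Any) -> Any:
        return {"echo": payload}


class RefreshingProxy:
    """Delegates to the wrapped graph, refreshing policy as the first step of each run."""

    def __init__(self, graph: FakeGraph, binding: FakeBinding) -> None:
        self._graph = graph
        self._binding = binding

    def invoke(self, payload: Any) -> Any:
        self._binding.refresh()              # sync entry point: blocking refresh
        return self._graph.invoke(payload)

    async def ainvoke(self, payload: Any) -> Any:
        await self._binding.refresh_async()  # async entry point: awaited refresh
        return await self._graph.ainvoke(payload)


binding = FakeBinding()
proxy = RefreshingProxy(FakeGraph(), binding)
sync_result = proxy.invoke("hi")
async_result = asyncio.run(proxy.ainvoke("hi"))
```

Each entry point refreshes exactly once before delegating, and the graph object is never rebuilt.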
4.4 OpenAI Agents (adapters/openai/wrapper.py, runner.py)
The OpenAI HexgateRunner receives the agent per call (runner.py:59-125) and currently re-wraps it on every run; resolving a fresh binding on every run as well would defeat the ETag cache. Spec:
- HexgateRunner gains a binding cache: self._bindings: dict[str, PolicyBinding].
- New private helper _binding_for(agent): resolves a binding the first time an agent is seen and returns the cached one afterwards.
- wrap_openai_agent changes signature to accept the enforcer (or binding) instead of building its own placeholder: wrap_openai_agent(agent, enforcer=binding.enforcer) → tool copies via wrap_tools (adapters/openai/tools.py:53), mechanics unchanged. Re-wrapping per call stays (it’s a cheap copy.copy of FunctionTools and the agent arrives per call anyway); the enforcer is the cached, shared object, so refresh reaches every copy.
- Run methods:
  - run (async): binding = self._binding_for(agent) → await binding.refresh_async() → wrap → run inside the User scope.
  - run_sync: same with binding.refresh().
  - run_streamed: refresh synchronously in the setup body, before Runner.run_streamed is called. Tools execute later, during stream_events iteration, but they hold the enforcer reference fixed at wrap time, and the policy they consult is whatever the enforcer holds when the tool actually fires. Refreshing at setup satisfies the “refresh before the run’s first tool call” contract. (A second refresh inside _stream_events_with_scope is not added: one refresh per run.)
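The binding cache can be sketched as a resolve-once lookup. This is a minimal illustration: BindingCache is a hypothetical class, keying by agent name is an assumption suggested by the dict[str, PolicyBinding] type, and the resolve callable stands in for PolicyBinding.resolve.

```python
from typing import Callable, Dict, List


class BindingCache:
    """Resolve a binding once per agent name and reuse it on later runs."""

    def __init__(self, resolve: Callable[[str], object]) -> None:
        self._resolve = resolve  # hypothetical stand-in for PolicyBinding.resolve
        self._bindings: Dict[str, object] = {}

    def binding_for(self, agent_name: str) -> object:
        binding = self._bindings.get(agent_name)
        if binding is None:
            binding = self._resolve(agent_name)  # network + verification happen here, once
            self._bindings[agent_name] = binding
        return binding


resolved: List[str] = []


def fake_resolve(name: str) -> object:
    resolved.append(name)
    return object()


cache = BindingCache(fake_resolve)
a1 = cache.binding_for("support-bot")
a2 = cache.binding_for("support-bot")
b = cache.binding_for("billing-bot")
```

Because the cached binding owns the enforcer, every per-call re-wrap of the same agent shares one enforcer and one ETag state.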
4.5 Google ADK (adapters/google/wrapper.py, runner.py)
The Google HexgateRunner wraps once at construction (runner.py:42, “Policy is baked at construction”) and reuses the built ADK Runner. Spec:
- __init__: self._binding = PolicyBinding.resolve(agent_name, api_key=self.api_key, auto_register=AutoRegisterSpec(tool_names)); wrap_google_agent(agent, enforcer=self._binding.enforcer); ADK Runner built once, as today. Construction becomes the loud-failure point (network, signature, 404-without-register all raise here).
- run (sync generator): self._binding.refresh() first line, before user.sync_scope().
- run_async (async generator): await self._binding.refresh_async() first line.

The ADK Runner and the model_copy’d agent never change; the enforcer swap is invisible to ADK.
4.6 pydantic_ai (adapters/pydantic_ai/wrapper.py, agent.py)
Same pattern as LangChain:
- wrap_pydantic_agent resolves the binding (auto-register from the extracted tool names), wraps the cloned toolset with binding.enforcer, deletes its build_policy_set, and passes the binding into HexgatePydanticAgent.
- HexgatePydanticAgent run methods: await self._binding.refresh_async() (async) / self._binding.refresh() (sync) as their first line.
5. Failure semantics — one matrix, all surfaces
The governing rule, inherited from the loader and made universal: construction is fail-loud, refresh is fail-soft (to staleness only).

| Event | When | Behavior |
|---|---|---|
| HEXGATE_KEY malformed / Biscuit signature invalid | resolve | raise (HexgateError) |
| Platform unreachable | resolve | raise (caller decides); never run on a policy we never had |
| Platform unreachable | refresh | warn + keep previous verified policy (unbounded staleness, see §8.3) |
| Bundle signature/integrity fails | resolve | raise; never downgrade to pydantic engine |
| Bundle signature/integrity fails | refresh | raise inside source.fetch() → caught → warn + keep previous policy (tamper cannot install itself) |
| Platform serves no bundle (no opa on control plane) | resolve | pydantic engine on policy_yaml; raise if HEXGATE_BUNDLE_REQUIRE_SIGNATURE |
| Agent 404 on platform | resolve | auto-register (when spec provided) → re-fetch; else raise PolicyBindingError |
| Agent 404 | refresh | treated as fetch failure → warn + keep previous (agent deleted mid-session keeps last policy; see §8.3) |
| No key, no local override, no fallback | resolve | raise; replaces the adapters’ silent allow-all |
| HEXGATE_LOCAL_POLICY set but broken (bad yaml, missing opa, bad sig) | resolve & refresh | raise loudly (today’s rule: silently degrading a security override defeats it, loader.py:206-209) |
| Tool name in wrapped agent absent from fetched policy | resolve | logger.warning listing uncovered tools (they will be denied-by-absence at call time) |
6. Security invariants (must hold after the refactor)
- Single trust root. The platform’s Ed25519 root key signs both Biscuits (HEXGATE_KEY) and bundle manifests; the SDK verifies both against the same key (explicit → HEXGATE_PUBLIC_KEY → JWKS TOFU).
- No silent downgrade. A served-but-unverifiable bundle is fatal at resolve and inert at refresh. The pydantic fallback is only reachable when the platform affirmatively served no bundle, and HEXGATE_BUNDLE_REQUIRE_SIGNATURE closes even that.
- No silent allow-all. Removing build_policy_set removes the last construction path that runs ungoverned with a HEXGATE_KEY present. Ungoverned operation requires an explicit fallback= or bind_policy=False in the caller’s code.
- Verification happens before caching. PlatformPolicySource only caches bundles that passed signature + integrity; the 304 path can only return previously verified objects.
- Role resolution stays call-time. PolicyEnforcer.decide re-reads the User contextvar per tool call (enforcer.py:42-44); binding/refresh is user-agnostic. One binding safely serves many concurrent users.
- Refresh can deny freshness, never grant access. The worst a compromised refresh channel can do is keep an old (verified) policy in force.
7. Platform contract (existing, consumed as-is)
| Endpoint | Use | Notes |
|---|---|---|
| GET /v1/agents/{name} (bearer) | resolve + refresh | project from token; ETag: "<sha256(wasm)>"; If-None-Match → 304 (platform/api/main.py:857-893) |
| POST /v1/agents (bearer) | auto-register on 404 | first register generates default role-aware policy + signed bundle; re-register never touches policy_yaml (main.py:829, services.py:1290) |
| GET /v1/.well-known/keys | trust bootstrap | JWKS TOFU when no key pinned (cloud/client.py:234) |
| PUT /v1/projects/{id}/agents/{name} (dashboard) | not called by SDK | save-time recompile + re-sign via build_signed_bundle, the producer of what we pull (services.py:911) |

Payload fields consumed from GET /v1/agents/{name} (_agent_read, main.py:577): agent_yaml, policy_yaml, system_md, bundle_wasm_b64, bundle_manifest (exact signed bytes as text), bundle_signature_b64.
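The ETag contract and the wasm integrity check can be sketched with the standard library. This is an illustration under assumptions: the function names are hypothetical, the hash is assumed to be hex-encoded, and Ed25519 signature verification over the manifest bytes is deliberately left out.

```python
import hashlib
from typing import Optional, Tuple


def etag_for(wasm: bytes) -> str:
    """ETag in the quoted form shown in the table: "<sha256(wasm)>" (hex is an assumption)."""
    return '"' + hashlib.sha256(wasm).hexdigest() + '"'


def conditional_get(wasm: bytes, if_none_match: Optional[str]) -> Tuple[int, Optional[bytes]]:
    """Server side: 304 with no body when the client's tag matches, else 200 with the bytes."""
    if if_none_match == etag_for(wasm):
        return 304, None
    return 200, wasm


def integrity_ok(wasm: bytes, manifest_wasm_hash: str) -> bool:
    """Client side: the wasm bytes must hash to the value recorded in the signed manifest."""
    return hashlib.sha256(wasm).hexdigest() == manifest_wasm_hash


wasm_v1 = b"\x00asm-v1"
wasm_v2 = b"\x00asm-v2"
first = conditional_get(wasm_v1, None)                   # cold fetch
unchanged = conditional_get(wasm_v1, etag_for(wasm_v1))  # client already holds v1
updated = conditional_get(wasm_v2, etag_for(wasm_v1))    # policy changed on the platform
```

Because the tag is derived from the compiled wasm alone, an edit that leaves the wasm bytes unchanged still yields a 304 under this scheme.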
8. Behavior changes, migration, and known gaps
8.1 Breaking-ish changes (changelog required)
- Adapters: allow-all → real policy. Wrapped OpenAI/Google/pydantic_ai/LangChain agents go from “everything allowed” to “whatever the platform says”, with deny-by-absence for unlisted tools. Mitigated by auto-register’s generated default policy covering the agent’s actual tool names; surfaced by the §5 uncovered-tools warning.
- Adapters: missing/unreachable platform now raises at wrap/construct instead of silently running open. Escape hatch: fallback=.
- create_agent with HEXGATE_KEY set + a name now binds by default (auto mode). Programmatic callers who want a bare graph in a keyed environment pass bind_policy=False.
8.2 Non-changes
- stream_agent / invoke_agent / hexgate serve behavior is identical.
- Loaders’ public signatures are identical.
- Enforcement outcomes (ALLOW/NEEDS_APPROVAL/DENY rendering per adapter) are identical.
8.3 Known gaps carried forward (explicitly out of scope, tracked)
- Unbounded staleness on refresh failure. No max-staleness TTL; a platform outage keeps the last verified policy in force indefinitely (one warning per run). A future HEXGATE_POLICY_MAX_STALENESS knob can harden revocation scenarios.
- _WasmPolicyCache is unwired. WasmPolicy.from_bytes_cached (wasm_engine.py:169) has no production call sites; PolicyBundle.policy() calls from_bytes directly (bundle.py:275). A 200 whose wasm bytes are unchanged (e.g. a non-policy field edit) pays a fresh ~50–100 ms wasmtime instantiation. Fix (small, recommended rider): policy() → WasmPolicy.from_bytes_cached(self.wasm_bytes, self.wasm_hash) when wasm_hash is present.
- Non-BaseTool specs pass enforce_policy unguarded (factory.py:415-426). This is pre-existing and unchanged by this spec, but worth a warning log in a follow-up.
9. Test plan
9.1 Core (tests/security/test_policy_binding.py)
- precedence: local override beats platform beats fallback; no key + no fallback raises.
- resolve: 200 → verified bundle, source pre-seeded (next fetch sends If-None-Match); bundle-less payload → pydantic engine; REQUIRE_SIGNATURE blocks the fallback; bad signature raises; 404 + auto-register → registers then binds; 404 without → raises.
- refresh: 304 → no swap (object identity), one HTTP call; 200 → swap, old decisions change on next decide; fetch exception → warn + keep policy; tampered 200 → warn + keep policy; concurrent refreshes serialize (lock).
- prefetched short-circuits the resolve fetch (call count == 0).
9.2 Factory/loader
- create_agent(name=..., bind_policy=True) with mocked client: tools are GuardedTools, _policy_source/_binding present, stream_agent issues the conditional GET.
- auto mode: no key → bare agent unchanged, refresh_policy() no-ops.
- bind_policy=True without name → ValueError.
- load_hexgate_agent regression suite passes unmodified (dedupe is behavior-preserving); single round-trip asserted via mock call count.
- with_tools rebuild keeps _binding reachable.
9.3 Per adapter (×4, same skeleton)
- wrap pulls platform policy: a platform deny rule blocks the tool (assert the framework-appropriate error rendering).
- second run sends If-None-Match; 304 keeps enforcer.policy identity.
- platform policy updated between runs → next run enforces the new policy.
- platform down at refresh → run proceeds on previous policy + warning.
- wrap with unreachable platform raises; fallback= allows it.
- OpenAI-specific: binding cached across run calls (resolve once); run_streamed refreshes during setup, before Runner.run_streamed is called; per-call rewrap shares the cached enforcer.
- Google-specific: construct-time failure raises; refresh fires per run/run_async without rebuilding the ADK Runner.
- concurrency smoke: two concurrent runs on one proxy/runner, single fetch.
9.4 End-to-end (against the platform test app)
Extend platform/api/tests/test_agents.py style: register → wrap (each
adapter) → run (denied tool) → edit policy via PUT → run again →
allowed/denied flips; assert exactly one 200 and N 304s on the agent
endpoint.
10. Implementation phases
Each phase lands green and is independently shippable.

| Phase | Content | Files |
|---|---|---|
| 1 | PolicyBinding + AutoRegisterSpec + move loader env-override helpers; core tests | hexgate/security/binding.py (new), hexgate/agents/loader.py (re-import), tests/security/test_policy_binding.py |
| 2 | Loader dedupe onto PolicyBinding (with prefetched); HexgateAgent.refresh_policy delegates to binding; back-compat aliases | hexgate/agents/loader.py, hexgate/agents/factory.py |
| 3 | create_agent(bind_policy=...) + auto mode | hexgate/agents/factory.py, tests |
| 4 | LangChain BYO adapter (smallest delta; validates the adapter pattern) | adapters/langchain/wrapper.py, adapters/langchain/agent.py |
| 5 | Google adapter (construct-once shape) | adapters/google/wrapper.py, adapters/google/runner.py |
| 6 | OpenAI adapter (binding cache, run_streamed ordering) | adapters/openai/wrapper.py, adapters/openai/runner.py |
| 7 | pydantic_ai adapter | adapters/pydantic_ai/wrapper.py, adapters/pydantic_ai/agent.py |
| 8 | Rider: wire from_bytes_cached into PolicyBundle.policy(); delete the four build_policy_set placeholders’ remaining references; changelog | hexgate/security/bundle.py, docs |
11. Resolved decisions (rationale recorded)
| Decision | Choice | Why |
|---|---|---|
| Eager vs lazy first fetch | Eager in resolve() | loud failures at construction; refresh’s fail-soft would otherwise leave the first run unguarded or bricked |
| 404 at resolve | auto-register by default at adapter/create_agent surfaces, raise otherwise | platform mints a safe default policy + signed bundle (71586c2); matches hexgate serve UX; idempotent re-registers protect dashboard edits |
| Adapters’ default without platform | raise (no more silent allow-all) | a present HEXGATE_KEY signals governance intent; ungoverned must be explicit (fallback=) |
| Refresh failure | warn + keep previous verified policy | availability over freshness; tamper still cannot install itself |
| Refresh granularity | per run/turn | matches stream_agent today; per-tool-call adds a network call to the hot path for negligible win |
| Sync entry points | direct blocking refresh() | ~ms HTTP on a thread already doing sync I/O; to_thread only for async paths |
| Concurrency | threading.Lock around refresh | proxies are multi-user; collapse duplicate fetches; swap itself already atomic |
| Where the code lives | hexgate/security/binding.py | importable by factory, loader, and all adapters without cycles; security logic stays in security/ |