Engineering · Buyer Team · Infrastructure · May 2026

Running multi-tenant A2A agents on AgentCore Runtime: isolation, routing, and cold-start latency

How Buyer Team enforces hard tenant boundaries, routes across a seven-node negotiation graph driven by eight specialized agents, and keeps container start times from becoming a negotiation bottleneck — with the decisions we actually made in production, and why.

Multi-tenancy is a solved problem for stateless APIs. You add a tenant_id claim to the JWT, prefix every database key, and call it a day. The moment your workload is an LLM agent that reasons, calls tools, persists state, and fans out across other agents, every one of those properties becomes load-bearing — and the usual shortcuts stop working.

Buyer Team is a procurement automation platform running on Amazon Bedrock AgentCore Runtime, with eight specialized A2A agents orchestrated by an AWS Step Functions state machine. Every negotiation, from a $500 office-supply spot bid to a $5M strategic contract, runs the same deterministic DAG, invoking different agents at different nodes, always under tight governance constraints. The system has been multi-tenant SaaS from day one.

This post covers the three operational challenges that turned out to be harder than expected: enforcing isolation across an agentic execution boundary, routing requests through a hybrid orchestrator-plus-agent topology, and managing cold-start latency in a container-per-agent deployment. None of them have clean textbook answers, so I'm documenting what we actually decided and why.

Why AgentCore Runtime

Before the three challenges, it's worth saying why we put the agents on AgentCore at all rather than rolling our own on ECS or Fargate. The platform absorbs four things we'd otherwise have to build and operate ourselves — and for a multi-tenant SaaS, each one removes a class of code that's easy to get subtly wrong.

No capacity to manage

AgentCore provisions a dedicated microVM per session (up to 2 vCPU / 8 GB) and handles horizontal scaling itself — there are no instances to size and no per-instance concurrency to tune (PRD-007 §6.1). A 200-concurrent-bid spot fan-out gets elasticity without an autoscaling group or a warm-pool policy. The dial we govern instead is the account session quota, a far smaller surface than a fleet.

Outbound auth via Token Vault

AgentCore Identity covers both directions: inbound Cognito JWT validation, and outbound OAuth 2.0 to supplier MCP servers and ERP/P2P systems through the Token Vault (PRD-005 §4; PRD-007 §6.4). Per-tenant supplier credentials live under ${env}/${tenant_id}/plugins/${plugin_id}/, so we never operate a bespoke secrets layer for third-party connector auth — exactly the part that multiplies as tenants onboard.

Isolation enforced in the platform

The Gateway doesn't just host Cedar policies — an interceptor rewrites tenant_id in tool-call arguments from the validated JWT claim, and assumes per-Gateway ABAC roles tagged with tenant_id so DynamoDB leading keys, S3 prefixes, and Token Vault paths are scoped by ${aws:PrincipalTag/tenant_id} (PRD-005 §5.4, §3.1.1). Tenant scoping lives in infrastructure, below the agent, so it survives a buggy or prompt-injected agent.
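
To make the ABAC scoping concrete, here is a minimal sketch of the kind of IAM statement this implies for DynamoDB. The `dynamodb:LeadingKeys` condition key and the `${aws:PrincipalTag/...}` policy variable are standard IAM features; the table ARN, the action list, and the helper name `tenant_scoped_statement` are illustrative assumptions, not Buyer Team's actual policy.

```python
import json


def tenant_scoped_statement(table_arn: str) -> dict:
    """Build an IAM statement that limits DynamoDB access to items whose
    partition key starts with the caller's tenant_id principal tag."""
    return {
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": table_arn,
        "Condition": {
            "ForAllValues:StringLike": {
                # IAM substitutes the tag value when the policy is evaluated.
                "dynamodb:LeadingKeys": ["${aws:PrincipalTag/tenant_id}#*"]
            }
        },
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [
        tenant_scoped_statement(
            # Hypothetical table ARN, for illustration only.
            "arn:aws:dynamodb:us-east-1:111122223333:table/example-negotiations"
        )
    ],
}
print(json.dumps(policy, indent=2))
```

Because the condition is evaluated by IAM rather than by the agent, a tool that builds the wrong key gets an access-denied error instead of another tenant's rows.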

Observability and evals built in

AgentCore emits platform metrics — invocation duration, tool-call counts, token usage, cold-start duration — with zero code instrumentation (PRD-004 §2.1). And AgentCore Evaluations runs a built-in harness (LLM-as-Judge, Ground Truth, code-based) across 100% of production traces (PRD-004 §4), which is how we catch the silent quality drift an autonomous negotiation agent is otherwise prone to.

There's a fifth capability we deliberately don't use yet: AgentCore Memory supports cross-session persistence that survives microVM termination, but we keep durable negotiation state in DynamoDB for structured queries and audit completeness, reserving Memory for turn-by-turn context within a single invocation (PRD-001 §4.1). It's a lever we've left in reserve, not a benefit we're banking today.

A note on Terraform coverage

An earlier draft of our infrastructure carried AWS CLI provisioners for the AgentCore resources that had no Terraform support — Evaluations, the Cedar Policy engine, and the Dataset/Harness resources. Those gaps are now closed: on Terraform 1.15.5 with the AWS provider 6.47.0, every AgentCore resource we deploy is managed declaratively, and the null_resource + CLI shims (and their drift-detection smoke tests) have been retired. The whole stack is now plan/apply with no out-of-band steps.

The multi-tenancy problem is different for agents

In a conventional microservice, a tenant's data never leaves its own execution path. A request comes in, touches a database, returns a response. The boundary is the HTTP handler. For agentic systems, the boundary gets fuzzier in four ways.

Shared reasoning context

LLM agents have an in-context window. If tenant A's data bleeds into an agent's prompt — through a poorly scoped tool result, a cached session, or a leaked memory fragment — the model may incorporate it into reasoning for tenant B. No exception is thrown. No log entry is written. The contamination is invisible.

Tool calls cross system boundaries

Every external action is a tool invocation. Tools carry parameters, and those parameters include entity IDs, supplier names, and budget figures. Without explicit tenant scoping at every tool boundary, an agent can call a tool with tenant A's parameters and receive tenant B's data — not because of a bug in the agent, but because the underlying data layer wasn't scoped.

Agent-to-agent communication amplifies risk

In a multi-agent topology, one agent's output becomes another agent's input. A cross-tenant data leak at agent 1 propagates to agents 2, 3, and 4 before anyone notices. The blast radius grows proportionally to the depth of the chain.

State persists across turns

Procurement negotiations are long-running: a competitive auction can run up to five business days. The Step Functions execution persists negotiation state to DynamoDB checkpoints after every node. If a checkpoint key is not tenant-scoped, a resume operation after a restart could load another tenant's state. The bug only manifests on restart — exactly when you're least prepared for it.

These aren't hypothetical failure modes. They're structural properties of agentic systems that have to be addressed architecturally, not patched reactively.

Layer one: isolation

Our decision was to enforce tenant isolation at three independent levels (PRD-005 §2 defense-in-depth; §5 tenant isolation). The design principle: any single level can fail and the others still hold. Cross-tenant data leakage is a business-critical failure in procurement — a competitor's bid data visible to the wrong buyer would be catastrophic.

Three independent isolation layers — identity, data, and agent tool access. The design principle is independence: any one can fail and the others still hold.

A note on Layer 1: tenant identity reaches the agent as a normalised top-level tenantId claim, regardless of how the caller authenticated. A Cognito Pre-Token-Generation Lambda promotes it from the custom:tenantId attribute for human and federated principals, and from the authenticated per-tenant App Client for machine-to-machine (client_credentials) tokens (PRD-005 §4.1). Both bindings are values the platform controls — neither is a claim the caller can assert — and they are the only two ways tenantId is ever set.
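
A rough sketch of what such a Pre-Token-Generation handler could look like follows. The response shape (`claimsAndScopeOverrideDetails` with `idTokenGeneration` and `accessTokenGeneration`) follows our understanding of the newer Cognito trigger event versions and should be checked against the Cognito documentation for the version in use. The `CLIENT_TENANTS` mapping and its contents are hypothetical stand-ins for however the per-tenant App Client binding is stored.

```python
# Hypothetical mapping from machine-to-machine App Client IDs to tenants.
CLIENT_TENANTS = {"example-client-id-acme": "acme"}


def handler(event: dict, context: object) -> dict:
    """Promote a platform-controlled tenant binding to a top-level tenantId claim."""
    attrs = event.get("request", {}).get("userAttributes") or {}
    tenant_id = attrs.get("custom:tenantId")
    if tenant_id is None:
        # client_credentials tokens carry no user attributes, so fall back
        # to the authenticated App Client.
        client_id = event.get("callerContext", {}).get("clientId")
        tenant_id = CLIENT_TENANTS.get(client_id)
    if tenant_id is None:
        # Fail closed: never mint a token without a tenant binding.
        raise ValueError("no tenant binding for this principal")

    claims = {"claimsToAddOrOverride": {"tenantId": tenant_id}}
    event.setdefault("response", {})["claimsAndScopeOverrideDetails"] = {
        "idTokenGeneration": claims,
        "accessTokenGeneration": claims,
    }
    return event
```

Note that nothing in the handler reads a caller-supplied claim: both branches derive the tenant from values the platform wrote.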

The key insight in Layer 2 is that DynamoDB's partition key design is the strongest isolation guarantee we have. All 20 domain and platform tables use a tenant_id#<entity>_id partition key (PRD-001 §5.5). Even if application logic fails — a bug in a tool, an injected prompt that exfiltrates data — DynamoDB will not return another tenant's rows if the PK prefix doesn't match. It's a physical constraint, not a software policy.
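
As an illustration of the key scheme, the sketch below composes a tenant-prefixed partition key and adds a defensive check that a data-access layer could run before any read. The function names are ours for this example, and the rule that a tenant ID may not contain `#` is an assumption we add so that one tenant's ID can never be a prefix-collision with another's.

```python
def partition_key(tenant_id: str, entity_id: str) -> str:
    """Compose the tenant_id#<entity>_id partition key described above."""
    if not tenant_id or "#" in tenant_id:
        raise ValueError("tenant_id must be non-empty and must not contain '#'")
    return f"{tenant_id}#{entity_id}"


def assert_same_tenant(pk: str, tenant_id: str) -> None:
    """Refuse any key whose tenant prefix differs from the caller's tenant."""
    if pk.split("#", 1)[0] != tenant_id:
        raise PermissionError("partition key does not belong to caller's tenant")


pk = partition_key("acme", "neg-001")
assert_same_tenant(pk, "acme")  # passes silently
```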

Layer 3 is where the agent topology adds complexity. Every agent is instantiated through a DynamicAgentFactory (PRD-003 §3.1, PRD-010) that sources its parameters from the {env}-system-config DynamoDB table at construction time. The factory also injects tenant_id into the agent's base context before any tool invocation occurs. This means the agent cannot make a tool call without carrying tenant identity — even if the tool implementation forgets to check.

# Simplified DynamicAgentFactory pattern
# tenant_id is injected at construction — not passed per-call

def build_agent(agent_type: str, tenant_id: str) -> StrandsAgent:
    # Config is read from the {env}-system-config DynamoDB table
    # (model / governance / features groups) at construction time.
    config = system_config.get_model_config(agent_type)

    base_context = {
        "tenant_id": tenant_id,
        "model_id": config["model_id"],
        "evaluation_thresholds": config["thresholds"],
    }

    return StrandsAgent(
        model=config["model_id"],
        temperature=config["temperature"],
        max_tokens=config["max_tokens"],
        context=base_context,           # tenant_id flows through every tool
        tools=get_tools_for(agent_type, tenant_id),  # Cedar-scoped toolset
        hooks=get_hooks_for(agent_type),             # Steering Hooks
    )

Why Cedar and not IAM conditions

Cedar policies (PRD-005 §5.1) are evaluated at the AgentCore Gateway boundary, external to the agent's code. An agent under prompt injection cannot escalate its own permissions, because the policy check runs outside the agent's code, before the tool invocation ever reaches the tool. IAM conditions apply at the AWS API boundary, which is too late: the agent has already decided what to call. Cedar is the right enforcement point for agentic tool access. (There's a second Cedar plane too: user-to-entity authorization for actions like approving a PO or resolving a stuck negotiation. That one is evaluated at the Step Functions interrupt-resume boundary by an API Gateway Lambda authorizer, not at the Gateway; PRD-005 §5.5.)

Layer two: routing

Buyer Team's topology is a hybrid: a deterministic Step Functions Orchestrator at the center, with A2A agents invoked at specific nodes (PRD-001 §4.1; PRD-002 §1). This is a deliberate architectural choice — not "orchestrator or agents" but "orchestrator with agents." A pure autonomous multi-agent system is inappropriate for procurement: regulatory auditability needs deterministic workflow traceability, governance has to be enforced structurally rather than via prompt instructions, and failure recovery requires known checkpoints rather than full restarts.

The DAG has seven nodes. Nodes 1 and 3 — Ingest & Validate and Strategy Router — are pure deterministic Python. Node 2 (Kraljic Classify) is an LLM agent on the SimpleLLM tier, but it runs under tight constraints: a fixed prompt, a confidence threshold, and a semantic cache in front of it. Node 4 branches into one of four strategy variants based on the Kraljic quadrant (Non-Critical, Leverage, Bottleneck, Strategic). Nodes 5–7 converge: Evaluate, Approve, Award.

"A deterministic workflow DAG (AWS Step Functions) provides the structural backbone, while LLM-powered agents (Strands A2A, deployed on AgentCore Runtime) supply adaptive intelligence at each decision point within governed guardrails." — PRD-001 §1

The routing challenge in a multi-tenant system is that different tenants have different thresholds, different approval workflows, and different integration skills. The same DAG runs for all tenants — but parameterization is tenant-specific, resolved per tenant from the {env}-system-config table and per-tenant override tables at agent instantiation.

How routing decisions propagate

Node 1

Ingest & Validate

Purchase Requisition arrives. Input is validated, the Approved Supplier Pool is assembled, and per-tenant admission control runs — the admission decision and the DRAFT → ACTIVE status write execute atomically as a single DynamoDB TransactWriteItems. Tenant context is written into the negotiation record in DynamoDB.

✓ pass → validation + admission OK → continue to Node 2

✗ fail → validation fails → TERMINATED / PR CANCELLED; admission denied → HTTP 429 (Retry-After)

Node 2

Classify (Kraljic Agent)

SimpleLLM-powered agent classifies each category by profit_impact × supply_risk against tenant-configured thresholds (default 0.5). Outputs quadrant assignment with confidence score. The semantic cache is checked first — an identical category seen before hits the cache directly, skipping the LLM call.

✓ continue → quadrant assigned (cache hit or LLM)

⚠ low confidence (<0.8) logs a warning but proceeds; a schema-invalid response after retries falls back to rule-based classification (confidence 0.0) — the negotiation is never blocked here

Node 3

Strategy Router

Deterministic branch: exactly one of four Node 4 variants executes per negotiation. No agent involved — pure conditional logic on the quadrant field. The supplier delivery gate runs here, at the end of Node 3, filtering the quadrant-specific candidate pool by address and delivery threshold. A2A calls to Node 4 agents carry the tenant_id and the negotiation state loaded from DynamoDB.

✓ pool non-empty → routes to 4a / 4b / 4c / 4d based on quadrant

✗ delivery gate empties the pool → Negotiation + PR CANCELLED (no_eligible_suppliers)

Node 4a–4d

Strategy execution (A2A agents)

4a: Spot Bidding Agent (SimpleLLM, up to 200 concurrent bids, <24h). 4b: Leverage Auction Agent (DefaultLLM, up to 5 rounds with convergence detection). 4c: Bottleneck Negotiation Agent (DefaultLLM, TCO-aware, supply-risk weighted). 4d: Strategic Partnership Agent (DefaultLLM, relationship and innovation scoring). Each agent runs as its own AgentCore Runtime, with a Cedar-scoped tool set and Steering Hooks.

✓ pass → bids / offers collected → Node 5

✗ fail → deadline passed with zero bids → REQUIRES_ATTENTION (trigger #1)

Nodes 5–7

Evaluate → Approve → Award

Bid Evaluation Agent scores bids (DefaultLLM, multi-constraint). The Approval Gate checks the tenant spend threshold — auto-approves below it, and pauses via a Step Functions Task Token callback above it (human-in-the-loop, 96h hard timeout). The Award & Communications Agent issues POs grouped by supplier_id.

✓ complete → PurchaseOrder issued per awarded supplier

✗ 96h timeout without approval → REQUIRES_ATTENTION (trigger #3, compliance review)

The seven-node DAG. Deterministic Python at Nodes 1 and 3; A2A agents elsewhere. Agents run on AgentCore Runtime and never call each other directly — every hop is mediated through DynamoDB state.
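
The deterministic parts of Nodes 2 and 3 can be sketched in a few lines. This is a simplified illustration, not the production code: it uses a single threshold for both axes, treats values at the threshold as "high", reduces the delivery gate to one `delivery_days` field, and infers the quadrant-to-variant mapping from the order in which the variants are listed above. All function and field names are hypothetical.

```python
NODE_4_VARIANTS = {
    "Non-Critical": "4a",  # Spot Bidding Agent
    "Leverage": "4b",      # Leverage Auction Agent
    "Bottleneck": "4c",    # Bottleneck Negotiation Agent
    "Strategic": "4d",     # Strategic Partnership Agent
}


def rule_based_quadrant(profit_impact: float, supply_risk: float,
                        threshold: float = 0.5) -> str:
    """Kraljic fallback used when the LLM response is schema-invalid."""
    high_impact = profit_impact >= threshold
    high_risk = supply_risk >= threshold
    if high_impact and high_risk:
        return "Strategic"
    if high_impact:
        return "Leverage"
    if high_risk:
        return "Bottleneck"
    return "Non-Critical"


def route(quadrant: str, suppliers: list[dict], max_delivery_days: int) -> dict:
    """Pick the Node 4 variant, then apply the delivery gate to the pool."""
    node = NODE_4_VARIANTS[quadrant]  # KeyError on an unknown quadrant
    eligible = [s for s in suppliers if s["delivery_days"] <= max_delivery_days]
    if not eligible:
        return {"status": "CANCELLED", "reason": "no_eligible_suppliers"}
    return {"status": "ROUTED", "node": node, "suppliers": eligible}


pool = [{"id": "sup-1", "delivery_days": 3}, {"id": "sup-2", "delivery_days": 12}]
decision = route(rule_based_quadrant(0.9, 0.2), pool, max_delivery_days=5)
```

Keeping the branch as a plain lookup is what makes the routing auditable: given the quadrant field in the negotiation record, the chosen variant can be reproduced without replaying any model call.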

No direct agent-to-agent communication

A design decision that surprised some reviewers: agents in Buyer Team do not call each other directly (PRD-002 §3.4). All inter-agent communication flows through DynamoDB state, dispatched by the Step Functions Orchestrator. The Spot Bidding Agent writes bid results to state; the Bid Evaluation Agent reads them back from state. They never establish a direct A2A channel — in fact each step-invoker Lambda loads the full negotiation state from DynamoDB before it invokes its agent.

This matters for multi-tenancy because it eliminates an entire class of cross-tenant contamination. In a direct A2A topology, if the wrong agent endpoint is called — due to a race condition, a misconfigured registry, or a stale DNS entry — tenant A's context can flow into tenant B's agent. With mediated state, the Step Functions execution is the only entity that dispatches to agents, and every dispatch is scoped to the current negotiation's tenant_id.

The tradeoff is latency: every agent interaction goes through the orchestrator rather than taking a direct path. For procurement negotiations that run over hours or days, this is irrelevant. For a spot bidding fan-out that targets 200 concurrent bids in under 60 seconds, we had to think carefully — and the answer was that the fan-out happens within the Spot Bidding Agent's tool implementation, not across agent boundaries.
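
A minimal sketch of an in-tool fan-out of this kind is shown below, using a semaphore to cap concurrency and a timeout per supplier call. `request_bid` is a stand-in that only sleeps; a real tool would call the supplier's MCP server there. The function names and the per-call (rather than overall) timeout are assumptions made for the example.

```python
import asyncio


async def request_bid(supplier_id: str) -> dict:
    """Stand-in for one supplier call."""
    await asyncio.sleep(0.01)
    return {"supplier_id": supplier_id, "status": "BID_RECEIVED"}


async def fan_out(supplier_ids: list[str], max_concurrency: int = 200,
                  timeout_s: float = 60.0) -> list[dict]:
    """Solicit bids concurrently, bounded by a semaphore and a per-call timeout."""
    gate = asyncio.Semaphore(max_concurrency)

    async def one(supplier_id: str) -> dict:
        async with gate:
            try:
                return await asyncio.wait_for(request_bid(supplier_id), timeout_s)
            except asyncio.TimeoutError:
                # A slow supplier yields a marker result instead of failing the batch.
                return {"supplier_id": supplier_id, "status": "TIMEOUT"}

    return await asyncio.gather(*(one(s) for s in supplier_ids))


results = asyncio.run(fan_out([f"sup-{i}" for i in range(200)]))
```

Since the whole fan-out lives inside one tool call, the orchestrator sees a single agent invocation and a single state write, regardless of how many suppliers were contacted.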

Layer three: cold-start latency

AgentCore Runtime runs agents as containers. We deploy one container per agent type (PRD-007 §3.3, §6.1) — eight agent images, each independently versioned and deployable: Kraljic Classifier, the four strategy agents (Spot Bidding, Leverage Auction, Bottleneck Negotiation, Strategic Partnership), Bid Evaluation, and Award & Communications. (A ninth ECR repository holds the Step Functions Lambdas, and a separate skill-runtime image serves each tenant's Integration Skill over MCP on port 8000.) In a multi-tenant SaaS, a burst of concurrent negotiations can trigger cold starts across several agent types at once. That's the worst case.

One thing AgentCore decides for us is scaling. As described earlier, the service provisions a dedicated microVM per session (up to 2 vCPU / 8 GB) and manages horizontal scaling itself, so there are no instances to size and no per-instance concurrency to configure (PRD-007 §6.1). Sessions are short-lived and auto-terminate after 15 minutes of inactivity. That removes a whole category of capacity tuning, but it also means our cold-start levers are the ones the platform leaves us: image size, configuration fetch, model warm-up, and session-quota headroom.

Where latency actually comes from

Source	Typical latency	Mitigation	Status
Container image pull (cold start)	8–25s	ARM64 image kept lean and under the 2 GB AgentCore limit; ECR layer caching (scaling itself is fully managed)	implemented
system-config read at agent init	200–800ms	Single-key DynamoDB read in `DynamicAgentFactory` at construction; standard SDK retry; cold-start fallback to conservative safe defaults	implemented
DynamoDB checkpoint load on resume	5–15ms	Consistent reads; single-key lookup on the tenant-prefixed negotiation key	implemented
Bedrock model warm-up (first token)	400ms–2s	Prompt caching on the shared system-prompt prefix; SimpleLLM for low-demand agents	monitored
Cedar policy load	50–120ms	Policy bundle cached in container memory; refreshed on a system-config change alarm or redeploy	implemented
Supplier MCP connection (Node 4)	100ms–1.5s per supplier	Connection pooling; circuit breaker prevents cascade on slow suppliers	v1.1

The config warm-up pattern

The DynamicAgentFactory (PRD-003 §3.1) is the key warm-up control point. Every agent fetches its model ID, temperature, max_tokens, and evaluation thresholds from the {env}-system-config DynamoDB table at construction time — not at invocation time. This means the factory call is where startup latency is paid. By the time the agent receives its first invocation, it's already fully configured. If the table is briefly unreachable at construction (for example during a regional blip), the factory falls back to conservative hardcoded defaults, logs a warning, and retries on the next instantiation rather than blocking.

# Container ENTRYPOINT sequence (simplified)
# Step 1: Read {env}-system-config from DynamoDB before accepting traffic
# Step 2: Load the Cedar policy bundle into memory
# Step 3: Signal healthcheck — only then accept InvokeAgentRuntime calls

async def startup():
    # Single batch read of the model / governance / features config
    # groups from the {env}-system-config DynamoDB table.
    config = await system_config.prefetch_groups(
        ["model", "governance", "features"]
    )

    # Cedar bundle loaded once — refreshed on a system-config change alarm
    await cedar_policy.load_bundle(config["cedar_bundle_s3_uri"])

    healthcheck.ready()   # endpoint starts accepting invocations
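
The fallback behaviour described above (conservative defaults, a logged warning, no blocking) could look roughly like the sketch below. `load_model_config`, the `fetch` callable, and the values in `SAFE_DEFAULTS` are all hypothetical; the real factory reads the {env}-system-config table.

```python
import logging

logger = logging.getLogger("agent_factory")

# Hypothetical conservative defaults, not the production values.
SAFE_DEFAULTS = {"model_id": "example-simple-model", "temperature": 0.0, "max_tokens": 1024}


def load_model_config(fetch, agent_type: str) -> tuple[dict, bool]:
    """Return (config, degraded). `fetch` is any callable that reads the config table."""
    try:
        return fetch(agent_type), False
    except Exception as exc:  # for example a throttling or connectivity error
        logger.warning("system-config unavailable for %s: %s; using defaults",
                       agent_type, exc)
        return dict(SAFE_DEFAULTS), True
```

Returning the `degraded` flag alongside the config lets the caller emit a metric or retry on the next instantiation, rather than silently running on defaults.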

Cost note

Because config and the Cedar bundle are read once at container startup, that cost is paid per microVM, at the start of its lifetime — not per request. The session model does the rest: AgentCore spins up and tears down per-session microVMs on demand, so there is no long-lived pool to budget memory for. Agents hold no cross-invocation state beyond the per-invocation idempotency keys they write to DynamoDB (PRD-007 §6.2). What we do watch is session headroom (see below), not instance counts.

Hot-path vs. bursty agents

Not all eight agents sit on the hot path. The orchestrator path, the Kraljic Classification Agent, and the Bid Evaluation Agent are invoked on every negotiation. The Strategic Partnership Agent (Node 4d) runs only for Kraljic Strategic-quadrant items, which are a small fraction of procurement volume by design. Since AgentCore manages scaling, we don't pin per-agent capacity. Instead we shape latency with the levers the platform gives us (model tier, prompt caching) and make sure the account's session quota has enough headroom for the hot agents to stay warm.

hot path

Keep warm via prompt caching + quota headroom: the orchestrator path, Kraljic Classification Agent, and Bid Evaluation Agent — on the hot path of every negotiation regardless of quadrant.

bursty

High-volume but bursty (managed scaling absorbs it): Spot Bidding Agent (Node 4a) and Leverage Auction Agent (Node 4b) — high request volume in short windows; AgentCore scales sessions to match.

cold

Cold start acceptable: Bottleneck Negotiation Agent (Node 4c) and Strategic Partnership Agent (Node 4d). These are low-frequency, long-running negotiations where a cold start of up to 25s is negligible against days of cycle time.

Session quota is the real capacity dial

The number that actually gates concurrency is the account's active-session quota — 1,000 by default in us-east-1 / us-west-2 (500 elsewhere). At a target of 100 concurrent negotiations, each touching roughly ten sessions across the eight agent Runtimes, the orchestrator Lambda, and the Skill Runtime, the default is consumed at full load with no headroom. We request an increase to ≥1,500 for production and alarm at 80% of ActiveSessionWorkloads in the AWS/BedrockAgentCore namespace (PRD-007 §6.1; PRD-004 §3.2).
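
The arithmetic behind those numbers is simple enough to keep in a helper. The sketch below reproduces it with the figures from the paragraph above; the function name and return shape are ours.

```python
def session_headroom(concurrent_negotiations: int, sessions_per_negotiation: int,
                     quota: int, alarm_percent: int = 80) -> dict:
    """Estimate peak session demand against a quota and its alarm threshold."""
    demand = concurrent_negotiations * sessions_per_negotiation
    return {
        "demand": demand,
        "headroom": quota - demand,
        "alarm_at": quota * alarm_percent // 100,
        "over_alarm": demand * 100 >= quota * alarm_percent,
    }


default = session_headroom(100, 10, quota=1000)
raised = session_headroom(100, 10, quota=1500)
```

With the default quota, demand equals the quota (zero headroom, well past the 800-session alarm line). At 1,500 the same load sits at 1,000 sessions, under the 1,200-session alarm threshold with 500 to spare.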

Observability across tenants

Isolation and routing decisions only have value if you can observe them. The three-layer observability stack (PRD-004 §2) was designed with per-tenant attribution as a first-class requirement, not an afterthought.

Layer 1: Platform Telemetry (AgentCore Observability → CloudWatch)

agent.invocation.duration · agent.tool.call.count · bedrock.token.usage · cold_start_duration · tenant_id dimension

Layer 2: Business Domain Metrics (Step Functions step-invokers + agents → CloudWatch custom namespace)

procurement/negotiation.duration · procurement/savings.rate · procurement/cost · procurement/bid.count · tenant_id, quadrant, and agent dimensions

Layer 3: Distributed Traces (ADOT → X-Ray → AgentCore Observability)

negotiation_id (root span) · tenant_id (span attribute) · agent_type, tool_name, model_id · cross-agent trace propagation

Cost attribution per tenant (PRD-004 §6.2; PRD-009) requires token usage tracked across four dimensions: tenant, agent, negotiation, and model. The orchestrator's token cost is prorated across agents by invocation-count ratio: if a negotiation calls three agents and the orchestrator ran six times in support of them, the cost of those six orchestrator invocations is split across the three agents in proportion to each agent's share of the invocations. Monthly per-tenant cost reports are generated from the {env}-tenant-cost-attribution table, which CUR-joins the procurement/cost CloudWatch metric (PRD-001 §5.5; PRD-009 REQ-CST013), because we need to attribute cost before the AWS bill arrives.
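
One way to read the proration rule is sketched below: each agent receives a share of the orchestrator's cost equal to its share of the negotiation's agent invocations. That reading, the function name, and the example figures are assumptions for illustration; `Decimal` is used so the shares add back up to the original amount exactly.

```python
from decimal import Decimal


def prorate(orchestrator_cost: Decimal, invocations: dict[str, int]) -> dict[str, Decimal]:
    """Split an orchestrator cost across agents by their share of invocations."""
    total = sum(invocations.values())
    if total == 0:
        return {agent: Decimal("0") for agent in invocations}
    return {agent: orchestrator_cost * count / total
            for agent, count in invocations.items()}


shares = prorate(
    Decimal("0.60"),
    {"kraljic": 1, "spot_bidding": 3, "bid_evaluation": 2},
)
```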

Failure modes we designed for explicitly

A few failure modes that only appear in multi-tenant agentic systems, and how we handled them.

Tenant A's negotiation resumes with Tenant B's state

The DynamoDB checkpoint key is partitioned by tenant_id#negotiation_id, so a resume can only ever read within the right tenant. On top of that, write idempotency keys are deterministic — hash(negotiation_id, supplier_id, action, round_number) for bids (PRD-001 §5.1.8), and tuple-based dedup keys per node for recovery (PRD-002 §5.2). A resume that tries to write against the wrong state fails DynamoDB's condition check rather than silently proceeding.
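
A small sketch of the deterministic bid key and a write-once condition follows. The hash inputs come from the description above; the choice of SHA-256, the `|` separator, the table name, and the `pk`/`sk` attribute names are assumptions. The function only builds the keyword arguments one would pass to a DynamoDB `put_item` call, so it runs without AWS access.

```python
import hashlib


def bid_idempotency_key(negotiation_id: str, supplier_id: str,
                        action: str, round_number: int) -> str:
    """Deterministic key: the same logical write always hashes to the same value."""
    material = "|".join([negotiation_id, supplier_id, action, str(round_number)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def put_bid_request(tenant_id: str, negotiation_id: str, supplier_id: str,
                    action: str, round_number: int) -> dict:
    """Build put_item keyword arguments with a write-once condition."""
    key = bid_idempotency_key(negotiation_id, supplier_id, action, round_number)
    return {
        "TableName": "example-bids",  # hypothetical table name
        "Item": {
            "pk": {"S": f"{tenant_id}#{negotiation_id}"},
            "sk": {"S": f"bid#{key}"},
        },
        # Reject the write if an item with this primary key already exists.
        "ConditionExpression": "attribute_not_exists(pk)",
    }


request = put_bid_request("acme", "neg-001", "sup-7", "SUBMIT_BID", 2)
```

A replayed write produces the same key and fails the condition, which turns a duplicate into an explicit, catchable error rather than a second record.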

One tenant's burst starves other tenants

Per-tenant fair-sharing is enforced by Buyer Team's own admission control at Node 1 — reserved and max slots per tenant, backed by DynamoDB counters and a reconciler Lambda (PRD-016; PRD-002 §3.1, REQ-G500–G519). A tenant over its max is rejected with HTTP 429. We additionally lean on model tiering to reduce per-negotiation consumption: SimpleLLM agents use 10–15× fewer tokens than DefaultLLM (PRD-003 §2.1), leaving more headroom for concurrent tenants.
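
The reserved/max slot idea can be illustrated with an in-memory stand-in. The exact semantics here are an assumption on our part: reserved slots are always granted, slots beyond the reservation need free shared capacity, and nothing is granted at or above the tenant's maximum. The production version keeps these counters in DynamoDB and updates them atomically.

```python
from dataclasses import dataclass


@dataclass
class TenantSlots:
    reserved: int    # slots guaranteed to this tenant
    maximum: int     # hard ceiling for this tenant
    active: int = 0  # negotiations currently admitted


def admit(tenant: TenantSlots, shared_free: int) -> bool:
    """Return True and take a slot if admitted; False maps to HTTP 429."""
    if tenant.active >= tenant.maximum:
        return False
    if tenant.active < tenant.reserved or shared_free > 0:
        tenant.active += 1
        return True
    return False


tenant = TenantSlots(reserved=1, maximum=2)
decisions = [admit(tenant, 0), admit(tenant, 0), admit(tenant, 1), admit(tenant, 5)]
```

In this run the first request uses the reserved slot, the second is refused because no shared capacity is free, the third borrows a shared slot, and the fourth is refused at the tenant's maximum even though shared capacity exists.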

A config change breaks one tenant but not others

Threshold resolution is layered: per-tenant DynamoDB overrides → system-config profile (selected by the tenant's config_profile) → system-config defaults (PRD-001 §5.6). Changes take effect at the next agent instantiation. The config table itself is protected by a write-deny IAM resource policy for agent roles (REQ-C305) and a config.agent_config_change alarm (PRD-004 §3.2), so a malicious or accidental change is caught rather than silently propagated.
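
Layered lookup of this kind maps naturally onto `collections.ChainMap`, where the first mapping that defines a key wins. The sketch below uses hypothetical key names; the 0.5 and 0.8 defaults are the classification threshold and confidence floor mentioned earlier, and the override values are invented for the example.

```python
from collections import ChainMap


def resolve_thresholds(tenant_overrides: dict, profile: dict, defaults: dict) -> ChainMap:
    """Precedence: per-tenant overrides, then the tenant's profile, then defaults."""
    return ChainMap(tenant_overrides, profile, defaults)


defaults = {"kraljic_threshold": 0.5, "classification_confidence": 0.8}
profile = {"classification_confidence": 0.85}
overrides = {"kraljic_threshold": 0.6}

resolved = resolve_thresholds(overrides, profile, defaults)
```

Resolving once at agent instantiation, and passing the resolved view into the agent, is what keeps a change in one tenant's override table from affecting any other tenant.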

Supplier MCP server slow for one tenant affects others

Independent circuit breakers per dependency type (PRD-006): supplier MCP servers, Bedrock endpoints per region, and A2A agent endpoints each get their own breaker. A slow supplier for tenant A trips only that supplier's breaker — the supplier is marked UNREACHABLE and the negotiation continues with the rest — without back-pressuring tenant B's suppliers.
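
The isolation property comes from keying breakers by dependency, so one failing supplier cannot open the breaker for another. The sketch below is a deliberately simplified breaker (it closes fully after a cool-down instead of going through a half-open probe state), and the class, the thresholds, and the key shape are all illustrative assumptions.

```python
import time
from collections import defaultdict


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; closes after `reset_s`."""

    def __init__(self, max_failures: int = 3, reset_s: float = 30.0):
        self.max_failures, self.reset_s = max_failures, reset_s
        self.failures, self.opened_at = 0, None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_s:
            self.failures, self.opened_at = 0, None  # cool-down elapsed
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


# One breaker per (dependency type, dependency id), created on first use.
breakers = defaultdict(CircuitBreaker)

for _ in range(3):
    breakers[("supplier_mcp", "supplier-a")].record(ok=False)
```

After three failures, calls to supplier-a are short-circuited while supplier-b's breaker, which has its own state, still allows traffic.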

What we'd do differently

Three things that, in retrospect, we should have front-loaded.

Tenant context as a first-class SDK primitive. We built tenant injection into the DynamicAgentFactory, but it's still application-level infrastructure that every tool author has to be aware of. A better pattern would be a tenant-aware request context that propagates automatically through each agent invocation, similar to how Python's contextvars module carries context across async calls. We already use ContextVars for the memory-degradation signals that flow back through A2A responses (PRD-002 §4.1), so the pattern is proven in our codebase; the gap is making tenant identity ride the same rail. We're tracking whether AgentCore adds this natively.
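
For readers unfamiliar with the mechanism, here is a minimal sketch of tenant identity carried in a `ContextVar`. It shows the idea only and is not our current implementation; `current_tenant`, `lookup_supplier`, and `invoke_for_tenant` are hypothetical names.

```python
import asyncio
import contextvars

current_tenant: contextvars.ContextVar[str] = contextvars.ContextVar("current_tenant")


async def lookup_supplier(supplier_id: str) -> str:
    """A tool that takes no tenant_id parameter; it reads the ambient value."""
    return f"{current_tenant.get()}#{supplier_id}"


async def invoke_for_tenant(tenant_id: str, supplier_id: str) -> str:
    token = current_tenant.set(tenant_id)
    try:
        await asyncio.sleep(0)  # yield so the two invocations interleave
        return await lookup_supplier(supplier_id)
    finally:
        current_tenant.reset(token)


async def main() -> list[str]:
    # Each task created by gather() runs in its own copy of the context.
    return await asyncio.gather(
        invoke_for_tenant("acme", "sup-1"),
        invoke_for_tenant("globex", "sup-1"),
    )


keys = asyncio.run(main())
```

Two properties make this attractive: concurrent invocations cannot see each other's value, and because the variable has no default, a tool called outside any tenant scope raises `LookupError` instead of quietly running unscoped.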

Load-test across tenants before load-testing per-tenant. Our initial load tests were single-tenant: 200 concurrent bids, one negotiation, one supplier pool. The multi-tenant interference patterns — shared session quota, shared Bedrock endpoint quotas, shared ECR pull bandwidth — only appeared when we ran 10 tenants concurrently. Build the multi-tenant load scenario first.

Cost attribution instrumentation from day one. We added the procurement/cost metric at the orchestrator level in week 3. The token counts from agent invocations in weeks 1–2 were logged but not attributed. For a SaaS product where per-tenant cost visibility is a stated requirement (the Cost-attribution cross-cutting concern, PRD-001 §4.4, and the §8.4 platform capability), this should have been part of the initial agent scaffolding — not a retrofit.

Summary

Multi-tenant A2A agents on AgentCore Runtime require explicit architectural decisions at three levels: isolation (JWT → data key prefix → agent context injection), routing (mediated through the Step Functions Orchestrator via DynamoDB state, with no direct agent-to-agent calls), and cold-start management (model tiering and prompt caching on hot-path agents, system-config pre-read at container startup, and enough session-quota headroom to keep the hot agents warm).

The common thread across all three is that isolation and performance guarantees in agentic systems have to be architectural properties — encoded in key schemas, factory patterns, and deployment topology — not application-level conventions that rely on developers remembering to do the right thing. Agents are autonomous; your isolation mechanisms need to be too.