Engineering · Buyer Team · Architecture Decision

Why we chose Bedrock AgentCore — and what we turned down to get there

Buyer Team runs seven specialized procurement-negotiation agents in production. The most consequential infrastructure decision we made wasn't which model to use — it was where the agents run. Here's the runtime decision in full: the four alternatives we evaluated, the tradeoffs we accepted, and the two constraints that ultimately decided it.


Every serious agent system eventually faces the same fork in the road: do you build the runtime, or do you buy it? For most of our twenty-year careers the answer has been "build" — containers on ECS, a queue, an autoscaling group, and you own your destiny. So when we put Buyer Team's agents on Amazon Bedrock AgentCore, it wasn't a default. It was a decision we argued through, and one we could just as easily have made the other way.

Buyer Team is multi-tenant procurement-automation SaaS. The system follows a principle we call "Orchestration Before Intelligence": a deterministic AWS Step Functions DAG provides the structural backbone, and LLM-powered agents — built on the Strands Agents SDK, speaking the A2A protocol — supply adaptive intelligence at each decision point within governed guardrails. Every negotiation, from a $500 office-supply spot bid to a $5M strategic contract, runs the same auditable workflow. The runtime question was: where do those seven agents actually execute?
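To make "Orchestration Before Intelligence" concrete, here is a minimal sketch in plain Python, not our Step Functions definition. It assumes a guardrail can be reduced to a price band; `Guardrail`, `governed_step`, and `run_workflow` are hypothetical names invented for illustration. The point it shows is that the step order is fixed and the deterministic layer, not the agent, decides what gets recorded.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Guardrail:
    """Deterministic policy bounds for one decision point (hypothetical shape)."""
    min_price: float
    max_price: float


Agent = Callable[[dict], float]


def governed_step(agent: Agent, guardrail: Guardrail, state: dict) -> dict:
    """Run one decision point: the agent proposes, the workflow enforces."""
    proposal = agent(state)
    # Clamp the proposal into the allowed band; the agent cannot bypass this.
    approved = min(max(proposal, guardrail.min_price), guardrail.max_price)
    return {**state, "offer": approved, "clamped": approved != proposal}


def run_workflow(steps: List[Tuple[Agent, Guardrail]], state: dict) -> dict:
    """Execute steps in a fixed order, standing in for the state machine."""
    for agent, guardrail in steps:
        state = governed_step(agent, guardrail, state)
    return state


def eager_agent(state: dict) -> float:
    # Stand-in for an LLM-backed agent that overshoots the policy band.
    return state["target"] * 1.5


result = run_workflow([(eager_agent, Guardrail(0.0, 1000.0))], {"target": 900.0})
print(result["offer"], result["clamped"])  # 1000.0 True
```

The agent asked for 1350.0 and the workflow recorded 1000.0 along with a flag that the guardrail fired, which is the property the audit trail depends on.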

This post is the decision record. It's deliberately not a marketing piece for AgentCore — it's the honest comparison, including the things we gave up.

What the workload actually demands

Before evaluating any platform, we wrote down the non-negotiable properties of the workload. A runtime choice is only as good as its fit to these, so they're worth stating plainly.

Hard tenant isolation

Cross-tenant leakage is a business-ending failure in procurement — a competitor's bid visible to the wrong buyer is catastrophic, not a P3 bug. Isolation has to be an infrastructure property that survives a buggy or prompt-injected agent, not an application convention.

Spiky, fan-out concurrency

A single competitive auction can fan out to 200 concurrent supplier bids, then go quiet for hours. Auctions run up to five business days; spot bids close in under 24 hours. The runtime must absorb bursts without us hand-tuning a fleet.

Outbound auth to many third parties

Agents call per-tenant supplier MCP servers and ERP/P2P systems over OAuth 2.0. Each tenant onboards its own credentials. A bespoke secrets-and-token layer here is exactly the part that multiplies painfully as tenants grow.
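For a sense of what the smallest piece of a bespoke token layer looks like, here is a rough sketch of a per-tenant access-token cache. It is not how AgentCore's Token Vault works internally. `TenantTokenCache`, `Token`, and the injected `fetch` callable are hypothetical, and the sketch leaves out everything that makes the real thing hard: encrypted storage, refresh-token rotation, revocation, and audit.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Token:
    value: str
    expires_at: float  # epoch seconds


class TenantTokenCache:
    """Caches OAuth access tokens keyed by (tenant, provider).

    Keying on the tenant means a cached token is never served to another tenant.
    """

    def __init__(
        self,
        fetch: Callable[[str, str], Token],
        clock: Callable[[], float] = time.time,
        skew: float = 30.0,
    ) -> None:
        self._fetch = fetch    # performs the real OAuth exchange (assumed)
        self._clock = clock    # injectable so expiry can be tested
        self._skew = skew      # refresh this many seconds before expiry
        self._tokens: Dict[Tuple[str, str], Token] = {}

    def get(self, tenant_id: str, provider: str) -> str:
        key = (tenant_id, provider)
        token = self._tokens.get(key)
        if token is None or token.expires_at - self._skew <= self._clock():
            token = self._fetch(tenant_id, provider)
            self._tokens[key] = token
        return token.value
```

Even this toy version has three policy decisions baked in (cache key, refresh skew, clock source), and each new tenant and provider adds another credential the surrounding system has to store and rotate.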

100% governed, 100% observable

Our OKRs require every negotiation to enforce all policy guardrails and to carry a complete OTEL audit trail. Quality drift in an autonomous negotiator is silent by nature, so continuous evaluation across production traffic is a requirement, not a nice-to-have.

Notice what these have in common: each one is a place where a self-built solution is easy to get subtly wrong, and where being wrong is expensive. That framing did most of the work in the decision.

The four alternatives we evaluated

We took four candidate runtimes seriously enough to sketch the architecture for each. Here's where each one landed.

Option 1 — Self-host containers on ECS / Fargate

The familiar path, and the one our instincts reached for first. Package each agent as a container, run it on Fargate, put it behind an autoscaling policy. We know exactly how to operate this.

The problem is everything that isn't the container. To meet the workload demands above, we'd be building — and then operating forever — a per-tenant token vault for outbound OAuth, an ABAC scoping layer so DynamoDB leading keys and S3 prefixes resolve per tenant, a warm-pool policy to fight cold starts on the spiky fan-out, an OTEL instrumentation layer on every agent, and a bespoke evaluation harness to catch quality drift. None of that is novel work. All of it is load-bearing, security-critical, and ours to keep patched. For a small team, that's a second product competing with the actual product.
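As an illustration of the ABAC piece alone, the sketch below builds an IAM policy document that scopes DynamoDB access by leading key and S3 access by prefix. The `dynamodb:LeadingKeys` condition key and `aws:PrincipalTag` policy variable are standard IAM features. The tag name `tenant_id`, the action lists, the example ARNs, and the assumption that the table's partition key value is exactly the tenant ID are ours for illustration, not a description of the production policy.

```python
import json

# Policy variable resolved by IAM at request time from the caller's session tag.
# The tag name "tenant_id" is an assumption for this sketch.
TENANT_VAR = "${aws:PrincipalTag/tenant_id}"


def tenant_scoped_policy(table_arn: str, bucket_arn: str) -> dict:
    """Return an IAM policy limiting a principal to its own tenant's data."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Only items whose partition key equals the caller's tenant tag.
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": [TENANT_VAR]
                    }
                },
            },
            {
                # Only objects under the caller's tenant prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/{TENANT_VAR}/*",
            },
        ],
    }


policy = tenant_scoped_policy(
    "arn:aws:dynamodb:us-east-1:111122223333:table/example-negotiations",
    "arn:aws:s3:::example-negotiation-artifacts",
)
print(json.dumps(policy, indent=2))
```

Writing the policy is the easy part. The self-hosted cost is in minting correctly tagged sessions for every agent invocation and proving that no code path can run without one.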

Option 2 — AWS Lambda per agent

Lambda kills the capacity-management problem outright and bursts beautifully. For a stateless tool call it would be ideal. But our agents aren't a single short function: an agent reasons across multiple model round-trips, holds turn-by-turn context, and can run well past Lambda's ceilings (the 15-minute maximum per invocation, for one) on a complex evaluation or a multi-round bid. We'd spend the savings fighting the execution-time and statefulness constraints, and we'd still be back to building isolation, outbound auth, and evals ourselves. Lambda solved the one problem we were least worried about and none of the ones that actually worried us.

Option 3 — Raw Bedrock + a bespoke orchestrator

Call the Bedrock model APIs directly and write our own agent loop and routing. Maximum control, minimum lock-in. This is genuinely viable, and for a single-tenant internal tool it might have been the right call. But it concentrates all the undifferentiated heavy lifting — session management, identity, memory, observability, evals — onto us, while giving back only flexibility we didn't have a concrete need for. We'd be hand-rolling the exact platform AWS was about to sell us, minus the integration testing.

Option 4 — A third-party agent framework as the runtime

Frameworks such as LangGraph offer orchestration and agent abstractions out of the box. We use a framework for the agents themselves: Strands, for the A2A protocol and tool model. But adopting one as the production runtime and control plane is a different commitment. It pulls a fast-moving dependency into the most security-sensitive layer of a multi-tenant system, and it still doesn't give us AWS-native tenant-scoped IAM, a managed token vault, or evaluation over 100% of traces. We were happy to take a framework for the SDK ergonomics; we weren't willing to make one the foundation.

The scorecard

Scored against the four workload demands, the options separate cleanly. "Build" in the table below means we'd own that capability as bespoke, security-critical code we operate indefinitely.

Option               | Isolation               | Burst concurrency   | Outbound auth | Observability + evals | Verdict
---------------------+-------------------------+---------------------+---------------+-----------------------+--------------------------
ECS / Fargate        | Build (ABAC)            | Warm-pool tuning    | Build vault   | Build harness         | 4× build
Lambda per agent     | Build (ABAC)            | Native              | Build vault   | Build harness         | statefulness wall
Raw Bedrock + custom | Build                   | Build               | Build vault   | Build harness         | all heavy lifting
3rd-party framework  | Build                   | Framework-dependent | Partial       | Partial               | dependency in trust layer
Bedrock AgentCore    | Platform (Gateway ABAC) | microVM per session | Token Vault   | Native + Evaluations  | chosen

The pattern is the whole argument. Every alternative leaves at least three of our four hard requirements wholly or partly ours to build and operate. AgentCore turns them into managed platform behaviour: a dedicated microVM per session with no fleet to size, inbound JWT validation and outbound OAuth through a managed Token Vault, tenant scoping enforced below the agent via Gateway ABAC, and platform-emitted metrics plus a built-in evaluation harness running across production traces. We weren't buying convenience; we were buying our way out of operating four security-critical subsystems we'd otherwise have to build.
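The count behind that claim can be checked mechanically. The snippet below transcribes the scorecard cells and tallies, per option, how many of the four requirements are not provided by the platform. Treating "Warm-pool tuning", "Framework-dependent", and "Partial" as work that still lands on us is our reading of those cells, stated here as an assumption; `owned_by_us` is a name made up for this sketch.

```python
# Cells in column order: isolation, burst concurrency, outbound auth, observability + evals.
SCORECARD = {
    "ECS / Fargate": ["Build (ABAC)", "Warm-pool tuning", "Build vault", "Build harness"],
    "Lambda per agent": ["Build (ABAC)", "Native", "Build vault", "Build harness"],
    "Raw Bedrock + custom": ["Build", "Build", "Build vault", "Build harness"],
    "3rd-party framework": ["Build", "Framework-dependent", "Partial", "Partial"],
    "Bedrock AgentCore": [
        "Platform (Gateway ABAC)",
        "microVM per session",
        "Token Vault",
        "Native + Evaluations",
    ],
}

# Cells we read as fully platform-provided (assumption, see the note above).
PLATFORM_PROVIDED = {
    "Native",
    "Platform (Gateway ABAC)",
    "microVM per session",
    "Token Vault",
    "Native + Evaluations",
}


def owned_by_us(cells: list) -> int:
    """Count requirements that remain wholly or partly ours to build."""
    return sum(1 for cell in cells if cell not in PLATFORM_PROVIDED)


for option, cells in SCORECARD.items():
    print(f"{option}: {owned_by_us(cells)} of {len(cells)}")
```

Under that reading the tally is 4, 3, 4, 4, and 0, with Lambda's native burst handling as the only alternative cell that comes for free.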

The tradeoffs we accepted

Choosing a managed runtime is not free, and pretending otherwise would make this a worse decision record. Three costs we took on with eyes open.

Vendor coupling

AgentCore is AWS-specific. Our isolation model, identity flow, and observability now assume its primitives. We mitigated this by keeping all durable negotiation state in DynamoDB and the workflow in Step Functions — both portable — so the agents are the replaceable layer, not the system of record. But the coupling is real and we named it.

Platform quotas as the new dial

We don't tune instances anymore — we govern the account session quota instead. That's a smaller surface, but it's a hard ceiling we don't control, which is why region choice (below) became load-bearing. We added our own per-tenant admission control on top so one tenant's burst can't consume the shared quota.
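The shape of that admission control can be sketched in a few lines. This is an in-process illustration only, assuming a single counter guarded by a lock; a real deployment would need shared state across instances. `AdmissionController`, `try_acquire`, and the specific caps are hypothetical names and numbers.

```python
import threading
from collections import defaultdict


class AdmissionController:
    """Admit a session only if both the tenant cap and the account quota allow it."""

    def __init__(self, account_quota: int, per_tenant_cap: int) -> None:
        self._account_quota = account_quota
        self._per_tenant_cap = per_tenant_cap
        self._active = defaultdict(int)  # tenant_id -> sessions in flight
        self._total = 0
        self._lock = threading.Lock()

    def try_acquire(self, tenant_id: str) -> bool:
        """Return True if admitted; on False the caller would respond with a 429."""
        with self._lock:
            if self._total >= self._account_quota:
                return False  # shared quota exhausted
            if self._active[tenant_id] >= self._per_tenant_cap:
                return False  # this tenant has used its share
            self._active[tenant_id] += 1
            self._total += 1
            return True

    def release(self, tenant_id: str) -> None:
        """Return a session slot when the session ends."""
        with self._lock:
            if self._active[tenant_id] > 0:
                self._active[tenant_id] -= 1
                self._total -= 1


controller = AdmissionController(account_quota=1000, per_tenant_cap=250)
admitted = sum(controller.try_acquire("tenant-a") for _ in range(300))
print(admitted)  # 250
```

With a cap below the account quota, a 300-session burst from one tenant is cut off at 250 and the remaining 750 slots stay available to everyone else.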

Maturity and coverage gaps

AgentCore is new. An earlier cut of our infrastructure carried AWS CLI shims for resources without Terraform support — Evaluations, the Cedar policy engine, the Dataset/Harness resources. We accepted out-of-band provisioning temporarily, tracked it as debt, and have since closed it: the whole stack is now declarative on Terraform 1.15.5 with AWS provider 6.49.0, no CLI shims.

The two constraints that actually decided it

Once AgentCore was the front-runner, two regional realities turned an abstract preference into a concrete deployment decision — and they're the kind of detail that doesn't show up until you commit.

Constraint 1 — Evaluations isn't everywhere

AgentCore Evaluations is GA in only 9 of 16 regions. Since 100% evaluation coverage is a stated requirement, production simply can't deploy to a region without it. That alone eliminated several region options before any latency or data-residency conversation started.

Constraint 2 — Session quota varies by region

us-east-1 and us-west-2 carry a session quota of 1,000 concurrent sessions versus 500 elsewhere. Because the session quota is the dial we govern instead of a fleet, that 2× headroom directly determines how many tenants can run concurrent auctions before admission control starts issuing 429s. We deploy to us-east-1 / us-west-2 for exactly this reason.
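A back-of-envelope calculation shows what the 2× headroom means in practice. It assumes, purely for illustration, that each concurrent supplier bid holds exactly one session and that every auction hits the full 200-bid fan-out at the same moment; real traffic is less synchronized. `full_fanout_auctions` is a hypothetical helper.

```python
def full_fanout_auctions(session_quota: int, bids_per_auction: int = 200) -> int:
    """How many auctions can run at full fan-out within one account session quota.

    Assumes one session per concurrent bid (an illustrative simplification).
    """
    if bids_per_auction <= 0:
        raise ValueError("bids_per_auction must be positive")
    return session_quota // bids_per_auction


print(full_fanout_auctions(1000))  # 5
print(full_fanout_auctions(500))   # 2
```

Under those assumptions a 1,000-session region fits five simultaneous worst-case auctions and a 500-session region fits two, with 100 sessions left over that cannot cover a third.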

These two constraints are why "we chose AgentCore" is really "we chose AgentCore in us-east-1 and us-west-2." The platform decision and the region decision turned out to be the same decision.


Summary

We chose Bedrock AgentCore not because building was impossible, but because every build alternative converted three or four security-critical requirements — tenant isolation, burst concurrency, outbound auth, and evaluation coverage — into bespoke infrastructure we'd operate forever, with no corresponding product advantage. AgentCore absorbs those as platform behaviour, and we keep our durable state and workflow in portable services (DynamoDB, Step Functions) so the agents remain the replaceable layer.

The tradeoffs are real: vendor coupling, platform quotas we don't control, and a maturity curve we had to ride through with temporary CLI shims. We took them deliberately. And the decision only became concrete once two regional constraints — Evaluations availability and session-quota size — collapsed the platform choice and the region choice into one. That's the honest shape of the call: not "AgentCore is best," but "for a small team building governed, multi-tenant negotiation agents, the build alternatives cost more than they returned."

Gustavo Azevedo — AI Solutions Architect, Buyer Team · June 2026

Based on PRD-001 v1.0.36, PRD-004, PRD-005, PRD-007, PRD-009, PRD-016.