#03 · 3.a · Data Platform Cost Comparison · v2.0

Five platforms.
One workload.
A <10% spread.

When you put GCP, Azure Fabric, AWS, Databricks, and Snowflake on the same enterprise workload — 5,000 ETL jobs a day, 3.5 petabytes of data, 10 TB ingested daily — annual costs land between $3.28M and $3.7M. The dollar gap is real, but it's narrower than the strategy gap. Here's what's actually inside the bill.

5 platforms compared $3.28M low end $3.70M high end <10% delta

Curator Teresa Tung · Lead — Center for Advanced Data

01

The story begins with a misconception.

Every data platform RFP we've seen starts the same way: leadership wants to know which platform is cheapest. Procurement builds a pricing matrix. Engineering picks the architecture. The board signs off on a number.

And then, twelve months in, the bill arrives — and almost nobody is over budget by more than a rounding error.

That's not an accident. It's the math. Modern cloud-native and independent data platforms — at enterprise scale — converge on cost. The interesting question isn't "which is cheapest." It's which one matches how your business actually works.

02

First, a fair fight.

To compare platforms honestly, you need an identical workload running on each. We picked one that looks like a real Fortune-500 data estate, not a partner benchmark.

The Sample Medium-Sized Enterprise Workload

5K

ETL jobs / day

20 min

Avg. execution time / job

3.5 PB

Data volume in platform

10 TB

Ingested daily

A second profile — an MVP/pilot at 40 jobs/day, 15 min/job, 100 TB in platform, 25 GB/day ingested — runs alongside as a sanity check.

03

Two ways to buy the same outcome.

Every modern data platform falls into one of two archetypes. Understanding the split is the prerequisite to understanding the bill.

Archetype A

Cloud Native Services

From AWS, GCP, and Azure — cloud-managed offerings where consumption drives the cost.

One bill, one partner. Single-cloud procurement and support contract.
Linear pricing model. Storage + per-query compute, easier to forecast.
Lower baseline for steady-state BI and analytics workloads.

Archetype B

Independent Data Platforms

Databricks and Snowflake — software deployed on top of cloud infrastructure. Consumption drives both software and infrastructure costs.

Dual billing. Platform service units (e.g., DBUs) plus cloud instances + storage + networking.
Pre-integrated workflow. Lakehouse-centric, ML-ready, multi-cloud portable.
Higher complexity to forecast — but more predictable for Spark/ETL-heavy workloads with reserved resources.

04

Now the receipts.

Same workload. Same enterprise scale. Five different ways to deliver it. Here's what each actually costs, broken into the three layers that drive the bill: ETL pipeline compute, warehouse analytics compute, and storage.

Cloud Native Data Platform Services

Layer	GCP Native	Azure Native (Microsoft Fabric)	AWS Native
Pipeline compute (ETL)	Dataflow + BQ Spark + Composer 200 workers n2-std-4 + 3,200 BQ Slots Dataflow 15 hrs / BQ Slots 24 hrs $167K / mo	Data Factory + Spark F2048 (2,048 CUs) 15 hrs / day $165.9K / mo	AWS Glue + EMR 100 DPUs (G.2X) 15 hrs / day $198.9K / mo
Warehouse compute (Analytics)	BigQuery Enterprise Slots 2,000 Slots 15 hrs / day $54K / mo	Synapse DW + Power BI F1024 (1,024 CUs) 15 hrs / day $82.9K / mo	Redshift Serverless 384 RPUs 15 hrs / day $61.5K / mo
Storage	BigQuery Active + Long-term 1,750 TB active + 1,750 TB long-term 50% active / 50% long-term $52.5K / mo	OneLake (ADLS) 1,750 TB hot + 1,750 TB archive 50% hot / 50% archive $43.75K / mo	AWS S3 Tiered 3,500 TB 88% Glacier $19K / mo
Monthly total	$273.5K	$292.5K	$279K
Annual total	$3.28M	$3.51M	$3.30M

Independent Data Platform — deployed on AWS

Layer	Databricks on AWS	Snowflake on AWS
Pipeline compute (ETL)	Databricks Jobs + Spark 25 nodes r5n.4xlarge 48 hrs / node / day $130K / mo	Snowpipe + Snowpark 5× XL Warehouses 15 hrs / day $144K / mo
Warehouse compute (Analytics)	All-Purpose Clusters 75 nodes r5n.4xlarge 33 hrs / day (warm) $157.5K / mo	Virtual Warehouses 4× XL Warehouses 15 hrs / day $115.2K / mo
Storage	AWS S3 Tiered 3,500 TB 88% Glacier $19K / mo	Snowflake Native + AWS S3 700 TB internal + 1,750 TB S3 50% compressed / 50% cold $24K / mo
Monthly total	$307K	$283K
Annual total	$3.70M	$3.40M

05

The plot twist.

Look across both tables. Annual spend ranges from $3.28M (GCP Native) to $3.70M (Databricks on AWS). That's a delta factor of less than 10% on a multi-million-dollar enterprise commitment.

GCP Native

$3.28M

AWS Native

$3.30M

Snowflake on AWS

$3.40M

Azure Fabric

$3.51M

Databricks on AWS

$3.70M

$0 annual cost — same workload, five platforms $3.70M

The MVP / pilot profile tells the same story even tighter: a delta factor of less than 5%.

If the cost spread is <10%, cost is not the deciding factor. Strategy is. Operating model is. Where you want to be in three years is.

06

So how do you actually decide?

A three-step executive decision hierarchy. Use TCO to confirm the choice, not to make it.

1

Strategic Positioning

Anchor the decision in the target operating model, governance posture, and innovation ambition. What kind of data company do we want to be?
2

Platform Archetype

Select the platform best aligned to the workload profile and enterprise consumption model. Cloud Native, Databricks, or Snowflake?
3

Validate Commercials

Use TCO to confirm the choice — not to replace the strategic decision with a narrow price comparison.

The three archetypes, at a glance

Cloud Native

Modularity + Engineering Control

Composable services aligned to existing cloud strategy
Strong fit for engineering-led operating models
More flexibility in architecture design and optimization

Databricks

Advanced Analytics + AI/ML

Lakehouse-centric platform for Data Engineering and Machine Learning
Strong support for streaming and notebook-heavy workflows
Well-suited for innovation-led data product teams

Snowflake

Governed Consumption + BI Scale

Enterprise-friendly model for governed analytics consumption
Strong data sharing and standardized business access
Well-suited for governed BI and EDW modernization

07

The bottom line.

At enterprise scale, cost differences across viable platform options are often narrower than expected. The more durable differentiators are governance model, engineering flexibility, business consumption patterns, and long-term innovation needs.

Platform selection should be driven first by strategic fit and operating model — with commercials used to validate the choice.

Ready to map this against your estate?

This breakdown reflects a Sample Medium-Sized Enterprise workload. Cost comparisons are illustrative for the defined workload profile and may vary based on architecture design, optimization practices, and enterprise commitments. The next step is overlaying your actual data volumes, job profiles, and existing cloud commitments against this framework to identify your archetype.

Download the source deck ↓ Talk to Teresa →

#03 · 3.b · Frontier IQ · Live Dashboard

656 models.
One frontier.
One bill.

Frontier IQ is the real-time intelligence dashboard our practice uses to track generative and agentic AI models — not just the strongest, but the fastest, cheapest, and most practical options for the workload in front of you. Today it tracks 656 models, more than 100 providers, and the GPU SKUs across every major cloud. It's how we sit down with client executives and build agentic platforms that are rigorous and economically defensible.

656 models tracked 100+ providers 4 benchmark categories 1 cost lens

Curator Eugene Siow · Source: Frontier IQ — Live Dashboard

Watch first — 10 minutes · narrated walk-through · Eugene Siow

01

The leaderboard tells you only half the story.

Every week a new model lands and a new headline declares it "the best." Procurement bookmarks the link. Engineering kicks off a benchmark. Someone, somewhere, signs off on a model choice based on a single score on a single chart.

And then the production bill arrives.

Benchmarks tell you what a model can do. They don't tell you what it costs to run. A frontier score on reasoning is a starting line, not a finish line. The interesting question — the one that actually decides whether your agent ships — is which model gives you the right capability at the right unit economics for the way your workload actually runs.

[Image Suggestion: A split-screen visual — left side a polished AI leaderboard with confetti and a "WINNER" badge over a single benchmark score; right side the same model rendered as a real production bill with line items, GPU hours, and a highlighted total. Caption beneath: "Same model. Two very different stories."]

02

First, a fair fight — at scale.

To compare models honestly, you need a single source of truth that updates as the frontier moves. Frontier IQ pulls from public sources, normalizes everything into one schema, and refreshes automatically.

What's inside the dashboard, today

656

Generative & agentic models

100+

Inference & API providers

All

Major-cloud GPU SKUs

4

Use-case benchmark families

Benchmarks are organized by what the model is actually being asked to do: general intelligence, software engineering, agentic workflows, and multimodal workflows. For each, the dashboard surfaces the strongest, the cheapest, and the fastest — so the right answer depends on the question, not the headline.

03

The Frontier Curve.

A model isn't a point. It's a moving line. The Frontier Curve plots benchmark score on the y-axis against time on the x-axis, tracking the progress of both open-weight and closed-weight models as the field evolves.

It's how you tell the difference between a one-off spike and a real shift in the state of the art — and it's how you spot when an open-weight model is closing the gap on a closed one fast enough that procurement strategy needs to change.

[Image Suggestion: A clean, dark-mode line chart with two distinct curves — one in purple for closed-weight models, one in light cyan for open-weight — both rising over a 24-month x-axis with labeled inflection points (model release dates). Show the open-weight curve closing the gap at the right edge.]

04

Today's leaderboard, by capability.

A snapshot of where the frontier sits right now. The headline: there's no single "best model." There are best models for things.

Reasoning

A crowded summit.

The strongest model on the reasoning benchmark today is GPT-5.4 Pro (extra-high reasoning). The strongest open-weight model is GLM-5 by Zhipu AI.

Anthropic's Claude Opus 4.7 sits in the top tier.
Meta's Muse Spark sits in the top tier.
Most frontier-lab models perform within 90% of the leader — the leaderboard is full, not empty.

Software Engineering

A truly jagged frontier.

No single lab leads everywhere. The right answer depends on which slice of "software engineering" you mean.

Bug-fixing benchmarks: Claude models dominate.
General programming: OpenAI's GPT models dominate.
Terminal use: a mixture of Gemini and OpenAI on the frontier.

05

Now connect the score to the receipt.

Performance alone isn't enough. Frontier IQ pairs every benchmark with the cost economics behind it — list price per token, throughput per dollar, and the cheapest credible option in each performance band.

When the cheapest option is also a serious option

The dashboard differentiates closed and open-weight models when filtering for cost. Two examples worth flagging:

Closed-weight, low-cost

Gemini 3 Flash

Delivers a blend of strong performance with low cost — making it a credible default for high-volume agentic workloads where cost is a hard constraint.

Open-weight, low-cost

Kimi K2.5

Can be served very cheaply with good performance — a strong option when self-hosting is on the table or when the workload demands open-weight portability.

[Image Suggestion: A scatter plot with benchmark score on the y-axis and dollars-per-million-tokens on the x-axis. Each model is a dot, color-coded purple for closed-weight and cyan for open-weight. Highlight Gemini 3 Flash and Kimi K2.5 sitting in the desirable upper-left quadrant ("high score, low cost") with a labeled callout for each.]

06

Managed API or self-host? The math has an answer.

Benchmarks tell engineering what a model can do. For FinOps, the next question is harder: at what point does it become cheaper to run this model on our own GPUs than to pay per token? The Frontier IQ cost analysis tool plots exactly that.

You select a model. It charts the economics of a managed API against self-hosting on cloud GPUs and surfaces the break-even point — the monthly token volume at which self-hosting starts saving money. Two illustrative cases:

Case A — Phi-4 (small model, by Microsoft)

Dimension	Managed API	Self-hosted on cloud GPU
Setup	Pay-per-token No capacity planning Pricing scales with usage	Single GPU instance Self-managed serving stack Fixed monthly cost
Verdict	Self-hosting wins at scale. A single GPU delivers enough monthly token capacity that, past the break-even point, Phi-4 is materially cheaper to host than to call. For small models with steady-state production volume, owning the GPU is the right answer.

Case B — DeepSeek v3.2 (large model)

Dimension	Managed API	Self-hosted on cloud GPU
Setup	Pay-per-token No capacity planning Pricing scales with usage	Large AWS instance 8 × H200 GPUs ~$45,000 / month
Verdict	Managed API wins. The break-even point is far higher than the monthly token capacity that a single 8×H200 instance can deliver. For large models like DeepSeek v3.2, self-hosting doesn't make economic sense at typical enterprise volume — you pay for unused capacity.

The size of the model dictates the deployment strategy. Small models reward ownership; large models reward elasticity. Frontier IQ shows the crossover point in dollars, not in vibes.

07

From dashboard to deployed agent.

Frontier IQ isn't only a dashboard. All of its curated intelligence is exposed via API — which means agents themselves can consume it. The dashboard becomes a tool, not a destination.

1

Connect the agent

Give Claude (or any capable agent) the Frontier IQ skill and an API key. The agent now has live access to model benchmarks, provider pricing, and GPU SKU economics.
2

Brief it like an analyst

"Build a budget and project-cost estimate comparing open-weight and closed-weight models for a KYC / Anti-Money-Laundering agent." The agent runs for about two minutes.
3

Get a defensible cost model

What comes back: a model comparison across open and closed-weight options, API cost projections, self-hosted GPU projections, and a budget summary for pilot, growth/scaling, and full production deployment in the enterprise.

[Image Suggestion: A three-frame storyboard. Frame 1: an analyst hands a single-line brief to an agent icon. Frame 2: the agent silhouette spins through dashboard panels (benchmarks, pricing, GPU SKUs) with a small "~2 min" timer. Frame 3: a clean output document titled "KYC/AML Agent — Budget & Cost Model" with three labeled tiers (Pilot / Growth / Production) and crisp dollar figures.]

What makes this work

Curated data

Models · APIs · Infrastructure

Frequently updated public data on every tracked model
API cost per provider, normalized for comparison
Infrastructure cost across public-cloud GPU SKUs

Tokenomics tools

Context-window economics

MCP server tooling: see how each server consumes context
Model agentic workflows with progressive disclosure instead of full disclosure
Significant savings on context-engineering and per-call cost

API-first

Built for agents, not just humans

Every dashboard view is also a tool an agent can call
Securely-keyed access for enterprise integrations
Continuously upgraded as the frontier moves

08

The bottom line.

The goal of Frontier IQ is simple: help us and our clients understand the frontier of AI capability — and the economics and costs behind it.

Capability without cost is a press release. Cost without capability is a procurement spreadsheet. Frontier IQ is the place we put the two together — so the model strategy you walk into the boardroom with survives contact with the bill.

Ready to map the frontier against your workload?

Frontier IQ figures are illustrative of current public benchmark and pricing data; actual model selection and deployment economics will depend on workload profile, traffic patterns, region, and enterprise commitments. The next step is overlaying your specific use case — KYC, claims, code, customer service, anything — against the live frontier and the live cost curves.

Request a walk-through → Talk to Eugene →

#03 · 3.c · Claude Deployment Channels · April 2026

One model.
Five front doors.
Wildly different rooms.

You think you're picking Claude. You're actually picking five products. Same Sonnet. Same Opus. Same token economics — to a rounding error. Everything else — the feature surface, the residency story, the IAM model, the day-one velocity — splits five ways the moment you choose a door. This is the architect's guide to the door.

5 deployment channels 2 operating archetypes 1 identical model 3-step decision

Authors Atish Ray · Lan Guan · Atish Ray (Chief AI Architect) · Lan Guan (Chief AI & Data Officer)

01

The story begins with a misconception.

Every enterprise Claude conversation we've sat in starts the same way. Leadership picks the model. Procurement picks the cloud. Engineering picks the SDK. Hands are shaken. Decks are filed. The deal closes.

Three months later, a developer files a ticket. Why doesn't Fast Mode work? Why is the Skills Marketplace empty? Where did Computer Use go? Why does our Foundry deployment ship our data to the United States?

That's not a tooling gap. That's the channel.

Claude isn't one product. It's the same model surfaced through five different procurement, governance, and feature shells — and the shell is what your CIO, your CTO, your enterprise architect, and your AI platform lead are actually buying.

The interesting question isn't "do we use Claude." It's "which front door makes the rest of our stack feel like one stack — and which features can we live without on day one?"

02

First, name the doors.

Anthropic ships five enterprise channels for building agents with Claude — and three more knowledge-worker surfaces sitting alongside. You can't choose what you can't name.

The five enterprise channels

1A

Claude in AWS Bedrock

1B

Claude Platform on AWS

2

Claude in GCP Vertex

3A

Claude in Azure Foundry

4A

Anthropic Managed Platform

Three knowledge-worker surfaces ride alongside: Claude in Microsoft 365 (3B) as the agent inside Copilot, claude.ai (4B) for the web and mobile chat experience, and Claude Desktop (4C) for power users. Same model. Different consumption shells. Different price tags.

03

Two ways to buy the same model.

Pull the logos off and the five channels collapse into two archetypes. The split is the whole story. Everything downstream — features, governance, residency, billing — falls out of which side you're on.

Archetype A

Hyperscaler-Operated

Bedrock (1A), Vertex (2), Foundry (3A) — Claude served from inside the cloud catalog you already buy from. Your IAM. Your audit trail. Your commit.

Cloud-native everything. Identity, networking (PrivateLink / VNet), observability, FinOps, cost attribution — all native to the hyperscaler you already operate.
Existing commit applies. Burns AWS EDP, MACC, or GCP commit. No new partner to onboard, no new procurement motion.
Feature surface is narrower. Messages API plus the cloud's own agent stack. The native server-side tools, beta features, and Skills Marketplace live on the other archetype.

Archetype B

Anthropic-Operated

Claude Platform on AWS (1B), Anthropic Managed Platform (4A) — Anthropic's native infrastructure, with optional cloud billing as a procurement convenience layer.

The full Anthropic feature set. Messages, Batches, Files, Models, Skills, Agents, Sessions APIs. Server-side tools, MCP connectors, Fast Mode, Skills Marketplace, Computer Use, beta access.
Earliest features, fastest cadence. Whatever Anthropic ships next, ships here first.
Data leaves your cloud boundary. Processed by Anthropic; non-US data routes to US.

04

Now the receipts.

Same Sonnet. Same Opus. Five very different ways to deliver them. Below: the row-by-row breakdown across the dimensions that actually drive the architecture decision.

Archetype A · Hyperscaler-Operated

Dimension	Bedrock (1A)	Vertex (2)	Foundry (3A)
Infrastructure	AWS-managed	Google-managed	Anthropic-managed (3P)
Availability	Native catalog (Bedrock)	Native catalog (Vertex / Gemini Enterprise model garden)	Azure Marketplace subscription, Foundry model catalog as 3P
Data residency	Fully within AWS — global & multiple regions (US, EU, APAC)	Fully within GCP — global & multiple regions (US, EU, APAC)	US only. Processed by Anthropic; data from non-US comes to US
Available features	Messages API only — comparable features delivered via AWS APIs	Messages API only — comparable features delivered via Gemini Enterprise APIs	Messages, Skills, Files, Token-count APIs. Foundry does not provide built-in content filtering for Claude at deployment time
Available models	Claude and other models on Bedrock	Claude and other models on Gemini Enterprise	Claude through marketplace. Not all Foundry regions support Claude for Claude Code deployments
SDK support	Python, TS, Java, C#, Go, Ruby (all Anthropic SDKs); boto3	Python, TS (Anthropic SDKs); gcloud SDK	Python, TS, C#, Java, PHP — Go & Ruby not yet supported
IAM	AWS IAM and Bedrock keys	GCP IAM and keys	Azure Entra IAM and keys
Guardrails	Native	Native	Manual — content safety not auto-applied
Commercials	Integrated token-based pricing, AWS consumption commitment, Provisioned throughput	Integrated token-based pricing, GCP consumption commitment	Azure Marketplace billing, Microsoft Azure Consumption Commitment (MACC) eligible — no Azure credits
Pre-integrated apps	—	—	M365 (Copilot, Copilot Studio, Excel)
Claude Code	Seamless integration with Bedrock	Routed through Vertex AI; no Anthropic account or API key needed	Fully supported — only 2 regions for Claude Code
Claude Cowork	Claude Desktop app (macOS / Windows) running in 3P mode; routes inference to Bedrock with integrated IAM	Not available yet	Not available yet
Where it shines	Longest-running agents · deepest GovCloud / compliance posture · Intelligent Prompt Routing between Claude tiers automatically · most GA features · most enterprise deployments	Google Search grounding built-in · A2A GA (Google is a co-creator) · deepest data warehouse integration · strong on developer features	1,400+ Logic App connectors · M365 / SharePoint / Fabric grounding · GPT and Claude on one platform · partial Claude Platform integration through Marketplace
Where it doesn't	No native web search for Claude (must wire third-party). A2A still beta. No Vertex-style built-in data warehousing.	No long-running agent duration guarantee. MCP tool search disabled by default. Cowork 3P mode not yet available — only AWS has it.	Data doesn't stay in the Azure boundary — biggest architectural limitation. Newest partnership (Feb 2026); most features still beta/preview. No batch API. Two regions only for Claude Code.

Archetype B · Anthropic-Operated

Dimension	Claude Platform on AWS (1B)	Anthropic Managed Platform (4A)
Infrastructure	Anthropic-operated	Anthropic-operated
Front door	AWS account, AWS billing, AWS IAM — no separate Anthropic account	Anthropic accounts + API keys; SSO for Enterprise
Data residency	Anthropic infrastructure outside AWS — global and US regions	Processed by Anthropic; data from non-US comes to US
Available APIs	Full set — Messages, Batches, Files, Models, Skills, Agents, Sessions	Messages, Batches, Files, Skills, Models APIs; MCP connectors; pre-built agent containers
Native features	Server-side tools · Files API · MCP connector · Fast Mode · Skills Marketplace · Computer Use · beta access	Full + beta — pre-built, configurable agent harness running on managed infrastructure
Available models	Claude only	Full Claude lineup including beta
SDK support	Anthropic SDKs	Python, TS, Java, C#, Go, Ruby, PHP
Commercials	Consolidated billing + AWS consumption commitment	Token-based pricing + batch pricing + prompt cache options
Claude Code	Integrated with Claude Platform and claude.ai web — session memory, auto-compaction, Fast Mode, web tools, MCP connectors	Native — Claude Code can integrate
Claude Cowork	Full features — chat, Skills Marketplace, Computer Use	Native — Claude Cowork can integrate

05

The plot twist.

The model's the same. The token price lands inside a rounding error. The feature surface does not. This is where channels actually compete — and where most "Claude vs Claude" conversations should start.

Bedrock (1A) Hyperscaler-operated

Most GA · deepest GovCloud · Intelligent Prompt Routing

Vertex (2) Hyperscaler-operated

Search grounding · A2A GA · BigQuery integration

Foundry (3A) Hyperscaler-operated

1,400+ Logic App connectors · M365 grounding · most features still preview/beta

Claude on AWS (1B) Anthropic-operated

Full feature set · AWS billing & IAM · Cowork 3P mode

Anthropic (4A) Anthropic-operated

Earliest features · Fast Mode · full Skills Marketplace

Messages API only feature surface — same model, five channels Full Anthropic native

The asymmetries that matter aren't on the price page. They're on the spec sheet:

Bedrock · 1A

The compliance king with a search problem.

Most GA features. Deepest GovCloud and IL4–IL5 posture. Intelligent Prompt Routing across Claude tiers — automatic.

No native web search for Claude — wire a third-party.
A2A still in beta.
No Vertex-style built-in data warehousing.

Vertex · 2

Built for builders, missing the long run.

Google Search grounding native. A2A is GA — Google co-created the spec. Deepest data warehouse integration in the field.

No long-running agent duration guarantee.
MCP tool search disabled by default.
Cowork 3P mode not yet available — only AWS has it.

Foundry · 3A

The newest partnership — and the boundary problem.

1,400+ Logic App connectors. M365, SharePoint, and Fabric grounding. GPT and Claude on one platform.

Data doesn't stay in the Azure boundary — the biggest architectural limitation.
Newest partnership (Feb 2026); most features still beta/preview.
No batch API. Two regions only for Claude Code. Content safety not auto-applied.

1B + 4A

Where the native feature set actually lives.

Anthropic's full surface. Whatever ships next, ships here first.

Fast Mode — 6× speed on Opus 4.6.
Full Skills Marketplace, Computer Use, full Cowork.
Cost optimization: Batch −50% + cache reads −90%.
Claude Code with session memory and auto-compaction.

Token price is not the deciding factor. Feature velocity is. Residency is. Governance is. Existing cloud commit is. Strategy is. Operating model is. Where you want to be in three quarters is.

06

So how do you actually decide?

The deck offers a clean three-step hierarchy. Use it in this order. Skip a step, and you're optimizing the wrong axis.

1

Lead with governance posture.

Strict geographic data residency (EU, APAC)? Regulated industries needing cloud-boundary processing? Cloud-native IAM, VNet/PrivateLink, centralized audit? Cloud-native observability, cost attribution, FinOps? Existing cloud commitments (EDP / MACC / GCP commit)? FedRAMP High / DoD IL4–IL5 (Bedrock GovCloud only)? Need uncapped IP indemnification (AWS, GCP)? Yes to any — start hyperscaler-operated (1A, 2, 3A).
2

Then layer in feature ambition.

Need access to new models and the latest features? Multi-cloud flexibility, integrating Claude from a private cloud? Low-to-medium-complexity agentic apps on managed infrastructure? Dedicated engineering support and custom contracts? Skills Marketplace, Computer Use, full Cowork? Low latency: Fast Mode (6× speed on Opus 4.6)? Specialized advisor tooling (mid-generation pairing)? Claude Code session memory and auto-compaction? Cost optimization: Batch −50% + cache reads −90%? Yes to any — pair with Anthropic-operated (1B or 4A) for those workloads.
3

Build hybrid by design — not by accident.

Production workloads run on the hyperscaler path: Bedrock / Vertex / Foundry for Claude API, agent orchestration, and 3P MCP / Skills / Tools / Data — under AWS, GCP, or Azure administration, IAM, and operations. Exploration, specialized engineering, and rapid prototyping run on the Anthropic-hosted surface: full feature set, agent harness, Skills, MCP servers, connectors — under Anthropic admin, SSO, and IAM. Production where governance matters. Rapid prototyping where features matter.

The three patterns, at a glance

Pattern A · AWS-Anchored

Governance + Cloud-Native Estate

Bedrock for regulated workloads, GovCloud, FedRAMP High, IL4–IL5
Cloud-native IAM, PrivateLink, Guardrails, FinOps, observability
Knowledge worker rollout at scale — consumption billing, no per-seat

Pattern B · AWS + Claude-on-AWS

AWS Commit Meets Full Features

Bedrock where governance demands cloud-boundary processing
Claude-on-AWS (1B) where teams need the full Anthropic feature set, with AWS billing/IAM and Cowork 3P mode
One AWS commit covers both — no second procurement motion

Pattern D · Anthropic-Direct

Beta & Specialized Engineering

Earliest models, Fast Mode, full Skills Marketplace, Computer Use
Multi-cloud flexibility · managed agent harness · custom contracts
Sits alongside Pattern A or B — not instead of

07

A footnote on M365 — because someone on the call will ask.

Two products will collide in your M365 conversation, and they share a name. Claude-enabled Microsoft 365 Copilot (with Cowork inside Microsoft) and Anthropic Claude Cowork (the desktop app). Same word. Different products. Different bills.

Dimension	Claude in M365 Copilot (incl. Cowork)	Anthropic Claude Cowork
Where it runs	Cloud — inside Microsoft 365 (subprocessor)	Desktop app (macOS / Windows) on Anthropic infrastructure
Data access	Full M365 graph: Outlook, Teams, SharePoint, Excel via Work IQ	Local files · browser · MCP connectors (Drive, Slack, Salesforce)
Governance	Microsoft DLP, Conditional Access, Purview audit — runs within Microsoft's security, identity, and governance framework	Folder-level sandboxing — less centrally governed
Best for	M365-standardized enterprises with compliance boundaries	Power users, cross-tool flows, non-M365 estate
Price	$30/user/mo M365 Copilot license — Anthropic INCLUDED, not separate	$20/mo Pro · $100–$200/mo Max · $25–$125/seat Team
Availability	Toggle Dec 8 2025 → Subprocessor Jan 7 2026 → end March 2026	GA — macOS January 2026 · Windows February 2026
Update cadence	Microsoft cadence — historically slower	Anthropic-controlled — fast iteration
Geographic exclusion	Excluded: EU/EFTA/UK by default · GCC/DoD/sovereign	US-anchored; EU residency in beta

08

The bottom line.

Claude is one model and five products. The token price will not decide for you. Governance posture, feature velocity, residency, and existing cloud commit will.

Most enterprises end up with Patterns A or B (AWS-anchored) for production governance, supplemented by Pattern D (Anthropic-direct) for exploration and beta features. Channel selection should be driven first by operating model — with token economics used to validate the choice, not make it.

Pick the door for the room you actually want to live in.

Ready to map this against your estate?

This breakdown reflects the deployment options as of April 2026, verified against AWS, MS Learn, and Anthropic documentation. Re-validation runs quarterly — feature parity across hyperscaler channels moves on Anthropic's release cadence, not the cloud providers'. The next step is overlaying your residency requirements, existing cloud commits, M365 footprint, and target operating model against this framework to identify your channel mix.

Talk to Atish → Talk to Lan →

#01 · AI Architecture

The blueprint.
And the receipts.
Four ways in.

"AI Architecture" isn't a slogan — it's a nine-viewpoint, ISO/IEC/IEEE 42010-aligned reference architecture for intelligent agents that works on any cloud and any model. And it isn't theoretical: Costco is shipping it in production right now. Pick a door. Read the framework, read what it looks like when a real enterprise applies it end-to-end, see how it lands on each major platform — or read the security pattern that runs through all three.

4 deep-dives 9 architecture domains 1 client spotlight · Costco ISO 42010 aligned

Pick your door

Framework · Case study · Ecosystem · Security

01

Frameworks are easy to write. Hard to ship.

Every consulting firm has a reference architecture. Most live in PowerPoints that nobody reads twice. This one was different — because someone shipped it.

Behind Door A — The Blueprint — is the v7 Intelligent Agent Reference Architecture itself: nine domains, ISO/IEC/IEEE 42010 viewpoints, the agent-washing problem named, the 13 specification dimensions, eight archetypes, the integration protocols, the multi-agent topologies, and a deep-dive into the OWASP-aligned risk catalog.

Behind Door B — Costco Runs It — is the same framework applied to Costco's enterprise platform. Nexus architecture (core anchoring + satellite autonomy). GCP-first composable design. A 6-month MVP plan. A 5-year roadmap from MVP through strategic differentiation. Four priority use cases — Call Center, Personalized Search, Knowledge Assist, GEO — mapped to the same level-3 platform capabilities.

Behind Door C — Intelligent Digital Brain · Ecosystem — is the same framework translated onto each Major Agentic Platform: AWS, Azure, GCP, OpenAI on AWS, Databricks, and Snowflake. Service by service, layer by layer — and the nine universal gaps every platform leaves behind, with the partner stack that fills them.

Behind Door D — AI Security Architecture — is the security pattern that runs through all three: a four-zone enterprise stack (Channels → Agentic DMZ → Agentic Apps → Agentic Foundation) with the Agentic DMZ as the load-bearing security boundary, mapped to the same nine viewpoints from Door A. Not a layer to bolt on. A zone to architect around.

Read them in any order. The framework explains why; the spotlight shows what; the ecosystem map shows where; the security pattern shows how to keep it from blowing up. Together they cover the full distance from "we should build agents" to "this is what production looks like — on your platform, behind your boundary."

#01 · 1.a · The Reference Architecture · v7 · April 2026

"Logical architecture"
is too vague for AI.
Here's the blueprint that isn't.

The frameworks we inherited — 4+1 from 1995, C4 from the desktop era, the catch-all "logical architecture" of TOGAF and Zachman — were built before LLMs existed. They flatten data, models, cognition, security, and orchestration into a single hand-waving box. This is the alternative: nine domain-specific viewpoints, ISO/IEC/IEEE 42010-compliant, that name every component an intelligent agent system actually has — and let you build it on any cloud, with any model, without a rewrite.

9 architecture domains 234 source slides ISO 42010 aligned v7 · April 2026

Curator and Chief Author Dean Sauer · Accenture Center for Advanced AI · Source: AI Toolkit ✨ — Intelligent Agent Reference Architecture, v7, released 2026-04-22

01

The story begins with a misnomer.

Walk into any enterprise AI program and someone will ask for the "logical architecture." A box marked Agent Framework. A box marked Vector Database. A box marked LLM. Arrows. Everyone nods.

Then the system fails in production. Why? Because the boxes hid everything that mattered.

4+1 was created in 1995, the heart of the client-server era. LLMs did not exist. Apps were desktop and batch-oriented. C4, designed for evolutionary architecture in agile teams, never made data for models a first-class citizen — and has no place for model lifecycle, model monitoring, or observability.

AI systems aren't a logical-architecture problem. They're a multi-viewpoint problem. When an enterprise architect asks "where's your logical architecture?", the right answer is: "Our logical architecture is expressed through multiple viewpoints per ISO 42010 — data, runtime, cognitive, security, integration, infrastructure, model, DevMLOps, and multi-agent orchestration. Each is a first-class architectural viewpoint."

That's not hand-waving. That's the blueprint. Nine viewpoints. One reference architecture. Partner-neutral by construction.

02

The nine viewpoints, named.

Every intelligent agent system decomposes into nine complementary architecture domains. Skip one and you've shipped a prototype. Cover all nine and you've shipped a system. Each maps cleanly to an ISO/IEC/IEEE 42010 viewpoint — meaning your enterprise architect already has a vocabulary for it.

Fig 1. The nine domains and how they relate. Eight feed into and consume from the agent's core; the ninth — Multi-Agent Orchestration — wraps the whole system as the coordination/interaction viewpoint.

Domain 1 · Information Viewpoint

Data Architecture

Spans physical data storage, ingestion pipelines from numerous sources, transformation of data into knowledge, data for model training, and agent state and operations data.

Ingestion pipelines, embeddings, indices for semantic search
Graph data — nodes, edges, attributes
Interaction history, tool cache, FAQ cache, workflow state
Concerns: data flows, schemas, provenance, embeddings, feature lineage

Domain 2 · Information Viewpoint

Runtime Architecture

Reusable, standard implementations of common functions to applications. Structures application flow control and enables observability. Where ReAct lives. Where harnesses are built.

Orchestration — ReAct, RAG, prompt management, evaluation
Common services — prompt engineering, embedding generator, conversation history, FAQ cache, logging
Guardrails — PII/PHI masking, hallucination detector, context relevance, groundedness, answer relevance
Integration — agent discovery, tool discovery, tool creation, tool cache, tool execution, context engineering, prompt compression, user feedback

Domain 3 · Functional/Behavioral Viewpoint

Cognitive Architecture

The information processing mechanisms an intelligent agent uses to achieve its goals. Capabilities mapped to technologies, plus the information flow patterns that yield intelligent behavior.

Cognitive functional capabilities — Sense, Perceive, Learn, Plan, Create, Reason, Communicate, Act, Know
Information flow patterns — ReAct, RAG, reflex, OODA loops
Concerns: perception, planning, reasoning, action selection

Domain 4 · Security Viewpoint

Security Architecture

Identity and access management for users, agents, and agent tools. Plus data privacy and integrity, system availability, and harmful use by both users and agents.

IAM for users, agents, and tools — authentication, authorization, encryption, key management
Data privacy & integrity, system availability
Threats — prompt injection, excessive agency, vector and embedding weakness, supply chain
Concerns: trust boundaries, identity, access, privacy, compliance

Domain 5 · Interface/Connectivity Viewpoint

Integration Architecture

Protocols and standards for discovering and securely integrating agents and tools. The plumbing that lets agents call anything — and lets anything call agents.

MCP (Model Context Protocol) — Anthropic's open standard. Adopted by Claude Desktop, Zed, Replit, Codeium, Sourcegraph
A2A (Agent-to-Agent) — Google's open standard. Backed by 50+ companies including Atlassian, Cohere, Salesforce, PayPal
Commerce protocols — ACP (Stripe + BigCommerce), UCP (Google · 50B+ products), AP2 (Mastercard, PayPal)
Concerns: APIs, message passing, protocol compatibility, tool invocation

Domain 6 · Component/Algorithmic Viewpoint

Model Architecture

Model structure and size needed to power agent-specific cognitive functional capabilities. Not one model — a portfolio.

Native multimodal — Gemini, GPT, Claude, Grok (video at ~258+ tokens/frame, audio at ~32+ tokens/second)
LLMs — GPT, Claude, Llama, Mixtral, Gemini (decoder-only transformers, billions of parameters)
Bi-encoders for retrieval (SBERT, BGE, E5 · 384–1536 dimensions)
Cross-encoders for reranking (ms-marco-MiniLM, BGE-reranker, Cohere Rerank)
Prompt compressors (LLMLingua, AutoCompressor) and SLMs (Phi, Gemma · 1–10B parameters) for edge

Domain 7 · Development/Process Viewpoint

DevMLOps Architecture

Methods, tools, processes, and standards used to develop and operate agents and models. The lifecycle plumbing that keeps a fleet running.

DevOps for agent applications — CI/CD/CT pipelines, testing, monitoring
MLOps for models — training, deployment, evaluation, observability
Model gateway — access control, rate & budget limiting, model routing, observability dashboards, usage and cost guardrails
Evaluation suites — LLM eval, RAG eval, prompt eval; FinOps; observability

Domain 8 · Deployment/Resource Viewpoint

Infrastructure Architecture

Two stacks under one roof: traditional compute/storage/network for agent applications, plus specialized hardware for model training and inference.

Application tech stack — agent orchestration frameworks, vector and graph DBs with ingestion pipelines, data transformed into searchable knowledge
Model tech stack — web-scale training datasets, specialized training/inference software, GPUs and TPUs
Sensors and actuators for agents to interact with their environments

Domain 9 · Coordination/Interaction Viewpoint

Multi-Agent Orchestration

Agent team roles, tasks, delegation authority, inter-agent communication, workflow management, and governance. Where teams of specialists become a system.

Hierarchical Team — single manager coordinates supporting agents
Fully Connected Team — all agents communicate directly with each other
Team of Teams — manager coordinates a collection of teams, each with its own manager
Custom Workflow — partly deterministic, partly reasoned

03

The plot twist: most "agents" aren't.

There is a widespread "agent-washing" trend to label even simple services as "Agents." A form-validation service gets called a "Validator Agent." A logging service gets called a "Logging Agent." This linguistic inflation creates architectural confusion — and ships brittle systems.

An agent is an individual, goal-oriented system that is the source of its own action, with autonomous decision-making across multiple possible actions. When a validation service is called a "Validator Agent," the implication is that the service has autonomy, goals, and decision-making capability that it simply does not possess.

The deck's solution is a five-row taxonomy that names the distinction. Read this twice.

Component Type	Rule-of-thumb to recognize it	Example
Agent	If it decides which action to take from multiple options and then uses the results of the actions to select the next action — it's an Agent	ReAct Agent — Reads user query, uses search, calculator, and other tools in a loop to gather information, perform calculations, and create an answer
Workflow	If it deterministically processes inputs step-by-step (even if it uses cognitive capabilities) — it's a Workflow	Call Analysis Workflow — Convert audio recording speech to text, classify the intent, analyze sentiment, report results
Tool	If it is used by an agent to perform a specific task — it's a Tool	Web Search Tool — Searches for content relevant to a user query on the web
Runtime Architecture Service	If it performs a common service to an application — it's a Runtime Architecture Service	Logging Service — Records application events and errors with metadata to logs
Application Component	If it performs application-specific functionality — it's an Application Component	Tax Calculation Component — Calculates sales tax based upon location. Screens, reports, interfaces, business logic

Fig 2. All five conditions must hold. A thermostat — boundary, loop, multi-action space, setpoint goal, internal decision logic — qualifies. A form-validation service has none of them.

"Agentic AI" literally means "Agentic human-made Intelligent Agents." "AI Agents" literally means "Human-made Intelligent Agents Agents." Both are as redundant as saying "ATM Machine." Simply say AI or Intelligent Agent — and reserve agent for systems that actually have agency.

04

How to specify an intelligent agent — without hand-waving.

The deck names 13 dimensions required to actually specify an agent. Skip any of them and you're building on assumptions. The same 13 work for a thermostat, a conversational agent, a visual monitoring agent, an autonomous vehicle, or a humanoid robot — only the values change.

Dimension	What it answers	Example · Conversational Agent	Example · Monitoring Agent
Agent Archetype	The role the agent will play (developer, analyst, manager…)	Conversational Agent	Visual Monitoring Agent
Goals & Performance Measure	The goal the agent must achieve, and how success is measured	Support users by answering questions and performing tasks	Detect objects, faces, events in images, videos, or live camera feeds
Environment	Where the agent is designed to operate	Virtual on PC/mobile or physical at a kiosk	Anywhere a camera can capture light
Sensors	How the agent gathers information	Camera, microphone, touch screen, keyboard, mouse	Camera
Actuators	How the agent interacts with its environment	Screen, speakers, messaging system	Screen, messaging system
Cognitive Capabilities	The faculties needed to decide next action	Intent Classification, Memory, Speech to/from Text, Language Understanding & Generation	Visual Perception, Language Generation
Powering Technologies	The technologies that power those capabilities	Multi-Modal LLM, ASR Model, Speech Generation Model	CNN, LLM
Information Flow Pattern	How information flows through capabilities to select next action	ReAct — reasons what tools to use, invokes them, observes until complete	Reflex — perceives objects in image and generates report
Action Space	The set of actions the agent can perform	Communicate, Reason, Plan, Invoke Tools	Detect Objects, Send Alerts, Log Detections
Action Decision Engine	The mechanism by which the agent selects next action	LLM	If-Then Rules
Tools	External tools the agent needs	Flight booking, PTO lookup, search	None
Skills	Procedural knowledge for multi-step tasks	Book a flight and hotel	Not necessary for simple reflex agent
Team Membership	The team and collaborating roles	Call center team — agents and humans	Visual inspection team — agents and humans

05

From thermostat to humanoid robot.

Intelligent agents range in complexity from thermostats to humanoid robots. The same 13 specification dimensions describe all of them. The values change. The framework doesn't.

Dimension	Thermostat	Virtual Agent · Pre-LLM	Virtual Agent · Post-LLM	Autonomous Vehicle	Humanoid Robot
Goal	Maintain temperature	Answer simple questions, perform transactions	Answer complex questions, perform transactions	Transport passengers to destination	Perform physical tasks — assemble a product
Sensors	Thermometer	Digital messages, screen, speakers, camera, touch screen, microphone	Digital messages, screen, speakers, camera, touch screen, microphone	Cameras, sonar	Cameras, microphones
Actuators	AC and Heater switches	Digital messages, screen, speakers	Digital messages, screen, speakers	Steering, brake, accelerator	Hands, arms, legs, feet
Action Decision Engine	If-Then rules on temperature	If-Then rules on intent and entities	LLM using instructions, context, history, tool output	ML Models	ML Models
Russell-Norvig Class	Simple Reflex	Simple Reflex	Goal-Oriented, Learning	Utility, Learning	Utility, Learning

Fig 3. The break point between rules-based and LLM-driven sits between the pre-LLM and post-LLM virtual agents. Everything to the right of the break runs on probabilistic models — and inherits all the architectural complexity that follows.

06

Eight agent archetypes you'll actually build.

Similar to the roles humans play in an organization, agent designs fit into common patterns based on their goals, capabilities, and tools. Eight archetypes cover the vast majority of enterprise agents.

Manager

Coordinates Specialist Teams

Plans which agents perform which tasks
Reasons about agent outputs
Example: software product manager coordinating Software Engineer and QA agents

Conversational

Natural-Language User Support

Classifies user intent
RAG-based answering or scripted dialog
"Why was my bill so high?" / "What's the sick leave policy?"

Content Analyst

Document & Stream Analysis

Classification, sentiment, entity extraction, summarization
Call center analyst converting recording → text → intent → summary

Content Creator

Fact-Based or Creative Generation

Search tools + analysis + reasoning
Report generators, ad generators with images and personalized text

Data Analyst

Reasoning Through Queries

Generates SQL, Python, analytics code
Produces calculations and visualizations

Software Engineer

Code Generation & Review

Generates code meeting user-supplied requirements
Documentation, debugging, code execution, quality review

Monitoring

Stream & Inspection

Monitors text, images, videos, audio for specific content
Production-line defect detection, electrical equipment inspection

Computer User · RPA

Visual Perception + Mouse/Keyboard

Computer User: planning, reasoning, mouse and keystrokes — desktop browser shopping & purchases
RPA Robot: rule-based document classification, screen vision, multi-app data entry

07

ReAct, RAG, and the rise of "harness engineering."

The runtime patterns that ship most production agents are ReAct (reasoning + acting in a loop) and RAG (retrieval-augmented generation). Together they form the OODA loop of modern agent systems — and the orchestration code that wraps them has earned its own name.

Pattern · ReAct

The OODA loop in action.

The model is given a prompt that asks it to Reason / Think, describes available tools, and the model responds with the next Action. The orchestrator takes the action by invoking tools and provides their outputs as Observations in the next prompt. Loop until the LLM reasons it has enough information.

Iterate: Reason → Act → Observe → repeat
If no known tool exists, the orchestrator can invoke a search service or invent a tool and store it in the registry
Origin: Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models, Oct 2022

Pattern · RAG

Retrieval-augmented generation, three phases.

Ingest unstructured data into a vector database. Retrieve via metadata + keyword + semantic search + reranking. Generate using prompt templates, history, and top relevant context.

Ingest: extract metadata · break into chunks · create embeddings via bi-encoder
Search: metadata filter · keyword (BM25) · semantic (embedding) · re-rank with cross-encoder
Generate: create prompt with user query + relevant context + history + instructions · LLM completion

Fig 4. ReAct is iterative — the loop only exits when the LLM concludes it has enough information. RAG is linear — each of the three phases enriches the context window before the model speaks.

08

Agents vs Workflows — the architectural decision that's not optional.

Agents are often confused with workflows. They aren't the same. Locus of control tells you which is which: in the agent, or in the orchestration engine.

Agent

Locus of control: in the agent.

Anything that perceives its environment through sensors and acts upon its environment through actuators — with goals, autonomy, and cognitive capabilities to decide which action to take next.

Autonomous decision-making, dynamic planning, goal-oriented reasoning
Iterative loops — observe, orient, decide, act
Adaptability: high — can change approach based on results, backtrack, try alternatives
Choose when: human-like reasoning is valuable, problems require creative problem-solving, multiple tools need dynamic coordination, outcomes > process consistency

Workflow

Locus of control: in the engine.

A structured sequence of predefined steps that transform inputs into outputs through deterministic operations — even if some of those steps use LLMs and ML models.

Deterministic execution, process-oriented, rule-based routing
Linear or branched pipeline execution
Adaptability: low — follows predefined paths
Choose when: process steps are well-defined, compliance is critical, high-volume repeatable operations, predictable performance, auditable execution

09

The integration layer is finally a real layer.

For decades, "AI integration" meant bespoke API wrappers per partner. 2025–2026 changed that. Two protocols emerged as the actual standards — one for tool use, one for agent-to-agent — plus a small zoo of commerce-specific protocols for the autonomous-purchasing era.

Protocol	What it standardizes	Owner / Backers
MCP (Model Context Protocol)	How AI models and agents connect to and interact with tools, APIs, data sources, and external resources. Client-server architecture for tools, resources, prompts.	Anthropic · adopted by Claude Desktop, Zed, Replit, Codeium, Sourcegraph
MCP Apps	First official MCP extension. Servers deliver HTML-based UIs (dashboards, forms, visualizations, workflows) that render in sandboxed iframes. Bidirectional via JSON-RPC over postMessage.	Supported by ChatGPT, Claude Desktop, Visual Studio Code, Goose
WebMCP	JavaScript library + W3C proposal letting websites expose client-side functionality as MCP-compatible tools agents can invoke directly in the browser. No backend required.	Currently in Chrome 146 Canary
A2A (Agent-to-Agent)	Application-level protocol for autonomous agents to discover capabilities (Agent Cards), negotiate modalities, manage long-running tasks, and exchange context.	Google · backed by 50+ companies including Atlassian, Cohere, Salesforce, PayPal
llms.txt	Markdown file at /llms.txt offering LLM-friendly site overview — like robots.txt and sitemap.xml. Companion /llms-full.txt for full flattened docs.	Auto-generated by Mintlify, Fern; supported by MCP servers for IDE integration
ACP (Agentic Commerce Protocol)	Agent-driven product discovery and checkout, with built-in tax, shipping, fraud protection via Shared Payment Tokens (SPTs).	Stripe + BigCommerce
UCP (Universal Commerce Protocol)	Lets AI agents facilitate purchases directly in AI Mode and Gemini app. Integrates with Google Shopping Graph (50B+ products).	Google · Shopify, Walmart, Etsy
AP2 (Agent Payments Protocol)	Payment-transaction layer for AI agents purchasing on behalf of consumers and merchants. Complements UCP.	Google · Mastercard, PayPal
OpenAPI (Swagger)	Industry-standard for describing RESTful APIs in machine-readable JSON/YAML. Widely used for LLM function calling — converts API definitions to tool schemas.	Compatible with OpenAI, Anthropic, others

10

Four ways to put agents on a team.

Multi-agent systems consist of specialized agents — each with their own goals, tasks, cognitive capabilities, and tools. How they communicate is an architectural choice, not a default. Four patterns cover the field.

Pattern A

Hierarchical Team

A single manager agent coordinates several supporting agents
Each team-member agent only communicates with the manager
Members do not talk to each other

Pattern B

Fully-Connected Team

All agents can communicate directly with each other
Each agent decides when to communicate and what to send
Most flexible — also the hardest to govern

Pattern C

Team of Teams

A manager agent coordinates a collection of teams
Each team has its own manager
Hierarchical at scale — the org-chart pattern

Pattern D

Custom Workflow

Each agent communicates with a subset of others
Some of the workflow is deterministic
Parts allow agents to reason and decide next actions

Fig 5. The four multi-agent topologies. Solid edges are deterministic; dashed edges in the custom workflow are points where an agent reasons about what to do next.

11

It's never just "the LLM."

Implementing intelligent agents involves integrating a portfolio of models — each performing specific functions, each with different inputs and outputs. The deck names six frequently-used types. If your architecture diagram has one box marked "LLM," it's wrong.

Model Type	Examples	Architecture	Key Functions
Native Multimodal	Gemini, GPT, Claude, Grok	End-to-end multimodal transformers built from the ground up to natively process video + audio + images + text + code simultaneously without separate fusion layers · video tokenized at ~258+ tokens/frame, audio at ~32+ tokens/second	Video understanding, audio transcription with speaker ID, multi-hour media analysis, cross-modal reasoning, multimodal agent orchestration
LLMs	GPT, Claude, Llama, Mixtral, Gemini	Decoder-only transformers (typically) with billions of parameters trained on massive text corpora using next-token prediction	Text generation, reasoning, tool calling, code generation, planning, memory management, orchestration logic for multi-agent systems
Bi-Encoders (Embedding)	SBERT, BGE, E5, Instructor, Nomic Embed	Dual transformer encoders that independently encode queries and documents into fixed-dimensional embeddings (384–1536 dimensions); similarity via dot product / cosine	Semantic search, document retrieval, RAG systems, fast approximate nearest-neighbor search, clustering
Cross-Encoders (Re-Rankers)	ms-marco-MiniLM, BGE-reranker, Cohere Rerank	Single transformer that jointly processes query-document pairs with [CLS] token for classification — relevance score (float, typically 0–1 or logit)	Reranking retrieved documents, precise relevance scoring, improving RAG precision after bi-encoder retrieval
Prompt Compressors	LLMLingua, LongLLMLingua, AutoCompressor, Selective Context	Token-level pruning models or learned compression transformers that identify and remove less informative tokens while preserving semantic content	Context window management, reducing API costs, handling long documents, improving latency, fitting more context within token limits
Small Language Models (SLMs)	Phi, Gemma	Compact decoder transformers (1–10B parameters) using knowledge distillation and high-quality training data	Edge deployment, fast inference, tool calling in latency-sensitive contexts, local agents, cost-effective repeated operations

12

The chapter that demanded its own page.

Security architecture in AI systems isn't a footnote — it's a category of its own, with risks that don't exist anywhere else in software engineering. Prompt injection. Excessive agency. Vector and embedding weakness. Unbounded consumption. The deck dedicates a full risk catalog mapped to OWASP Top 10. Click in for the full breakdown.

Coming Soon

The Specification Worksheet

The Intelligent Agent Tech Arch Specification worksheet — a record for each component across all nine domains. The deck embeds it as a downloadable. Future drop.

13

From blueprint to working architecture.

The deck names a seven-activity, two-phase process for moving from "we want to build agents" to "we have a future-state architecture and a roadmap." Use it to scope assessments, brief teams, and sequence work.

1

Assess Requirements & Current Capabilities

Understand AI Platform Requirements — identify business processes and tasks agents will automate; identify agent archetypes and their technical requirements. Deliverable: high-level requirements driving agent architecture.
2

Survey Current Architecture Assets

Interview technical resources to understand current-state agent architecture components in place. Deliverable: inventory of current architecture assets.
3

Identify Gaps & Opportunities

Given requirements and current state, identify gaps and opportunities for expansion to meet future requirements. Deliverable: architecture gap assessment.
4

Identify In-Scope Architecture Components

Create an inventory of architecture components needed to realize current and planned requirements. Deliverable: to-be agent architecture component inventory.
5

Recommend Tools, Patterns, Frameworks

For each in-scope architecture component, identify, assess, and select relevant products. Deliverable: to-be agent architecture specification.
6

Create Implementation Roadmap

Develop a roadmap for realizing the architecture — which may include implementing a proof-of-concept application. Deliverable: roadmap for architecture implementation.

14

The bottom line.

Build agent systems on a nine-viewpoint blueprint, not a one-box logical diagram. Reserve the word "agent" for things that actually have agency. Specify every agent across all 13 dimensions — goal, environment, sensors, actuators, capabilities, tools, action space, decision engine, team. Pick your runtime patterns intentionally — ReAct, RAG, harness — and your multi-agent topology — hierarchical, fully connected, team-of-teams, or custom — to match the work.

The point of partner-neutral architecture isn't theoretical purity. It's the option to lift-and-shift across clouds and models without rewriting your system. The framework is what survives the next partner cycle. The partners are what you swap out.

Pick any cloud. Pick any model. Lift-and-shift without a rewrite.

Ready to assess your agent architecture?

This page is a living summary of the v7 Intelligent Agent Reference Architecture, released 2026-04-22 by the Accenture Center for Advanced AI. Content is under active development — some sections are complete, others under construction. Expect gaps. Re-validate against the latest Toolkit GA release on the KX before scoping a new engagement.

Talk to Dean → Source deck · download ↓

#01 · 1.a · Risk Catalog · v7 · April 2026

⚠️ Every way
this can break.
And how to stop it.

Agent systems introduce a class of risks that don't exist anywhere else in software engineering — and most of them are now codified in the OWASP LLM Top 10 (2025 release). This is the catalog: 13 distinct risks across 5 categories, every one mapped to an OWASP entry where one exists, plus the controls and guardrails that mitigate each — slotted into the exact stage of the request → orchestration → LLM → output pipeline where they belong.

13 distinct risks 5 risk categories OWASP LLM Top 10 · 2025 5-stage control plane

Curator Dean Sauer · Source: Intelligent Agent Reference Architecture, v7 · Slides 147–151

01

Five categories. One pipeline. Thirteen ways it goes wrong.

Most security thinking inherited from web applications still applies — authentication, authorization, encryption, key management. But agents add five new risk categories: Confidentiality, Integrity, Availability, Harmfulness, Honesty. Each is sourced by a different actor — the user, the agent itself, the model, the system designer, or an external attacker. Each lands at a different stage of the pipeline. Each needs a different control.

What follows is the deck's full catalog, reproduced with every risk, every OWASP mapping, and every description.

02

⚠️ The risk catalog, part 1 — Confidentiality & Integrity.

Eight risks. Six map directly to the OWASP LLM Top 10 (2025); two are agent-specific extensions where OWASP does not yet have an entry.

Category	Source	Risk	OWASP	Description
1. Confidentiality	User	LLM02:2025 Sensitive Information Disclosure	Yes	LLMs expose sensitive data — PII, proprietary algorithms, confidential details — through their output. Includes credential leakage, business data disclosure, and IP exposure. When embedded in applications, LLMs can unintentionally reveal sensitive information, resulting in unauthorized data access, privacy violations, and legal/compliance issues.
1. Confidentiality	Agent	LLM06:2025 Excessive Agency	Yes	LLM systems have too much authority to call functions or interface with other systems, enabling damaging actions from unexpected or manipulated outputs. Root causes: excessive functionality, permissions, and autonomy granted to the LLM. Impact varies based on which systems the LLM application can interact with.
1. Confidentiality	Agent	Unauthorized Agent Use	Related to LLM06	An agent discovers another agent and delegates a task to it — but the requesting agent is not authorized.
1. Confidentiality	Agent	Unauthorized Data Access by Agent	Related to LLM06	An agent accesses data it is not authorized to access.
1. Confidentiality	Agent	Unauthorized Tool Use by Agent	Related to LLM06	An agent discovers and invokes a tool it is not authorized to use.
1. Confidentiality	System Design	LLM07:2025 System Prompt Leakage	Yes	Disclosure of system prompts or instructions that guide model behavior — which may contain sensitive information not intended to be discovered. The core risk isn't the prompt itself but the underlying sensitive data, guardrail details, or permission structures revealed. System prompts should never contain credentials or be used as security controls.
1. Confidentiality	System Design	LLM08:2025 Vector and Embedding Weaknesses	Yes	Affects systems using RAG with LLMs. Vulnerabilities in vector generation, storage, and retrieval can lead to unauthorized access, data leakage, cross-context information exposure, and embedding-inversion attacks. In multi-tenant environments, weaknesses can result in information leaks between users or contradictory knowledge retrieval.
2. Integrity	Attacker	LLM03:2025 Supply Chain	Yes	Vulnerabilities affecting the integrity of training data, models, and deployment platforms. Risks: third-party package vulnerabilities, compromised pre-trained models, weak model provenance. Newer fine-tuning methods like "LoRA" and on-device LLMs further increase attack surface.
2. Integrity	Attacker	LLM04:2025 Data and Model Poisoning	Yes	Training data is manipulated to introduce vulnerabilities, backdoors, or biases that compromise model security and behavior. Can degrade performance, generate toxic content, enable downstream system exploitation. Poisoning can target pre-training, fine-tuning, or embedding processes — risks especially high when using external data sources.

03

⚠️ The risk catalog, part 2 — Availability, Harmfulness & Honesty.

Five more risks plus seven harm-content sub-cases. Prompt injection is the most dangerous of these — it's the only one that can bypass nearly every other control if not caught at the input stage.

Category	Source	Risk	OWASP	Description
3. Availability	User	LLM10:2025 Unbounded Consumption	Yes	Excessive and uncontrolled inference operations leading to denial of service, financial losses, model theft, or performance degradation. Attack vectors: variable-length input flooding, denial-of-wallet attacks, continuous input overflow, resource-intensive queries. The high computational demands of LLMs make them particularly susceptible to resource exploitation.
3. Availability	User	Unbounded Task Steps	Related to LLM10	Agents and agent teams typically take multiple steps (observe, decide, act) to complete goals. The vulnerability is that the team — and individual agents — will continue acting yet never (or only after an exceedingly large number of steps) complete their goal.
4. Harmfulness	User	LLM01:2025 Prompt Injection	Yes	User prompts alter the LLM's behavior in unintended ways — potentially causing the model to violate guidelines, generate harmful content, enable unauthorized access, or influence critical decisions. Inputs can affect the model even if they are imperceptible to humans, making this particularly dangerous. Both direct and indirect prompt injections can lead to security breaches.
4. Harmfulness	LLM	LLM05:2025 Improper Output Handling	Yes	Insufficient validation, sanitization, and handling of LLM outputs before passing to other systems. Since LLM outputs can be controlled by prompt input, this creates risks similar to giving users indirect access to additional functionality. Successful exploitation can result in XSS, CSRF, privilege escalation, or remote code execution.
4. Harmfulness	LLM	Biased Content Generation	Related to LLM05	Model generates biased content.
4. Harmfulness	LLM	Hate Speech Generation	Related to LLM05	Model generates hate speech.
4. Harmfulness	LLM	Insult Generation	Related to LLM05	Model generates insults.
4. Harmfulness	LLM	Sexual Content Generation	Related to LLM05	Model generates sexual content.
4. Harmfulness	LLM	Violent Content Generation	Related to LLM05	Model generates violent content.
4. Harmfulness	LLM	Misconduct Suggestion	Related to LLM05	Model suggests misconduct.
5. Honesty	LLM	LLM09:2025 Misinformation	Yes	LLMs produce false or misleading information that appears credible, with hallucination being a major cause. Compounded by user overreliance — excessive trust in LLM outputs without verification. Risks: factual inaccuracies, unsupported claims, misrepresentation of expertise, generation of unsafe code.

04

The five-stage control plane.

LLM-powered applications present unique risks that can be mitigated by implementing controls at each stage of processing: request, tool/data access, model consumption, agent action, and model output. The deck slots every guardrail into exactly one of these five stages.

Stage 1

Input Guardrails — at the prompt.

Catch the malicious request before it touches the model. The single highest-leverage stage in the pipeline.

Malicious Use: jailbreak detection, prompt injection detection
Data Privacy: PII / PHI masking

Stage 2

Orchestration — at tool & data access.

The agent reasons about what to do next. Check what it's allowed to do before it does it.

Data Privacy: data access check
System Access: tool access check
Agent Convergence: ReAct loop limits

Stage 3

LLM Invocation — at model consumption.

Where token spend, latency, and budgets get enforced. Where rogue agents become incidents.

LLM Guidance: meta-prompt
Financial: budget limits
System Performance: usage limits

Stage 4

Output Guardrails — at the response.

After the model speaks, before the user (or downstream system) consumes. The last line of defense.

Harmful Content Detection: biased, hate speech, insulting, sexual, violent content; misconduct suggestion
Harmful Code Detection: generated code analysis
Honesty: hallucination detection

Fig 6. Five stages, six OWASP-mapped threats. Each guardrail has its preferred stage but every later stage has the chance to catch what slipped through earlier.

No single control is enough. The point isn't to pick one stage — it's to defend at every stage simultaneously. Prompt injection bypasses Stage 4 if you didn't catch it at Stage 1. Excessive agency can't be undone at the output if it already wired the agent to a system it shouldn't have reached. Defense in depth is not a slogan here. It's the architecture.

05

The risk management process.

AI risk management starts with a comprehensive assessment of AI risks across the enterprise. Controls then need to be implemented to mitigate the risks. Risk management resources continuously monitor risk metrics and address issues. Three activities. One ongoing loop.

1

Assess Risks of AI Applications

Create the AI risks catalog. Define risk KPIs. Assess each application against the catalog. The same catalog reproduced above is the starting point.
2

Plan Risk Mitigation

Define controls for each AI risk. Match every entry in the catalog to one or more guardrails in the five-stage control plane. Document the mapping.
3

Monitor & Address

Continuously monitor risk KPIs. Address issues as they emerge. Re-assess on cadence. This isn't a project. It's an operating discipline.

06

The bottom line.

Agent and model security is its own discipline. 13 distinct risks. 5 categories. 6 OWASP LLM Top 10 mappings. One control plane spanning input, orchestration, model, output, and usage stages. Treat it as a first-class architecture domain — because per ISO 42010, that's exactly what it is.

Ready to map this against your applications?

The catalog above is the input to a real risk register. The next step is overlaying your in-flight and proposed agent applications against it, scoring each on likelihood and impact, and assigning controls from the five-stage plane to mitigate. Additional context: A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures (arXiv:2506.19676).

Talk to Dean → Source deck · download ↓

#01 · 1.b · Costco · Client Spotlight · February 2026

Costco
runs
the blueprint.

Most reference architectures live in PowerPoints. This one runs in production. Costco set out to build an enterprise agentic AI platform on the same nine-viewpoint blueprint described in 1.a — and turned it into a Nexus architecture (core anchoring + satellite autonomy), a GCP-first composable stack, a 6-month MVP, a 5-year roadmap through FY31, and four priority use cases. This is what the framework looks like with receipts.

Nexus architecture 6 months MVP plan 5 years · FY26 → FY31 4 priority use cases

Source Enterprise Agentic AI Platform Architecture Blueprint · Internal Accenture deliverable for Costco · 162 slides · February 2026

01

The blueprint, with receipts.

In 1.a we argued that "logical architecture" is too vague for AI — that nine domain-specific, ISO 42010-aligned viewpoints are the actual answer. 1.b is what happens when an enterprise actually does it.

Costco is one of the world's largest retailers. Their challenge wasn't "should we use AI." It was "how do we build an enterprise platform that lets every team build agents — without each one reinventing data, models, governance, security, and operations."

The deliverable: a target-state Enterprise Agentic AI Platform Architecture Blueprint covering guiding principles, the Nexus architecture, the full capability stack, layer-by-layer technology decisions, MVP scoping, a 5-year roadmap, and architecture mappings for the four priority use cases — Call Center, Personalized Search, Knowledge Assist, and GEO (Generative Engine Optimization).

Every choice you'll see below was made against the same nine-domain blueprint from 1.a. The framework gave them the structure; their context (Fortune-15 scale, GCP-first posture, regulated workloads, knowledge-heavy use cases) drove the specifics.

02

First, the guiding principles.

Before naming a single technology, the team named what they wanted to be. Two layers: enterprise-wide architecture principles inherited from Costco's existing EA practice, and AI-specific principles layered on top.

Enterprise Architecture · 10 principles

The Costco baseline.

The principles every enterprise initiative inherits — including agentic AI.

Business and IT Alignment with measurable value
Customer-Centric Design
Security, Compliance, and Privacy by Design
Simplicity and Scalability
Modular and API-Driven Architecture
Reuse Over Build or Buy
Global Availability and Resilience
Data-Driven Decision Making
Adaptive Governance
Innovation and Continuous Improvement & Automation

AI Architecture · 6 principles

The agentic-AI overlay.

What changes when you put intelligent agents on top.

Lead from the Top
Responsible Development & Deployment
Composable AI Architecture with GCP First
Interoperability
Empower the Workforce
Partner for Acceleration

03

Seven design principles before any technology.

The platform's design principles operate at a higher altitude than tools. Get these right and the technology choices fall out almost mechanically.

Principle 1

Knowledge-first context engineering

Semantic data modeling
Context isolation
High-quality data preparation and normalization
Continuous knowledge governance and lifecycle management

Principle 2

Federated deployment, centralized governance

Domain autonomy, platform consistency
Shared reference architecture with local extensions
Common guardrails enforced through a central policy
Unified agent registry and identity
Automated deployment tooling

Principle 3

Standards-driven, governance by design

Standardized agent lifecycle management and certification
Governance embedded into workflows
Standardized interfaces and protocols
Global safety and risk framework
Unified observability, telemetry, auditability

Principle 4

Composable design for rapid innovation

Service-oriented design approach
Loose coupling via abstractions
Declarative orchestration

Principle 5

Elasticity for high-volume processing

Elastic, on-demand orchestration
Resilient and fault-tolerant execution
Inferencing through request batching
High throughput, low latency

Principle 6

High-performance, safety-first agent ops

Standardized red teaming and AI judge framework
Tunable agent reasoning levels based on task complexity
Defensive UI for agentic experience
Network isolation

Principle 7

Cost efficient by design

Right-sized models and adaptive routing
FinOps by design — cost visibility and guardrails
Operational simplification through platform consolidation
Semantic caching

04

The plot twist: Nexus architecture.

The single biggest architectural decision in the deck isn't which model, which database, or which cloud. It's this: agents will be federated in the organization. Centralized in some places. Distributed in others. The trick is knowing which is which.

The Core

Anchored capabilities. Built once.

The core represents the solutions developed as foundational and differentiated capabilities of the organization. Built and operated centrally — because consistency is the moat.

The knowledge layer — a shared organizational substrate
Utility agents — pre-built, certified, reusable
Centralized governance spanning custom and commodity agents
AI operations — the control plane for the whole estate

The Satellites

Autonomous capabilities. Bought, not built.

Satellites represent the non-differentiated or commodity agentic capabilities developed by ecosystem products — for faster time to market. Agents stay close to the data, process, and experience affinity.

Salesforce, SAP, ServiceNow agents — "agents as a service"
Each satellite owns its own domain
The core enables centralized governance for both differentiated custom and commodity agents

Fig 7. The Nexus topology. Differentiated capabilities live in the core. Commodity agentic capabilities (Salesforce, SAP, ServiceNow) ride as satellites — close to the data they already serve, brokered through MCP, governed centrally.

The protocol stance — A2A primary, MCP as the bridge

Within the platform, Costco will leverage the Sub-Agent or Agent-as-Tool pattern for inter-agent communication. For 3rd-party agents, they leverage MCP and the Agent-as-Tool pattern. A2A and other emerging patterns will be explored based on applicability and maturity.

How A2A works: remote agents publish an Agent Card describing their capabilities, skills, and authentication. Communication is oriented towards Task completion. Tasks transmit Artifacts (results) and Messages (thoughts, instructions). Agents accomplish tasks for end-users without sharing memory, thoughts, or tools.

How MCP fits: 3rd-party agents that aren't A2A-compliant get exposed as tools. A2A-compliant agents connect to those agents through MCP. MCP also connects agents to tools, resources, and prompts via a standard client-server architecture.

The point of Nexus is sovereignty over what's differentiated and speed over what isn't. Build the knowledge layer and governance once, in the core. Buy commodity agents from partners and let them live close to the data they already serve. One central control plane. Many federated executors.

05

The capability stack — five layers, top to bottom.

Costco's enterprise agentic AI platform decomposes into five capability layers. Each is an architectural concern, with its own ownership, technology decisions, and governance posture.

Layer 1

AI Strategy & Tech Business Mgmt

Establishes AI technology strategy and standards
Governs investments and organizational behavior
Ensures alignment with priorities and responsible behaviors

Layer 2

AI Platform

Shared enterprise platform supplying core orchestration, tooling, integration
Technical governance and operational services for AI agents
The substrate every team builds on

Layer 3

Data & Analytics

Delivers and governs high-quality, trusted data to power AI agents
Provides data and analytics capabilities that inform AI strategy
Continuously shapes the enterprise direction

Layer 4

Solution Delivery & Management

Designs, delivers, and manages AI use cases end-to-end
Ensures solutions are built, deployed, and continuously improved
Delivers measurable business value

Layer 5

Infrastructure, Operations & Security

Resilient, secure, and optimized cloud and infrastructure services
Continuously runs AI solutions responsibly across the enterprise
The runway under everything

Fig 8. The five layers. Strategy at the top sets direction; the platform layer is the substrate every team builds on; data feeds it; solutions ride on it; infrastructure runs underneath. The two purple-highlighted layers are the ones the deck treats as bookends.

06

Now the parts list — Level 3 capabilities, by domain.

Drilling down: the platform's seven internal domains and the specific Level 3 capabilities each one ships. An asterisk-marked existing capability means it lives in Costco's estate today and will need enhancements during use-case enablement.

Domain	Level 3 capabilities
Cloud & Infrastructure	Server / Container (Agent Run Time) · Cost Control (Tagging, Budgets, Alerts) · Observability (Telemetry) · Identity & Access Management · Network Management (VPC, Subnets, Routes) · API Management · Standards & Policy Management (NIST Controls) · Vulnerability Management
Data	Enterprise Data Governance (Data Catalog) · Analytical Data Stores · Operational Data Stores (Near & Real-time Data Products) · Object Storage · Data Security Management (Masking, Encryption) · Data Integration Management (ETL, Pub/Sub, CDC)
Knowledge	Knowledge Ingestion (semi-structured, unstructured, structured) · Knowledge Retrieval (retrieval strategies) · Knowledge Synthesis (synthetic data generation) · Metadata Management · Taxonomy Management · Ontology Management · Knowledge Graph (KG creation using taxonomy + ontology) · Vector Stores (embedding persistence)
Model	Model Registry (approved foundation models) · Model Fine-Tuning (domain adaptation) · Model Benchmarking (right-fit per use case) · Model Security (Guardrails, Content Filter)
Agent	Agent Orchestration (no-code, low-code, pro-code) · Agent Explainability (activity & token consumption tracking) · Agent Memory · Prompt Registry · Agent Tools and Protocols (MCP, A2A, connectors)
Agent Governance	Agent Certification (1P + 3P) · Agent Evaluation (LLM as judge, statistical metrics) · Human Feedback (thumbs up/down, structured) · Agent Security (identity, data, tool controls) · Agent Registry (1P + 3P)
AI Operations	Experimentation (build, deploy, monitor) · Serving (ML inferencing) · Agent Deploy · Agent Observability (Model, Application, Agent, Prompts, Ingestion pipelines) · Agent Improvement (feedback integration) · AI Gateway (Model, Agent) · MCP Gateway (Tools)

07

The technology decisions — GCP first, but not GCP only.

"Composable AI Architecture with GCP First" is a guiding principle, not a religion. Where GCP-native fits, use it. Where it doesn't, build or buy. Below: the layer-by-layer decisions reproduced from the deck — exactly as scoped — across GCP services, non-GCP services, and 3rd-party services.

Knowledge Layer · Technology Decisions (1/5)

Capability	What it does	GCP Services	Non-GCP / 3rd Party
Knowledge Ingestion	Scalable processing of semi-structured, unstructured, and structured enterprise data (documents, images, audio, video, relational). Modules for entity/metadata extraction, classification tags, chunking for embeddings, enrichment for downstream retrieval.	Gemini Enterprise · Vector Search · Alloy DB · Cloud Run	None
Knowledge Retrieval	Optimizes the search space by combining retrieval/reranking strategies to identify the most optimal and relevant context to pass to the language model.	Gemini Enterprise · GKE · Redis Cache / Cloud Memorystore	None
Knowledge Synthesis	Services for generating, validating, and integrating synthetic data to support prompt tuning, scenario generation, and evaluation. Provides broad coverage of diverse data types including edge-case and safety scenarios.	None	Python · RAGAS / DeepEvals
Metadata Management	Defines, organizes, and governs metadata across knowledge assets. Covers data access rules, categories, timestamps, lineage, quality attributes. Enables retrieval filtering and context isolation via high-precision descriptors.	Dataplex	None
Taxonomy Management	Structured classification system that organizes knowledge into categories, hierarchies, and relationships. Creates a consistent vocabulary that humans and AI models can interpret reliably.	Dataplex	UI and Backend
Ontology Management	Semantic representation of the business domain capturing entities, attributes, relationships, constraints, and interactions. Provides LLMs and agents with structural understanding to improve grounding and reasoning.	Alloy DB · Firestore (optional)	UI and Backend
Knowledge Graph	Dynamic representation of knowledge that models concepts within a particular domain and the relationships between them. The digital brain of the AI agent.	None	Neo4j · UI and Backend
Vector Store	Specialized databases for storing and searching high-dimensional numerical representations of data, enabling AI systems to find semantically similar items.	Alloy DB	None

Model Layer · Technology Decisions (2/5)

Capability	What it does	GCP Services	Non-GCP / 3rd Party
Model Registry	Set of approved models from different providers, exposed via the AI gateway. Provides scoped access to approved models.	APIGEE	Kong · LiteLLM
Model Security	Mechanisms to enforce safety constraints, prohibited topics, refusal behavior, and output filtering at the model level.	Model Armor	None
Model Benchmarking	Suite for testing and evaluating base models and custom models against well-defined metrics; creates benchmarks for business-related functional areas.	Vertex AI Evaluation Service	Front-end and back-end service
Model Fine-Tuning	Capability to train or adapt foundation models with domain-specific Costco data so the model internalizes the vocabulary, semantics, and constraints of the problem space.	Vertex AI Fine Tuning	None

Agent Layer · Technology Decisions (3/5)

Capability	What it does	GCP Services	Non-GCP / 3rd Party
Agent Orchestration	Highly customizable, low-code and pro-code, scalable framework with chain-of-thought reasoning, dynamic task decomposition and management. Agents collaborate via integrated memory; multi-agent collaboration via a 3-layer orchestrator/super/utility agent topology.	Vertex AI Agent Engine · Google ADK	None
Agent Tools and Protocols	Pre-built services that allow agents to integrate securely to enterprise data and systems (CRM, ERP, ITSM, etc.).	APIGEE	Kong · LiteLLM
Agent Memory	Secure, governed, persistent layer that lets agents store specific episodes of interactions for later retrieval — so they can learn from past interactions. Stores key facts, preferences, actions, and outcomes across semantic, episodic, and entity dimensions. All options will be available through a memory abstraction.	Vertex AI Agent Engine Memory Bank	Langmem · mem0
Agent Explainability	Continuous stream of spans and traces capturing agent interactions, prompts, tool usage, latency, cost, errors, and action outcomes — providing observability into agent execution.	Cloud Trace · Cloud Logging · Cloud Monitoring	None specified
Prompt Registry	Centralized, version-controlled catalog where all prompt templates are managed and stored. Treats prompts as first-class artifacts — reviewed, tested, tagged, versioned. Single source of truth.	Vertex AI Prompt Management	GitHub / CICD

Agent Governance · Technology Decisions (4/5)

Capability	What it does	GCP Services	Non-GCP / 3rd Party
Agent Certification	Process of assessing agents against capability maturity and readiness dimensions. Capability maturity defines the autonomy/agency level; readiness is measured by security, effectiveness, and interoperability aspects.	None	Custom Developed (Python + REACT)
Agent Evaluation	Measurement systems that evaluate how well an agent reasons, retrieves, and acts. Ensures continuous reliability and tracks drift over time.	Vertex AI Evaluation	RAGAS · Trulens
Human Feedback	Structured human-in-the-loop (HITL) mechanism guiding agent behavior toward safe, aligned outcomes. Human input — approvals, feedback, corrections, reinforcement.	None	Custom Developed (Python + REACT)
Agent Registry	System of record for all certified agents — capturing identity, owner, purpose, versions, allowed tools/data, policy constraints. Each agent documented through an A2A-compliant Agent Card.	None	Custom Developed (Python + REACT)
Agent Security	Treats every AI agent like a non-human entity with strong control over what it can access and do. Each agent has a unique, verifiable identity used for authentication, authorization, and full audit logging of actions and tool calls.	Vertex AI Agent Engine Identity	None

AI Operations · Technology Decisions (5/5)

Capability	What it does	GCP Services	Non-GCP / 3rd Party
AI Gateway	Centralized control plane between agent applications, model providers, and MCP servers. Enforces governance and operations at runtime — auth, rate limits, policy checks, logging/tracing, spend/budget controls. Standardizes access; enables semantic caching and usage analytics.	APIGEE	Kong · LiteLLM
MCP Gateway	Control plane / proxy layer managing how agents securely access tools, data, and resources through MCP servers. Acts as the policy-enforcing middle layer — validating requests, brokering capabilities, ensuring every tool invocation follows enterprise rules around safety, observability, authorization.	APIGEE	Kong · LiteLLM
Agent Deploy	Enhances traditional DevOps with checks unique to agentic systems — prompt scanning, MCP tool scanning in the pipeline.	(Assessment in flight)	GHEC · GitHub Actions
Agent Observability	Collects, analyzes, and observes how agents behave in production. Captures end-to-end telemetry across agent runs, model calls, tool interactions — latency, errors, quality signals, cost.	GCS · AlloyDB	Arize · Dynatrace · Grafana
Agent Improvement	Continuous cycle of making agents more accurate, safe, cost-efficient based on real production signals. Uses evaluations and human feedback to facilitate reinforcement learning for continuous improvement.	Vertex AI Fine-tuning	None

08

MVP scoping — three sizes. Pick one.

Costco's deck offers three MVP scoping options, each strictly additive: small is foundational; medium adds utility agents and an agent appraisal framework; large adds prompt analytics and a knowledge graph builder. Increasing in scope as you move to the right.

Option · MVP-Small

No-regret foundational capabilities.

The baseline platform. Eight deliverables. Everything below is required regardless of which path Costco picks.

POC validation for APIGEE, Aura DB, Dataplex, and Dynatrace integration for operational metrics
Cloud & Infrastructure foundation — GCP onboarding, IAM foundation, IaC, containers
Platform foundation services — Alloy DB, Agent Engine, Aura DB
Knowledge Layer — Data-to-Knowledge patterns for RAG-based use cases
Approved language models configured in AI Gateway (APIGEE) for governed access
Agent governance — human feedback collection, operational metrics, Dynatrace integration
Semantic Memory as a Service — for consistency and cost reduction
Knowledge Serving Layer (Hybrid Search and Semantic Search)

Option · MVP-Medium

Utility agents + agent appraisal.

MVP-Small + 5 deliverables. Adds the first wave of platform-supplied agents and a real evaluation framework.

Knowledge layer enhancement — taxonomy management in Dataplex; manual intent-graph build (without Knowledge Graph Studio)
Intent Graph model created for Knowledge Assist
Knowledge Serving Layer enhancement to serve the intent graph
Knowledge Assist (utility agent) and Intent Resolver (utility agent)
Agentic AI Evaluation Framework + Agent Appraisal Dashboard

Option · MVP-Large

Prompt analytics + KG builder.

MVP-Medium + 2 deliverables. The fully scoped platform launch.

Prompt Analytics Dashboard — track and monitor interaction patterns; insight for performance and security improvement
Knowledge Graph Builder Service — manage and maintain domain graphs leveraging ontology and taxonomy

Fig 9. The MVP options are strictly nested — Large contains Medium, Medium contains Small. The 6-month timeline maps each scope to the months it lands in, with the Pharmacy FAQ on Knowledge Assist as the named delivery milestone.

The 6-month MVP roadmap

Costco's deck phases the MVP across six months. Month 1: Cloud Foundation Setup · Conceptual SA, KAD, ADR complete · Architecture. Month 2: Platform Foundation Setup · POC Execution · non-prod platform setup · Model Armor template + approved models configured. Month 3: POCs complete · AI Platform Foundation complete · Data-to-Knowledge as a service (ingestion + retrieval) · Semantic Memory · Human Feedback Service. Month 4: Knowledge-to-Data (without KG builder) · Use case onboarding · Crawl for GEO use case · Operational Metrics integrated with Dynatrace. Month 5: Knowledge Assist + Intent Agent (utility agents) · Evaluation Framework · Observability for operational and agent appraisal. Month 6: Knowledge Graph Builder · Prompt Analytics · Agent Appraisal Dashboard · Platform testing & documentation. By month 6: Knowledge Assist framework ready with Pharmacy FAQ.

09

The 5-year roadmap — FY26 through FY31.

MVP gets you to month 6. The deck looks five years out. Three macro phases: MVP build, platform maturity / operational excellence, and strategic differentiation. Agentic capabilities with repeatable patterns go into the platform — not into individual use cases.

Fig 10. The 5-year program isn't sequential — it's overlapping. Maturity work begins in mid-FY27 while MVP wraps; strategic differentiation begins in FY29 while maturity continues. The lower row names the three flagship strategic outcomes by FY31.

1

FY26 · MVP Build 1.0 + 2.0

Q3 FY26 — Q2 FY27. Cloud Foundation setup for Agentic AI · KAD, POCs, and Testing · Knowledge Ingestion (D2K pipeline) · Knowledge Retrieval (Semantic, Hybrid, Graph RAG) · Vector Stores · Model Registry Setup · Model Security (Model Armor) · MCP Gateway setup · AI Gateway Setup · Agent Deploy · Agent Memory · Prompt Registry · Pre-built Utility Agents · Agent Pattern Catalog · Agent Certification (Process) · Agent Explainability / Observability · Agent Evaluation · AI Gateway / Observability / Graph DB · Certify D2K with Knowledge Assist · Platform Testing (Pen Testing, Vulnerability Testing).
2

FY27–FY29 · Platform Maturity, Operational Excellence

Metadata Management · Knowledge Graph Builder · Knowledge Operations · Taxonomy and Ontology Management · Adaptive Learning Framework · Agent Improvement (RL Models, Cross-Encoder re-rankers) · Model Benchmarking · Agent Certification (Implement) · Agent Registry · Agent Security · Agent Onboarding · AI Gateway Setup Enhancements (A2A integration, integrate with OpenAI, Anthropic) · Fine-tuning workbench · AI for BI · Knowledge Assist · Chargeback model · Platform consumption tracking.
3

FY29–FY31 · Strategic Differentiation

POCs for strategic differentiation: Agent Commerce, Agent Marketplace, Agent Economy · Mind of Costco Ecosystem (Organizational Knowledge Graph, Organization Memory Graph) · Autonomous agents — Controlled Autonomy (continuous environmental sensing + IT/business operation actions) · Publishing Costco-specific agents for external marketplace integration (e.g., GEO, instant checkout from ChatGPT) · Agent Commerce (UCP). Plus ongoing platform operations, maintenance, and enhancements (Vector Store, Agent Engine provisioning).

10

Four priority use cases — same framework, different surfaces.

The platform exists to enable use cases — not the other way around. The deck names four priority workloads, each mapped to the same Level 3 capability matrix. Same scaffolding, four different agents on top.

Use Case 1

Call Center · Contact Center

Conversational + content-analyst archetypes
Mapped to the full MVP capability matrix — Level 3 capabilities across Cloud, Data, Knowledge, Model, Agent, Governance, Operations
Inherits the platform's identity-management, data and tool controls, MCP / A2A connectors

Use Case 2

Personalized Search

Knowledge-heavy retrieval with member-context personalization
Leverages Knowledge Layer (D2K + K2D), retrieval strategies, vector + graph stores
Routes to AI Gateway with model-tier selection by query complexity

Use Case 3

Knowledge Assist

Utility agent — synthesizes and contextualizes trusted enterprise knowledge for users
By month 6 of MVP: framework ready with Pharmacy FAQ
Foundation for the Intent Resolver utility agent and the Knowledge Graph Builder service

Use Case 4

GEO · Generative Engine Optimization

Crawl phase begins month 4 of MVP
FY29–FY31: publishing Costco-specific agents for external marketplace integration — including instant checkout from ChatGPT
Agent Commerce on the Universal Commerce Protocol (UCP) joins the program in the strategic differentiation phase

11

The bottom line.

Costco didn't build a use case. They built a platform — and use cases ride on top. Nexus architecture for sovereignty over what's differentiated and speed over what isn't. GCP-first composable design for partner leverage without lock-in. Five capability layers, seven internal domains, dozens of L3 capabilities. A 6-month MVP that proves the foundation. A 5-year roadmap that extends from foundational onboarding to autonomous agents and Costco-specific marketplace integration.

Most importantly: every architectural choice traces back to one of the seven design principles — knowledge-first context engineering, federated deployment with centralized governance, standards-driven by design, composable for rapid innovation, elastic for high-volume processing, safety-first ops, and cost-efficient by design.

If 1.a tells you why the framework matters, 1.b shows you what it looks like in production.

Want the framework behind this?

The architectural decisions on this page weren't invented for Costco — they were the deliberate application of the v7 Intelligent Agent Reference Architecture from 1.a. Open the blueprint to see the nine-domain, ISO 42010-aligned framework that informed every choice above. Or jump straight to the OWASP-aligned risk catalog deep-dive that's now part of the standard pre-flight check.

Source deck · download ↓

#01 · 1.c · Intelligent Digital Brain · Ecosystem · February 2026

One brain.
Six platforms.
The same nine gaps.

The blueprint in 1.a tells you what to build. The Costco spotlight in 1.b shows you how a Fortune-15 enterprise actually built it. 1.c shows you what it looks like on each Major Agentic Platform — AWS, Azure, GCP, OpenAI on AWS, Databricks, and Snowflake — service by service, layer by layer. And it shows you something more uncomfortable: every platform leaves the same handful of gaps. Knowing where the natives stop is the difference between a brain that ships and a brain that stalls.

6 platforms mapped L2 architecture depth 7 orchestration steps 9 universal gaps

Curator Atish Ray · Chief AI Architect

01

The platform decision is not the architecture decision.

Almost every enterprise agentic AI conversation begins with the same question: "Should we build on AWS, Azure, GCP, Databricks, or Snowflake?" It's the wrong opening question — but the right one to disarm.

The right opening question is: "What does an Intelligent Digital Brain actually look like?" Once you know that, the platform question stops being a religious war and becomes a translation exercise.

Same brain. Different services. Different gaps. The brain has the same 23 capabilities on every platform — agent orchestration, semantic layer, model recipe, governance, observability, and so on. What changes from one platform to the next is which native services map to which capability — and, critically, where the platform's natives run out.

[Image Suggestion: A hex-grid of six logos (AWS · Azure · GCP · OpenAI on AWS · Databricks · Snowflake), each connected by purple threads to a single luminous "Brain" node in the center. Subtle ghosted text below: "Same blueprint, different translations."]

02

First, what the brain actually is.

Before you can map a brain onto a platform, you need to agree on the brain. The L2 reference is a layered architecture organized around seven steps of agentic execution — the loop every enterprise agent runs, regardless of cloud:

The seven-step agentic execution flow (L2)

1

Orchestrate · agents coordinate domain requests

2

Gateway · models, tools, knowledge as control points

3

Reason · ensemble of continuously-learning models

4

Ground · semantic layer + ontology + data products

5

Act · update data products as agents make changes

6

Integrate · enterprise systems with embedded agents

7

Govern · controls + visibility logs at every stage

Underneath, the brain is organized into five enterprise layers — Industry Pattern Libraries · AI Lifecycle Management · Agent Ensemble · Domain Ontologies + Specialized Models · Data Foundation — sitting on a shared Brain Infrastructure of compute, networking, identity, secrets, and resilience.

That's the constant. The platform choice determines the spelling — which native services play which roles — but not the structure.

03

Six platforms, mapped.

For each platform we draw the same L2 picture, then label every capability with the native services that fulfill it. What follows is a quick-reference card per platform — the headline services that do show up natively, and the platform's distinctive flavor.

Platform 1 · AWS

The deepest service catalog.

Strong across orchestration, model recipe, data foundation, and infrastructure. The brain plumbing is mostly already there.

Orchestration: Step Functions, EventBridge, Bedrock Multi-Agent, Agent Core
Models: Bedrock, SageMaker, Bedrock Knowledgebase, OpenSearch
Data: S3, Glue, Lake Formation, Neptune, Redshift, DynamoDB, Aurora, Athena
Govern + observe: Bedrock Guardrails, Clarify, A2I, CloudWatch, X-Ray, IAM, KMS, Organizations + SCP

Platform 2 · Microsoft Azure

Foundry as the backbone.

Microsoft Foundry plus Azure OpenAI Service form the spine; Semantic Kernel and AutoGen carry agent orchestration.

Orchestration: Microsoft Agent Framework SDK, Foundry workflows, Semantic Kernel/AutoGen
Models: Azure OpenAI Service, Azure Machine Learning, Foundry IQ for grounding
Data: Azure Synapse, Data Factory, Cosmos DB (Gremlin), Data Lake Storage Gen2, Purview
Govern + observe: Foundry evaluations, Azure Content Safety, Responsible AI Toolbox, Azure Monitor, Application Insights, Azure Red Teaming Agent

Platform 3 · Google Cloud

Vertex everywhere, ADK for agents.

Vertex AI, Agent Builder, and the Agent Development Kit (ADK) form the agentic substrate; Gemini provides cognition.

Orchestration: Vertex AI Agent Builder, ADK, Vertex AI Pipelines, A2A protocols
Models: Vertex AI, Model Garden, Gemini, Vertex AI Studio
Data: BigQuery, AlloyDB, Spanner, Dataform, Feature Store
Govern + observe: Vertex AI Model Registry, Gen AI Evaluation Service, Cloud Monitoring, Agentic SOC, BigQuery Data Lineage

Platform 4 · OpenAI on AWS

Cognition over governed plumbing.

A hybrid pattern: OpenAI provides the cognition (intent + reasoning + planning); AWS provides the governed Digital Brain (memory, knowledge, tools, observability). Best illustrated by the agentic-commerce customer-journey blueprint in the deck — a 9-step flow from "My internet keeps dropping since I moved" to a credit + a shipped Wi-Fi extender, with the agent never seeing raw payment data.

Cognition: OpenAI models for intent + context + reasoning
Brain Core (AWS): Graph DB + OpenSearch for journey context retrieval
Action: Agentic Commerce Protocol (ACP) for policy-gated, idempotent execution
Memory: Resolution outcomes link back to the journey graph for similarity matching next time

Platform 5 · Databricks

Lakehouse-native, Unity Catalog-governed.

Mosaic AI is the agentic surface; Unity Catalog runs governance end-to-end across data, models, agents, and tools.

Orchestration: Mosaic AI, Workflows, AI Gateway, Serving Endpoints (LangGraph optional)
Models: Mosaic AI, MLflow, Workspace, Vector Search
Data: Delta Lake, Unity Catalog, Databricks SQL, Databricks Share, Lake Base
Govern + observe: Unity Catalog (uniform), MLflow tracking, Databricks Secrets, DAB

Platform 6 · Snowflake

Cortex everywhere, governed by Horizon.

Cortex is the agentic surface; Horizon and Trust Center carry governance; Native Apps + Marketplace distribute agents the same way data is shared.

Orchestration: Cortex Agent Builder, Cortex Agents, Snowflake Tasks/DAGs, Tool Use / Function Calling, A2A via Snowpark REST
Models: Snowflake Arctic, Cortex LLM Inference, Cortex Fine-Tuning, Cortex Search, Vector Store, Document AI, Snowpark ML
Data: Dynamic Tables, Snowflake Semantic Model, Iceberg knowledge graphs, Data Marketplace, Clean Rooms, Snowpipe Streaming, Snowpark DataFrames
Govern + observe: Horizon, Trust Center, Cortex GUARD, Object Tagging, RBAC/Access Policies, Masking Policies, Access History, Query History/Profiling, Snowflake Observability

[Image Suggestion: Six small thumbnail-style "L2 architecture" cards arranged in a 3×2 grid, each one labeled with a platform name and showing a simplified 5-layer brain stack with native service tags. All six cards share the same shape and stack — only the labels differ — making the "same brain, different services" claim instantly visual.]

04

Where every platform is enough.

It is genuinely true that the major platforms cover the brain's plumbing well. If your engineering team is ready to wire it up, you can build the following layers entirely native — on any of the six. This is the consensus zone.

The "yes" column — fully native, all six platforms

Capability	Why it works natively
Brain Infrastructure	Compute, networking, security, identity, multi-tenancy, resilience — the platforms have spent a decade on this. Generally sufficient. Usually no third-party needed.
Data Accessibility	Secure access to enterprise data sources is solved. Lake Formation, Azure Data Lake, BigQuery IAM, Unity Catalog, and Snowflake RBAC are all enterprise-credible.
Model Recipe (fine-tuning)	Domain adaptation works on Bedrock + SageMaker, Azure ML + Foundry, Vertex AI fine-tuning, Mosaic AI, and Cortex Fine-Tuning. Hugging Face / vLLM only enters the picture for hybrid or non-native model mixing.
AI Lifecycle Automation (CI/CD)	CI/CD promotion gates exist: CodePipeline + CodeBuild, Azure DevOps, Cloud Build + Vertex Pipelines, GitHub + Workflows, Native App Releases. The pipelines themselves are fine; the eval-metric gates are where third-party adds value.
Infrastructure Observability	CloudWatch + X-Ray, Azure Monitor + Application Insights, Cloud Monitoring, Mosaic AI monitoring, Snowflake Observability — runtime/infra signals are well-covered.
Baseline Safety + Guardrails	Bedrock Guardrails, Azure Content Safety, Vertex AI safety filters, Cortex GUARD — refusal + profanity + PII safety is table-stakes everywhere now.

05

Where every platform leaves the same gaps.

This is the part of the deck that took the longest to build, and it's the part that pays back fastest. After mapping all six platforms layer-by-layer, the same nine capabilities fall short on every platform — sometimes by design, sometimes because the category is genuinely young, sometimes because the platforms are racing toward it but not there yet.

The nine universal gaps — and what fills them

Capability	Why natives fall short	What fills the gap
Industry Pattern Libraries	Platforms ship general templates. None ship deep vertical "industry cognition" or reusable domain-agent IP.	Accenture Industry Agent Libraries + SAP / ServiceNow / Salesforce ecosystem packs · LangGraph templates
Industry Agents	Vertical-specialized agents (banking KYC, fraud, claims, marketing ops) are not provided out of the box anywhere.	Accenture Industry Agents + Salesforce Agentforce · ServiceNow agents · SAP Joule extensions · Microsoft Foundry partner packs
Domain Ontology Engineering	No major platform provides ontology authoring + lifecycle tooling. The graph storage is there; the engineering is not.	TopBraid · PoolParty · Protégé (OSS)
Knowledge Representation (advanced reasoning)	Neptune / Cosmos DB Gremlin / BigQuery graphs / Iceberg via Snowflake all store graphs — but ontology-driven reasoning patterns and rules engines need more.	Stardog · Neo4j · TerminusDB (OSS)
Semantic Layer (enterprise governance)	Data semantics are covered by Glue / Synapse / BigQuery / Unity Catalog / Snowflake Semantic Model — but enterprise stewardship workflows + semantic contracts at scale are not.	Collibra · Alation · Atlan · OpenMetadata (OSS)
Agent Decision Lineage	Model registries exist (SageMaker, Azure AI, Vertex, MLflow, Cortex). The "why" trace across multi-agent decisions — evidence packs across chained reasoning — does not.	MLflow + OpenLineage + Collibra/Alation · Arize/Fiddler for QA gates
Agent Quality Observability	Hallucination detection · semantic correctness · tool misuse · agent behavior drift — all newer than the infra-observability tooling, and inconsistent across platforms.	Arize Phoenix (OSS) · WhyLabs · LangSmith · OpenTelemetry agent spans
Multi-Agent Explainability	Baseline explainability exists; chain-of-reasoning explainability across multiple agents working together does not.	Fiddler · TruEra · Arize · Evidently (OSS)
Agent Certification & Readiness	CI/CD + custom evals get you partway. No platform ships a productized "is this agent ready for production" certification framework.	Arize · WhyLabs · W&B + Great Expectations + custom certification scorecards

[Image Suggestion: A "platform coverage heatmap" — six columns (one per platform) and 23 rows (one per L2 capability). Cells are color-coded green (fully native), amber (partial), or grey (gap). The nine universal-gap rows show consistent grey/amber bands across all six columns — visualizing the thesis instantly.]

None of the gaps are fatal. All of the gaps are predictable. A team that walks in already knowing the nine has a 6-month head start on a team that learns them by hitting them.

06

What a vertical brain looks like.

The reference becomes concrete the moment you industry-fy it. The deck includes a worked example: The Banking Digital Brain on AWS — A Runtime Architecture Flow Blueprint. Same seven-step loop, banking-specific organs.

Banking Experience Layer

Where bankers actually work.

Banker Copilots
Investigator Workbench
Contact Center Assist
Digital Channels
Back-office Automation

Banking Data Foundation

Customer 360 + Risk 360, governed.

Redshift · Kinesis · S3 · Glue · Lake Formation
Customer 360 + Risk 360 as the headline data products
Banking source systems: core banking, CRM, KYC, fraud, contact center, market data, document repos

Banking Agent Ensemble

Five named agents.

Fraud Analyst · Loan Officer · Service Bot · Customer Analyst · Loan Analyst
Plus an Industry Agent harness for partner-supplied vertical agents
Cycle: Sense → Interpret → Evaluate → Learn → Govern → Reflect → Deploy

07

So how do you actually pick?

Cost spreads at enterprise scale are narrower than the headlines suggest (see 3.a). Capability gaps are uniform across platforms (see Section 05). So what does drive the choice?

1

Existing data gravity

If your data already lives somewhere, the brain probably should too. A Redshift + S3 estate wants AWS. A Synapse + Fabric estate wants Azure. A Lakehouse-of-record on Databricks or a warehouse-of-record on Snowflake are equally compelling reasons to stay put.
2

Operating model fit

If your team already lives in Vertex / ADK or in Foundry / Semantic Kernel, you'll ship faster on the platform whose mental model you've already internalized. The "best" platform is the one your engineers already trust.
3

Cognition strategy

If the cognition you need is OpenAI-shaped, the OpenAI-on-AWS pattern (Section 03 · Platform 4) is a real architecture, not a fallback. Hybrid is a first-class choice.
4

Plan for the gaps anyway

Whichever platform wins, the same nine gaps are coming. Budget for them — ontology engineering, agent decision lineage, agent certification, vertical agent IP — and build the partner stack into the architecture diagram from day one.

The three flavors of the ecosystem, at a glance

Hyperscalers

AWS · Azure · GCP

Deepest service breadth; full vertical stack from compute to cognition
OpenAI-on-AWS belongs here as a hybrid pattern
Best fit when the brain spans many capabilities and your data already lives there

Lakehouse-First

Databricks

Mosaic AI as the agentic surface · Unity Catalog governance end-to-end
Strong fit for ML-heavy, streaming-heavy, lakehouse-of-record estates
Multi-cloud portable

Warehouse-First

Snowflake

Cortex agents · Horizon governance · Marketplace + Native Apps for distribution
Strong fit for governed-BI consumption, data sharing, and clean-room patterns
Industry agents arrive via Marketplace the same way data does

08

The bottom line.

The platform decision and the architecture decision are not the same decision. The architecture is constant. The platform is a translation of that constant into a specific set of services and a specific set of gaps.

Bring the blueprint. Map it onto your platform. Plan for the same nine gaps that every platform has. Then you can have the cost conversation — because you'll know what you're actually pricing.

Ready to map the brain to your platform?

The full executive deck — every L2 architecture diagram, the per-platform native services, the gap tables, the banking and agentic-commerce examples — is the source of record. Open it for the diagrams, then talk to Atish for the engagement view.

Download the source deck ↓ Talk to Atish →

#01 · 1.d · AI Security Architecture · v1 · May 2026

Security isn't a layer.
It's a zone.
Architect around it.

Most enterprise security thinking still applies to AI — identity, encryption, network segmentation, audit, key management. What changes is the threat surface. Models are non-deterministic. Prompts are executable. Tools have side effects. The data that trains the system can be the attack vector against it. This is the architectural answer: four zones, twelve layers, thirty-nine capabilities, with the Agentic DMZ as a load-bearing security boundary every model interaction must cross — by design, not by exception.

4 zones 12 layers 39 capabilities OWASP · NIST · ATLAS aligned

Curator and Author Matt Lancaster · Reinvention Partner — Digital Core, AI & Data Lead · Source: Agentic Stack — Capabilities & Descriptions, extracted 2026-04-15

01

The story begins with a category error.

Walk into an enterprise AI program and someone will ask where the security layer goes. A box marked Guardrails. A box marked Content Filter. A box marked PII Redaction. Arrows. Everyone nods.

Then the system fails in production. Why? Because the boxes hid everything that mattered.

Web applications taught us that security is a cross-cutting concern — auth in front, encryption in transit, RBAC at the data tier. AI inherits all of that. And then breaks the model. A model is not a database. A prompt is not a query. A tool call is not a stored procedure. The attack surface isn't a port to close — it's a behavior to constrain.

AI security isn't a layer. It's a zone. When a CISO asks "where's the AI security layer?", the right answer is: "There isn't one. There's a controlled boundary — the Agentic DMZ — that every model interaction crosses. And there are security capabilities in every other zone that make the boundary mean something."

That's not hand-waving. That's the pattern. Four zones. One boundary. Twelve layers of control.

02

The four zones, named.

Every enterprise-grade agentic system decomposes into four zones with distinct security, governance, and execution characteristics. Skip one and you've shipped a demo. Cover all four and you've shipped a system. Each zone is a trust boundary — meaning every transition between them is a place where security controls earn their keep.

Fig 1. The four-zone agentic stack. Zone 2 is the load-bearing security boundary — every external interaction crosses it before reaching agent execution; every model invocation crosses back through it before reaching the user. The other three zones contribute security capabilities that make the boundary enforceable.

Each zone has a distinct security mandate. None of them works without the others.

Zone 1 · Channels — Authenticate the actor. Authorize the action. Capture consent. If you cannot identify who is on the other end of the wire, no downstream control matters.
Zone 2 · Agentic DMZ — Normalize the input. Filter sensitive data. Enforce content policy. Defend against prompt injection. This is where the "AI" part of AI security earns its name.
Zone 3 · Agentic Apps — Isolate execution. Mediate tool access. Bound agent autonomy. The model can suggest anything; the runtime decides what actually executes.
Zone 4 · Agentic Foundation — Encrypt at rest. Govern the model registry. Monitor drift. Audit every token. The platform-level controls that make incident response possible.

03

Zone 2 is the idea everything else rests on.

A DMZ — demilitarized zone — is a forty-year-old network pattern: a controlled space between a trusted interior and an untrusted exterior, where every transition is mediated by explicit security controls. The Agentic DMZ applies the same pattern to AI — a controlled boundary between users and agent execution, where every prompt is normalized, every input is filtered, and every model boundary is enforced before reasoning begins.

Fig 2. A DMZ is a forty-year-old pattern. The Agentic DMZ is the same pattern at a new substrate — controlled boundary, mediated transitions, explicit controls — with prompt injection, PII, and tool-access taking the place of port-level firewalls and IDS rules. Same shape. New attack surface.

Three layers do the work:

Signal Processing & Normalization — Speech-to-text with diarization and language detection. Text-to-speech with consistent voice identity. Multimodal normalization that strips raw input down to a structured, tagged format. The model never sees raw audio, raw HTML, or raw user upload. It sees a normalized representation the rest of the boundary controls can reason about.
Session & Flow Control — Turn management. Conversation state. Flow governance. Rate limiting. Loop prevention. This is the layer that catches the abuse pattern before the prompt-injection layer does. An agent that detects a barge-in storm or a backchannel flood doesn't need a content filter — it needs a circuit breaker.
Input & Prompt Security — PII detection, masking, and tokenization on the way in. Toxicity detection, domain restrictions, and compliance guardrails on inputs and outputs. Adversarial detection, tool-access controls, and model-boundary enforcement against prompt-injection attempts. This is the layer most people mean when they say "AI security." It is not the only one.

The Agentic DMZ is the answer to a single architectural question: where do the AI-specific controls live? Not scattered through every microservice. Not bolted onto the model wrapper. Not duplicated by every team that ships an agent. In one named zone, with one named owner, that every interaction must cross.

04

The threat surface, decomposed.

Three industry standards have converged on a shared map of the AI threat surface. None of them replaces the others. Together they tell you what to look for, where to look for it, and how to talk about it with people who don't build AI.

OWASP LLM Top 10 (2025) — Application-level risks. Prompt injection, sensitive information disclosure, supply-chain compromise, insecure output handling, excessive agency, training-data poisoning, model denial-of-service, insecure plugin design, overreliance, model theft. This is the developer's catalog. If you build an agent, you should be able to name all ten.
MITRE ATLAS — Adversarial Threat Landscape for Artificial-Intelligence Systems. The same idea as MITRE ATT&CK, applied to ML. This is the red team's catalog. Tactics and techniques an attacker uses against models in the wild — reconnaissance, initial access, ML model access, evasion, exfiltration, impact.
NIST AI Risk Management Framework — The governance frame. Map → Measure → Manage → Govern. This is the board's catalog. What an enterprise has to be able to say about its AI systems before regulators, auditors, or a customer's risk team will let them through procurement.

The architecture's job is not to repeat any of these. The architecture's job is to make sure every entry in every catalog has a place in the stack where the control belongs — and a person whose name is on enforcing it.

Door A's risk catalog already maps thirteen agent-specific risks across five categories — Confidentiality, Integrity, Availability, Harmfulness, Honesty — onto a five-stage control pipeline aligned to the OWASP LLM Top 10. This page does not reproduce that catalog. It places the catalog into the four-zone architecture so the controls have somewhere to live.

05

Three control disciplines. Every zone uses all three.

Inside every zone, security controls fall into one of three disciplines. Most teams ship the first one and forget the other two. That is the most common reason a working AI system becomes an unworkable AI security incident.

Prevent — Stop the bad outcome from happening. Authentication. Authorization. PII redaction. Prompt-injection defense. Tool-access policy. Container isolation. Network segmentation. Encryption. Most of the work, none of the visibility.
Detect — Notice when prevention fails. Anomaly detection on prompts. Drift monitoring on models. Distributed tracing on agent runs. Token analytics. Conversation replay. Audit logging. The instrumentation that turns "something feels off" into a ticket.
Respond — Contain the blast radius. Kill-switches at the model gateway. Rollback at the agent registry. Quarantine at the tool gateway. Incident response playbooks that name the on-call. Post-incident review that closes the gap that opened the door. The discipline that turns one bad day into a learning, not a press release.

Fig 3. The control matrix. Twelve cells, four zones, three disciplines. Zone 2's row carries the heaviest load — it is the AI-specific zone — but no row is allowed to be empty. A zone without all three disciplines is a zone with a hole in it.

A zone without all three disciplines is a zone with a hole in it. Prevent without Detect is a guess. Detect without Respond is a complaint. Respond without Prevent is theatre. The four-zone pattern works because every zone is built to do all three.

06

Security shows up in seven of the nine viewpoints.

Door A — The Blueprint — names nine architectural viewpoints for any intelligent agent system. AI security is not a tenth viewpoint. It is a property that shows up in seven of the original nine, and the architect's job is to know where.

Viewpoint (from 1.a)	Where security lives	Anchor zone
Data	Classification, lineage, retention, residency. Encryption at rest and in transit. Access policy on every data store the agent reads or writes.	Zone 4
Runtime	Container isolation. Sandboxing. Memory hygiene between sessions. Side-effect containment for tool calls.	Zone 3
Cognitive	Prompt-injection defense. Output validation. Adversarial-input detection. Boundary enforcement on what the model can be asked to do.	Zone 2
Security	The architect's stewardship of every other row. Threat model. Control catalog. Control owner. Audit cadence.	All zones
Integration	Tool-invocation gateway. Permission scope on each connector. Response validation. Per-call authorization, not session-level grants.	Zone 3
Infrastructure	Network segmentation. Identity infrastructure. Key management. Hardware-backed enclaves where the workload requires them.	Zone 4
Model	Model registry with provenance. Prompt versioning. Drift monitoring. Model-supply-chain controls — including what was used to train it and what was used to fine-tune it.	Zone 4
DevMLOps	Secure CI/CD for prompts and models. Pre-deployment evaluation gates. Environment promotion controls. Rollback paths.	Zone 4
Multi-agent	Agent-to-agent authentication. Delegation boundaries. Conflict resolution that does not silently expand authority.	Zone 3

Security shows up in every row. What changes is which zone holds the primary control and which discipline — Prevent, Detect, Respond — owns the response. The viewpoint says what to think about. The zone says where to put it. The discipline says how to enforce it.

07

Prompt injection, walked all the way through.

Pick one risk and trace it across all four zones. Prompt injection is the right one — it is the AI-specific risk most people have heard of, the one most often miscategorized as "just a content-filter problem," and the one whose mitigation pattern reveals every part of the architecture at once.

Zone 1 — Channels. Authenticate the user. Bind the session to a verified identity. If the request comes from an authenticated, authorized actor, you have a name attached to the bad input. If it doesn't, the rest of the controls have less to work with.
Zone 2 — Agentic DMZ. Normalize the input — strip exotic Unicode, decode embedded payloads, separate user content from system instructions. Detect adversarial patterns. Filter known injection signatures. Tag retrieved content (RAG context, tool output) as untrusted so the model treats it as data, not as instruction. This is the layer that catches most attempts.
Zone 3 — Agentic Apps. Enforce least privilege at the tool-invocation gateway. The model can request a high-impact action; the gateway decides whether the current session is authorized to perform it. Bound agent autonomy with policy: a model that wants to call a destructive API should never be the only voice in the decision.
Zone 4 — Agentic Foundation. Log the prompt, the retrieved context, the model output, and the tool call as one correlated trace. Monitor drift in detection efficacy over time — adversaries adapt. Replay conversations on demand. If detection failed in Zone 2 and authorization caught it in Zone 3, the audit trail in Zone 4 is what tells you why.

Fig 4. The same prompt injection traced across the stack. Zone 1 names the actor. Zone 2 normalizes and filters and catches most attempts. Zone 3's tool gateway denies the privileged action even if Zone 2 missed. Zone 4's correlated trace tells the post-incident review what to fix. No single zone defeats it. Four zones in sequence do.

No single zone defeats prompt injection. Four layers of partial defense, applied in sequence, do. A control that works ninety percent of the time, layered four times, gets you to four nines. That is the architectural insight. The rest is engineering discipline.

08

A footnote on Door B — because architecture is the control.

Door B — Costco Runs It — is built on a Nexus architecture: differentiated capabilities anchored in a sovereign core, commodity capabilities federated to satellites that ride close to the data they already serve. That pattern is not a security pattern. It happens to be a security pattern.

Look at what Nexus does, in security terms:

The core is a trust boundary. Knowledge layer, governance, model registry, and central control plane live in the core. Differentiated decisions cannot be made outside it. One control plane. One audit trail. One on-call.
The satellites are blast radius limits. Salesforce, SAP, ServiceNow run their commodity agents close to their own data, brokered through MCP. A compromise at a satellite cannot cascade into the core unless the core's policy layer permits it.
MCP is the controlled boundary. Every cross-zone call is mediated. Tool-access policy travels with the request. The protocol itself is the place security is enforced — not a separate "gateway tier" that has to remember to be there.

The four-zone pattern is what Costco is shipping. The Nexus core is Zones 3 and 4. The satellites are bounded extensions of Zone 3, mediated through Zone 2 boundary controls expressed as MCP policy. "Run it anywhere" and "secure it everywhere" are the same sentence.

09

The bottom line.

AI security is not a layer to add. It is a zone to architect around. The Agentic DMZ is the load-bearing concept; the four-zone stack is what makes it enforceable; the three control disciplines are how each zone stays honest; and the nine viewpoints from Door A are where the work actually gets done.

Three things to walk away with:

Name the boundary. If your team cannot point at the one zone every model interaction must cross, you do not have a boundary. You have hope. Hope is not a control.
Name the controls per zone. Identity in Zone 1. Prompt security in Zone 2. Tool-access mediation in Zone 3. Governance and audit in Zone 4. Every zone needs Prevent, Detect, and Respond. No zone gets a pass.
Name the standards behind it. OWASP LLM Top 10 for the developer's catalog. MITRE ATLAS for the red-team's catalog. NIST AI RMF for the board's catalog. One architecture, three audiences, the same pattern underneath.

This is the security pattern that runs through Doors A, B, and C. The framework explains the viewpoints; the spotlight shows the Nexus pattern; the ecosystem shows where each platform's gaps live. This page shows the boundary they all enforce.

Ready to secure your agent architecture?

This page is a v1 articulation of the AI security architecture pattern that threads through the v7 Intelligent Agent Reference Architecture, the Costco Nexus blueprint, and the six-platform Intelligent Digital Brain ecosystem map. The four-zone model and capability inventory are reproduced from the Agentic Stack — Capabilities & Descriptions source materials extracted on 2026-04-15. Content is under active development. Re-validate against the latest source release before scoping a new engagement.

Talk to Matt → Source deck · download ↓

#04 · Human in the Lead

Agentic AI works
when humans
stay in the lead.

Tools change every quarter. Foundations don't. Human in the Lead is where we keep the curriculum that turns engineers, analysts, and leaders into people who can actually command agentic AI — paired with the foundational concept primers that explain what's happening underneath, and the partner field reports that tell us what's actually shipping. Three ways to keep your team in the lead. Pick your door.

3 sub-chapters 1 Citizens · Human-in-the-Lead Training 1 foundational primer 1 partner field report

Pick your door

Training program · Foundational concept · Partner field report

01

Three ways to keep humans in the lead.

Some teams get there through a structured, multi-day bootcamp. Others get there through one perfect weekend with a primer that finally makes the math click. And some get there by reading the field report from someone who just spent the week in San Francisco at the partner's biggest event of the year. Human in the Lead holds all three.

Behind Door A — Citizens Spotlight — is Human-in-the-Lead Training, the multi-day agentic AI program we ran for Citizens. Four modules · 417 slides, all built on the premise that humans stay in command of the agents. All four are live now — Day 0 (the May 2025 foundations preview) plus the three live days of the September 2025 Citizens AI Academy Track C: Banking Reinvention, Tool Use & Reasoning, Memory & Planning.

Behind Door B — Words as Numbers — is something more foundational: the vector embeddings primer our Center for Advanced AI built to teach the building block underneath every modern generative AI system. 26 slides. Worked examples. The math, demystified. If you've ever sat in a room where someone said "just embed it" and you weren't sure what that meant — this is the door.

Behind Door C — Agentic Enterprise — is the Google Cloud Next '26 recap: every announcement that matters from Google's biggest event of the year, organized by the six-layer stack Google itself laid out — Agentic Taskforce, Agent Platform, Agentic Defense, Agentic Data Cloud, Research & Frontier Models, AI Hypercomputer. The deck Google's own alliance team handed us. The thesis, the receipts, and the customer stories — translated into something you can actually use on a Monday.

Read in any order. The primer explains the foundation; the bootcamp shows how to build on it; the field report tells you what one of the world's three biggest AI partners is actually shipping. Together they cover the full distance from "what's a vector?" to "here's what Google announced last week, and why it matters for your roadmap."

#04 · 4.b · Vector Embeddings · v5

How machines
turn words
into numbers.

Every modern AI system — from search to chatbots to recommenders — runs on the same foundational trick. Take a word, an image, a sound clip, a heartbeat, anything that isn't a number. Turn it into a list of numbers. Then let the math find what's similar, what's different, and what belongs together. This is that trick, demystified.

26 slides · the foundational primer 2 authors · CAAI ∞ dimensions · in theory

Authors Lan Guan · Mo Nomeli · Center for Advanced AI · Lan Guan (Chief AI & Data Officer) · Mo Nomeli (CAAI Global Lead AI Learning & Emerging Tech)

01

Computers think in numbers. Humans don't.

Imagine a database of 50,000 companies. Tabular data is easy. Names, CEOs, headquarters, employee counts, industries. Find all companies with more than 1,000 employees. Sort CEOs alphabetically. Calculate the average company size. One SQL query. Done.

Now imagine a press release attached to one of those rows: "Acme Inc. revealed a significant strategic shift under its newly appointed CEO, Jane Smith. Smith outlined a comprehensive plan focusing on sustainable growth initiatives..."

Now ask: Which other CEOs are pursuing sustainability? Is this strategy shift common in the industry? How might this affect Acme Inc.'s stock price?

Fig 1. Structured tabular data is rich with operations — filter, sort, calculate. Unstructured text is rich with insights — but those insights are locked behind a wall of language nuance, context, and meaning. Embeddings break that wall down.

Unstructured data is where the real signal lives. The challenge is that traditional tools can't process it — they need additional steps to unlock the value. Vector embeddings are those additional steps.

02

Measure. Compare. Discover.

Once data is in vector form, the math takes over — and it's a particular kind of math. Three operations matter.

Measure the distance between individual data points
Determine the similarity between different data points
Transform data in ways that are useful for analysis

The way "similarity" gets measured is the part most people skip. The standard answer is cosine similarity — the angle between two vectors. Three angles tell the whole story.

Fig 2. The three states of cosine similarity. Near 0° means the vectors point in the same direction — they're similar. Near 90° means they're perpendicular — unrelated. Near 180° means they point opposite ways — they oppose each other.

03

What is a vector embedding, exactly?

Strip away the jargon and the answer is genuinely simple. A vector is a fixed-length array of numbers that represents a point in a mathematical space.

Each number in the array corresponds to a direction (or dimension) within that space, and its value determines the vector's magnitude in that direction. Vectors in machine learning can have thousands of dimensions — those are difficult to visualize. But simpler vectors with two or three dimensions can be easily graphed and understood.

A vector embedding — or simply, an "embedding" — is a way to turn things that aren't numbers (like words or pictures) into a list of numbers. This list captures the important qualities and relationships within the original data. Embeddings capture semantic similarity, tone, and hierarchical relationships: "MIT" will be close to "University." "Happy" will be farther than "Sad." "Car" will be close to "Vehicle."

Here's the worked example from the deck. Three West Coast cities, each described by three numbers — longitude, latitude, and population. That's a 3-dimensional embedding. The cities exist as points in a 3D space.

City	Longitude	Latitude	Population (Millions)
Los Angeles	-122.4	37.8	4.2
Seattle	-122.3	47.6	3.9
Vancouver	-123.1	49.3	2.4

Fig 3. Three cities in three dimensions. Move along the longitude axis, then up the latitude axis, then up the population axis — and you've placed each city in its own spot in space. Now imagine doing this with 1,536 dimensions instead of 3. That's what a real text embedding looks like.

Key takeaway: embeddings work the same way — just with more dimensions. More dimensions mean capturing more complex nuances and revealing hidden patterns that would be invisible in 2 or 3 dimensions.

04

Translating data for computers.

Computers struggle to directly understand the way humans communicate — text, pictures, sounds. To help, we turn these formats into numerical representations called "vectors" that computers can process more easily. Same trick. Three different modalities.

Fig 4. Three modalities. Three models. One unified output format. Once everything is a vector, the same math works on all of it — which is why a single AI system can search across audio, text, and video at once.

05

Data has a secret code.

Each modality encodes a different kind of "meaning." Same idea, four different signatures.

Modality 1 · Text

Text embeddings understand how words are related.

Words like "king" and "queen" sit close together
"King" and "car" sit far apart
The geometry IS the meaning

Modality 2 · Image

Image embeddings turn pictures into a special code.

The code remembers what the picture looks like — colors, shapes, smoothness
An orange sits closer to a yellow object than a black one
Visual similarity becomes spatial proximity

Modality 3 · Audio

Audio embeddings turn sounds into a code.

The code remembers pitch, instrument, character of the sound
A piano and a guitar have different codes — even playing the same note
Acoustic identity becomes a vector

Modality 4 · Temporal

Temporal embeddings track changes over time.

Records how heart rate moves during rest, sleep, running
Compare heart rates across activities — spot unusual patterns
Time-series shape becomes a fingerprint

06

Why old NLP failed at meaning.

Before embeddings, computers tried to handle language with two main techniques: n-grams (contiguous sequences of n words — unigrams, bigrams, trigrams) and bag-of-words representations. They worked, mostly. Until they didn't.

The problem: those approaches were context-agnostic. They counted word frequencies. They ignored what words actually meant in context. A vector embedding fixes that.

Aspect	N-grams	Vector Embeddings
Definition	Contiguous sequences of n words (unigrams, bigrams, trigrams)	Dense, continuous vector representations for words or sentences
Representation	Based on word frequencies within n-grams	Captures meaning and context
Limitations	Sparse (high-dimensional vectors with many zeros) · Context-agnostic · Ignores word order	Context-aware · Encodes meaning and context
Recent advances	N-grams with feature engineering	Contextualized embeddings · Transformer-based architectures · Knowledge graph integration · Multilingual & cross-lingual embeddings · Bias mitigation · Embeddings for specialized domains

A worked example — the two faces of "Apple"

Sentence A: "I love my new Apple laptop. It's fast and the design is sleek."

N-grams approach: unigrams only consider individual occurrences like "apple," "laptop," "fast" — missing the contextual nuances of what "apple" refers to.

Embeddings approach: a vector for "Apple" here lands close to "computer," "innovation," "design."

Sentence B: "This apple was mushy and disappointing."

N-grams approach: bigrams capture some context — "apple laptop," "new apple" versus "mushy apple" — but may still struggle if other fruit-related words are common in reviews.

Embeddings approach: a vector for "apple" here lands close to "sweet," "ripe," "tree."

Same word. Same letters. Two completely different points in meaning space.

Fig 5. Same string of letters, two different points in space. The whole reason embeddings unlocked modern NLP is that they finally taught machines what every human already knew — context changes meaning.

07

What you can actually do with embeddings.

Once your data is in vector form, a whole catalog of capabilities unlocks. Six come up most often.

Use case 1

Finding similar things — semantic search.

Embeddings help find similar words, documents, or even products. The classic example: news articles about the same topic. Or — "healthy breakfast options" retrieves content like "nutritious meals." Even though the words are different, the meaning is close.

Use case 2

Organizing data — automatic categorization.

Embeddings group similar things together and help label them — teaching computers how to sort items automatically. In a customer service use case, embeddings can categorize and retrieve similar inquiries and pain points, leading to faster resolution.

Use case 3

Better search engines.

Embeddings make search engines smarter. They can find what you're looking for even if you don't use the exact same words as the underlying content.

Use case 4

Smart recommendations.

Websites use embeddings to suggest things you might like. Watch a certain kind of movie and they'll suggest similar ones — because the movies are nearby in vector space.

Use case 5

Seeing the big picture.

Embeddings can be turned into pictures — visualizations — to see how different pieces of data relate to one another at a glance. That's how you find the unexpected clusters.

Use case 6

Faster learning.

Embeddings let computers use what they've already learned for new tasks. The model trained for one job can be repurposed for the next — so it learns faster.

08

Where do you put a billion vectors?

Once you've embedded everything, you need somewhere to store, index, and search across massive datasets of unstructured data. That's a vector database — purpose-built for this exact job.

Fig 6. Two databases. Two different jobs. The traditional one finds exact matches. The vector one finds meaningful neighbors. Both are useful — for different things.

Where vector databases shine — five popular use cases.

LLM Retrieval Augmented Generation (RAG): powering advanced chatbots and generative AI systems that need to access and process vast amounts of information. Embeddings help retrieve the most relevant vectors (top K) to ground LLM responses in accurate, contextually rich data.
Question and answer systems: enabling accurate and relevant responses to user questions.
Recommender systems: tailoring suggestions (products, content, etc.) based on user preferences and similarity analysis.
Semantic search: providing search results based on the meaning and context of the query, not just keywords.
Image, video, and audio search: finding similar media based on visual or audio characteristics.

09

The architecture that put embeddings on every roadmap.

If you've heard of RAG — Retrieval-Augmented Generation — you've heard of the architecture that made vector embeddings business-critical. Here's how it actually works.

Fig 7. The five-step RAG flow. The LLM doesn't know your private data — but it doesn't have to. The vector DB retrieves the relevant docs, the Q/A system stitches them into the prompt, and the LLM reasons over both together. Embeddings are the bridge.

10

A worked exercise — see it for yourself.

The deck closes with an exercise. Take seven words. Cluster them.

The list: [sciences, weather, institute, college, school, university, climate]

The challenge: arrange them into two clusters — one for education, one for weather. You can probably do this in your head. The question is whether the math agrees.

Here are the actual 3-dimensional embeddings from the deck:

Word	Embedding (3-dim)
sciences	`[0.7, 0.5, 0.3]`
weather	`[0.2, 0.7, 0.5]`
institute	`[0.75, 0.4, 0.25]`
college	`[0.65, 0.35, 0.4]`
school	`[0.6, 0.3, 0.45]`
university	`[0.7, 0.45, 0.35]`
climate	`[0.15, 0.65, 0.55]`

Fig 8. The math agrees. sciences, institute, college, school, university sit in one neighborhood; weather, climate sit in another. The embeddings encode meaning even at just 3 dimensions — and the distance between clusters is itself a measurement of the semantic gap.

Key takeaway: the embedding analysis reveals that words related to education share similar numerical representations, forming a distinct cluster — and the same applies to weather-related terms. Embeddings capture these nuances of meaning, which can be far more powerful than simple keyword analysis.

11

Eight key future trends.

Where embeddings go next, in the deck's words.

Cross-modal embeddings to handle text, image, audio together
Integration with quantum computing to accelerate similarity search
Ethical AI to reduce bias
Continuous learning to adapt to new data dynamically
Explainable embeddings to understand relationships
Integrating with AI agents
Unsupervised learning enhancements using embeddings
Ensemble RAG

12

Five takeaways.

Vector embeddings are the foundational trick. Bridging the gap — translating various types of data (words, images, etc.) into a format that computers can easily work with. Understanding relationships — embeddings aren't just about the data itself; they capture how different pieces of data relate to one another. Unlocking generative AI — embeddings empower many types of generative AI, where the goal is to create new things (text, images, code, etc.). Condensing information — instead of dealing with complex raw data, embeddings provide a compact, meaningful representation. Powering data-driven decisions — by understanding data through embeddings, we can make informed decisions and create innovative solutions.

And on the business side: smarter search, deeper insights — find documents, products, or information based on true meaning, not just keyword matches. Enhanced customer understanding — analyze feedback, reviews, and social media sentiment with nuance for actionable insights. Streamlined processes — automate tasks that rely on understanding language, from support ticket routing to content summarization. Competitive edge — extract valuable information and patterns from text data that traditional methods miss.

The next time someone says "just embed it," you'll know exactly what they mean — and exactly what makes it work.

Three questions to ask your team next.

The deck closes with three questions to spark the right conversations. Use them. They surface where embeddings can deliver the most value in your organization.

Source deck · download ↓

Discussion Prompts

Challenges: "What are some current tasks where our ability to understand language is a bottleneck?" This surfaces pain points embeddings might address.

Data: "What kinds of text data do we have that might be underutilized — customer support, market search, compliance, etc.?"

Feasibility: "Are there areas where a small-scale embedding project could be a good proof-of-concept?" This promotes actionable next steps.

#04 · 4.c · Google Cloud Next '26 · Recap

Everything Google
just announced.
Translated.

Once a year, Google Cloud puts every product team on a stage in San Francisco and says "this is what we believe the next twelve months of enterprise AI looks like." Next '26 was that stage. Six layers. Hundreds of announcements. One thesis: the Agentic Enterprise — where intelligence meets action. This is that 71-slide field report, organized by Google's own stack and translated into something a delivery lead can actually use on Monday.

71 slides · partner field report 6 layers · the Google AI stack 5 Google alliance authors

Source Google Cloud · Next '26 official recap · Delivered by the Google alliance team — Anil Mehta, Blaise Abderholden, Chase Crowson, Nishant Kulkarni, Anjana Nandi. Proprietary to Google Cloud; internal Accenture distribution only. All product names, customer stories, statistics, and launch-stage indicators (GA / Preview / Pre-announcement) reproduced from the source deck.

Watch first — 8 minutes · narrated walk-through of Google's six-layer agentic stack

01

The thesis: where intelligence meets action.

Last year, the keynote story was models. This year, the keynote story is agents — and Google's framing for it is sharper than most. "The Agentic Enterprise at scale." Context for every action. Agents for every process. Intelligence for every person. Success for every industry.

Strip the marketing varnish and the underlying claim is concrete: agents only matter if they can act — read your data, hold context across tools, follow policy, and finish work without supervision. That's the sentence the entire deck is engineered to defend, layer by layer.

Google's structural argument for why they are the partner to build this on rests on three pillars they repeated all week: full-stack co-design (every layer optimized for AI together), multicloud-by-default (their tools work where your data already lives), and enterprise-ready hyperscaler (resilience, scalability, security, sovereignty). The line they kept hammering: "Google Cloud is the only provider to offer first-party solutions across the entire AI stack."

Fig 1. Google's stack, in their own words. Read it top-down (where work happens) or bottom-up (what makes it possible). Every layer below has its own product slate — and every layer's headline announcement at Next '26 is built to make the layer above it more capable.

02

The big rebrand: Vertex AI is now Gemini Enterprise.

If you take only one thing from Next '26, take this: Google has unified its AI portfolio under a single Gemini Enterprise umbrella. The business-user app, the developer platform formerly known as Vertex AI, and the customer-experience suite are now one named system: Gemini Enterprise, Gemini Enterprise Agent Platform, and Gemini Enterprise for Customer Experience.

The plain-English version: "Vertex AI" is now "Gemini Enterprise Agent Platform" — and it's no longer pitched as a model-serving platform with some agent features tacked on. It's pitched as the place you build, scale, govern, and optimize agents, with the old Vertex capabilities (Model Garden, Model Builder, Agent Builder) folded inside.

Google's framing for the platform is a four-word sentence — and the architecture makes good on each verb:

Fig 2. The four-pillar story Google told all week. Build covers ADK, Agent Studio, and the Agent Garden of pre-built agents. Scale is Agent Runtime — sub-second cold starts and Memory Bank for long-term context. Govern is the new identity, registry, and gateway primitives that make zero-trust enforceable per agent. Optimize is simulation, evaluation, and observability — the operations layer most agent platforms still skip.

The customer logo wall on this slide reads like an enterprise-AI honor roll: L'Oréal, Citi, Color Health, Bloomberg, Deutsche Bank, Goldman Sachs, Mercedes-Benz, PayPal, Reddit, ServiceNow, Snyk, Toyota, Unilever, Wayfair, WPP, Yahoo. The two stories Google chose to lead with: L'Oréal built a proprietary "Beauty Tech Agentic Platform" on the Agent Platform with ADK; Citi launched Citi Sky, an AI wealth platform that is now proactively handling 90% of rollovers via the AI assistant. That second number is the kind of receipt a CFO can act on.

03

Agentic Taskforce: the front door for everyone else.

If Agent Platform is for developers, Agentic Taskforce is for everyone else — and Google split it into two distinct products: the Gemini Enterprise app (where employees create and orchestrate agents) and Gemini Enterprise for CX (where the same agents serve customers). The two share the same Agent Platform plumbing underneath. That symmetry is the whole point.

The headline features inside the Gemini Enterprise app are a tour of every agent UX pattern of the past year, packaged together:

Agent Designer (private preview) — anyone can build complex multi-system workflows in natural language. The pitch is "low-code agent creation without the bottleneck of asking IT."
Canvas Mode (private preview) — an interactive co-creation editor for Docs and Slides that pulls in your work and personal context. M365 interoperability means you can export to Microsoft Office formats — a clear shot at Copilot.
Projects in Gemini Enterprise (experimental) — a "shared brain" for teams that strictly grounds the AI in explicitly added files, preventing context loss and irrelevant hallucinations.
Inbox in Gemini Enterprise (experimental) — a unified hub for managing long-running agents at scale, with status alerts via email and chat.
Skills (experimental) — codify your unique expertise into reusable Skills, invokable anywhere you use Gemini.
Long-running Agents (experimental) — multi-step workflows like end-to-end financial reconciliation or sales-prospect sequencing without constant human supervision.

The CX side is where Google is making its sharpest competitive claim: "the only platform that seamlessly unifies shopping and service." The product suite is Omnichannel Gateway → CX Agent Studio → AI Commerce Search → Agent Assist → Conversational Insights — covering the full arc from intent-aware search to live agent coaching. The receipts on this slide are the quietly impressive part:

Fig 3. The CX receipts. Humana's 80 million calls per year is the kind of scale that makes the Agent Assist story credible — that's not a pilot, that's production.

04

Workspace: the agentic operating system for work.

The Workspace announcements are where the deck stops being abstract. Workspace Intelligence is the central claim: a secure system that "inherently understands complex semantic relationships within your specific work ecosystem" — apps, collaborators, domain knowledge — so you don't have to repeat context in tasks. In English: your agents already know who your team is and what you're working on.

The interface that exposes this to users is Ask Gemini in Google Chat (preview) — pitched as "a unified command line for all of your work." Three things make it land:

A daily briefing that surfaces important tasks, unread threads, and urgent action items.
Skills in Workspace — completing complex tasks like generating documents and slides directly from chat.
Expanded third-party connectors — Gemini now bridges Workspace content with external tools like Asana, Jira, and Salesforce. This is the connector breadth that's been the missing piece versus Microsoft 365 Copilot.

The new in-product AI features cover the full Workspace surface: Docs Enhancements generates infographics and triages documents from comments. Slides Generation produces full editable decks in one shot using shared context. Interactive Canvas in Sheets builds spreadsheets via natural language and creates interactive mini-apps (dashboards, kanban boards) on top of live data. Drive Insights & Projects centralizes file context for Gemini. Avatars in Vids (GA) converts presentations into videos with branded avatars including company logos and backdrops.

Two more bets worth flagging:

Workspace MCP Server (public preview) — lets developers bring advanced Workspace capabilities (synthesizing Drive documents, drafting Gmail responses, managing Calendar and Chat logic) directly into their AI applications and agents within a secure, open framework. This is a meaningful bet on MCP as the agent-tool standard.
Rapid Enterprise Migration with Workspace (preview) — Google's claim is that migrating from Microsoft 365 to Workspace is now up to 5× faster with a new cloud-based data import service plus AI-powered Office macro converter, Office file editing in Gmail, and redlining in Docs. Read this as the M365-displacement play getting sharper teeth.

And the security/governance posture caught up to the agent story: AI control center, regional data locking (US and EU now, Germany and India coming), and client-side encryption that lets you "authoritatively deny access to any agent and any entity, including Google itself." That last clause is unusually direct phrasing for a hyperscaler.

05

Agentic Defense: the SOC gets a fleet.

The security layer is where Google's Wiz acquisition starts paying off in the keynote. The Wiz AI-Application Protection Platform (AI-APP) went GA — agentless visibility into AI applications across any CSP, hosted, custom code, cloud and PaaS. And Wiz introduced a color-coded fleet of AI agents that maps neatly to a real SOC's day:

Fig 4. Wiz's color-coded agent fleet. The pattern is the same one Google used elsewhere all week — specialize agents by job, then put a workflow agent on top of them. The Triage and Investigation Agent in Google Security Operations did the same thing on the broader SecOps platform — Google says it has triaged 5+ million alerts, turning a 30-minute analyst job into roughly one minute.

Two other security stories worth reading carefully:

Google Cloud Fraud Defense (pre-announcement) — explicitly framed as "the evolution of reCAPTCHA", repositioned as a unified trust platform for the agentic web. The single layer verifies humans, bots, and autonomous AI agents across the entire digital commerce journey from registration to payment. Read between the lines: as agents start buying things on behalf of humans, "is this traffic legit?" becomes a much harder question — and Google wants to be the one answering it.
Dark Web Intelligence (preview) in Google Threat Intelligence — Gemini-powered processing of 10 million dark web events daily at 98% accuracy, dynamically profiling each customer's brand and assets to surface relevant data leaks and insider threats. Stops attacks before the first match is struck, in their phrasing.

06

Agentic Data Cloud: the context engine under everything.

Every agent claim above only works if the underlying data layer can keep up. The Agentic Data Cloud announcements are dense — six product families with a slate of features each — but the through-line is consistent: turn the data platform into something agents can use directly, without a human-built pipeline in between.

BigQuery got the headline numbers. Fluid Scaling with true per-second billing claims up to 34% cost savings on dynamic workloads. Advanced Runtime Optimizations claim up to 200× faster queries with no schema or code changes — and a 35% YoY improvement in query speed and 40% YoY reduction in query processing costs. Native multimodal processing via ObjectRef and ai.parse_document lets developers parse and analyze documents alongside structured data inside the Knowledge Catalog. TimesFM and Tabular FM bring zero-shot forecasting and tabular classification directly into BigQuery — no model training required.

The single most important new product on this layer is the Knowledge Catalog (GA), framed as "always-on enterprise semantics" — a dynamic context engine that replaces static data dictionaries, extracts entities, resolves conflicting definitions, and maps complex business relationships. The Deep Research Agent in Gemini Enterprise natively leverages it. Bloomberg Media's CTO is quoted as the proof point — they unified enterprise metadata and business context through Knowledge Catalog to launch their Data Access AI Agent. Spotify's CTO appears two slides later citing Apache Iceberg interoperability.

Other announcements worth tracking by name:

Lightning Engine for Spark (GA) — vectorized execution engine claiming 4.9× faster query completion than open-source Spark. Unifying lakehouse architecture is pitched at 117% ROI with payback under six months.
Iceberg REST Catalog (preview) — full read/write interoperability between BigQuery, Spark, and third-party OSS engines.
SAP BDC for BigQuery (preview) — bidirectional, zero-copy data sharing between SAP Business Data Cloud and Google's Agentic Data Cloud. Read this as: SAP gravity, no copying required.
Dashboard Agents in Looker (pre-announcement) — natural language questions inside dashboards for context-aware answers. Looker Hosted MCP Server (pre-announcement) exposes Looker's governed semantic layer to MCP-using agents.
AlloyDB AI (preview) supports 10B+ vectors, 6× faster than standard PostgreSQL, processing 100k rows/second for less than 1/10th of a cent. The Open-source MCP Toolbox now integrates 40+ distinct databases.
Spanner Omni (preview) — downloadable Spanner edition that deploys beyond Google Cloud infrastructure. Mercado Libre's senior tech manager is quoted on cross-cloud resilience. Oracle Database@Google Cloud expanded to 20 global regions.

07

Research & Frontier Models: voice gets a face.

Two model announcements headlined this layer — both about conversation, not reasoning benchmarks. That's the tell about where Google thinks the next year's user expectations are heading.

Gemini Live API + Live Avatar (private preview) — the transition from audio-only to face-to-face multimodal AI. Native audio-to-audio reasoning synchronized with real-time video rendering. The framing: "a lifelike, expressive visual presence" instead of disembodied voice.
Gemini 3.1 Flash TTS (preview) — Google's most expressive text-to-speech model, with 200+ audio tags for steering pacing and expressiveness, supporting more than 70 languages. All outputs carry SynthID watermarking. The benchmark slide showed it leading the Artificial Analysis Text-to-Speech Arena Quality Elo at 1211 — narrowly beating ElevenLabs v3, Inworld TTS Max, MiniMax Speech 2.0 HD, and others.

Read these as a single play: by next year, the default support agent, the default training video, and the default product walkthrough will all be able to look at you and respond in your language. If your customer-experience roadmap doesn't have a voice/avatar lane, that's the gap to close.

08

AI Hypercomputer: the receipts under the receipts.

Every agent capability above eventually cashes out in compute, network, and storage. The AI Hypercomputer announcements are where Google made its loudest hardware noise — and the headline is the 8th-generation TPU, split for the first time into two distinct chips with two distinct jobs.

Fig 5. TPU 8 is two chips. TPU 8t is the training powerhouse — Google's claim is months-to-weeks for frontier-model training, with one superpod hitting 9,600 chips and 2 PB of shared high-bandwidth memory. TPU 8i is the inference engine — designed specifically for the agentic-workflow case where long-context decoding chokes on memory bandwidth. Read together: training and serving are now different products with different chips.

Around the TPUs, Google announced the supporting cast in the kind of detail that only matters to people running the workloads — but those are the people writing the checks:

Virgo Network — collapsed-fabric data center architecture with 4× the bandwidth of previous generations, connecting up to 134K TPUs into a single, non-blocking cluster.
Managed Lustre — now delivering 10 TB/s of bandwidth, claimed at 10× faster than last year and 20× faster than other hyperscalers for a single instance. Capacity scaled to 80 PB via C4NX instances and Hyperdisk Exapools.
Cloud Storage Rapid — Rapid Bucket and Rapid Cache. Native PyTorch and JAX integrations. Checkpoint writes 3.2× faster, restores 5× faster with Rapid Bucket.
Compute — new C4N series processing up to 95M packets/sec (40% faster than other hyperscalers, per Google), M4N series with Hyperdisk Extreme delivering 26.57 GiB RAM per vCPU and a 20% Oracle TCO reduction, Axion N4A Arm-based processors, Axion C4A.metal bare metal, H4D with Cloud RDMA, and pre-announcements for Z4D and Z4M.
GKE Agent Sandbox — gVisor kernel isolation (the same tech securing Gemini), launching up to 300 sandboxes per second per cluster, with 30% better price-performance than competitors when running AI agents.
GKE hypercluster (private GA) — single conformant GKE control plane managing millions of accelerators across 256,000 nodes spanning multiple GCP regions. GKE Pod Snapshots reduce pod start-up time by up to 81% for large models like Llama 3.2 70B and shrink the overprovision buffer by 92%.
Cloud Run — now serving up to 70B+ parameter models on serverless via NVIDIA RTX PRO 6000 Blackwell GPU, with full managed remote MCP server, Cloud Run Instances for long-running agents, and Cloud Run Sandboxes for isolated code execution.
Google Distributed Cloud — Gemini deployable in connected or fully air-gapped environments. Support for NVIDIA Blackwell B200/B300 GPUs, A4/M2/M3 machine families, 6 PB object storage per zone, and a new sovereign agentic AI architecture that keeps workflows entirely within the customer's secure organization boundary.
Networking — Agent Gateway as the "air-traffic controller" for agentic traffic, natively understanding MCP and A2A protocols. Cloud Network Insights for end-to-end visibility. GKE Inference Gateway with multi-region support, predictive latency boost, and disaggregated serving — Google's quoted result: "reduced Time to First Token (TTFT) latency by over 35% for Qwen3-Coder."

09

The launch-stage cheat sheet.

The deck uses three tags consistently — GA, Preview, and Pre-announcement — and they matter for sequencing. GA is now. Preview is months. Pre-announcement is "we want this on your roadmap, not yet on your contract." Here's the same content sorted by what you can actually deploy versus what you're committing your roadmap to:

Fig 6. Same announcements, sorted by what you can actually build with today. The GA column is the deal-grade list. The Preview column is your pilot list. The Pre-announcement column is your strategy-deck list.

10

The bottom line.

Stripped of the keynote choreography, Next '26 said three things that matter for any team building on Google Cloud over the next twelve months:

Vertex AI is now Gemini Enterprise Agent Platform. Update your slides, your statements of work, and your customer-facing decks. The capability set is broader than Vertex was, but every Vertex investment carries forward — Model Garden, Model Builder, and Agent Builder are folded inside.
Agents are governed objects now, not configurations. Agent Identity, Agent Registry, Agent Gateway, Agent Simulation, Agent Observability — these aren't features, they're a fleet-management posture. If you're proposing an agent-heavy architecture and your governance story is a sentence, your governance story is too short.
The infrastructure receipts are real, but most of them are pre-announced. TPU 8t/8i, Virgo Network, the new compute series — these are roadmap items, not GA hardware. Use them in strategy decks; build pilots on what's GA today (BigQuery Fluid Scaling, Knowledge Catalog, Workspace Intelligence, Wiz AI-APP, Triage Agent).

The competitive read: Google's strongest move at Next '26 was the unification under Gemini Enterprise — both as a brand and as an architecture. The story they're telling against Microsoft is no longer "we have better models" — it's "we are the only provider with first-party solutions across the entire AI stack." Whether that claim survives contact with a real M365-shop procurement cycle is the question every account team will be running into next quarter.

Read this with Door A and Door B. The bootcamp teaches your team to command agents. The primer explains what's under the agents. This door tells you what one of your three biggest partners is shipping — so when a client asks "what does your Google bench look like on agent governance?", you have something better than a brochure to point at.

Want the original 71 slides?

This recap reproduces the structure, claims, and customer stories from Google Cloud's official Next '26 deck. For the original — including embedded blog links, session videos, and the customer reference library — reach out to your Google alliance contact.

#04 · 4.a · Citizens · Human-in-the-Lead Training · Day 0 · May 2025

Before you build
your first agent.
The foundations.

Day 0 is the first day of the Agentic AI bootcamp — and the day everyone wishes they'd had before they started. 97 slides covering what an agent actually is, why "agentic" is more than marketing, the SPAR framework that anchors everything else, and the eleven topics that map onto the rest of the week. Run as a live track for Citizens; reusable as a foundations primer for any new team after them.

97 slides · taught live 11 topics on the agenda 5 agentic levels mapped 1 Citizens cohort

Curator Mo Nomeli · CAAI Global Lead AI Learning & Emerging Tech · Source: Intro to Agents — Day 0 · Citizens · Human-in-the-Lead Training · May 2025

01

"Is everything with an LLM an agent?"

That's the question Day 0 opens with — and it's the right one. Because the answer is no.

An LLM in a chat box is not an agent. An LLM that retrieves a document is not an agent. An LLM that calls a function is closer, but still not quite. The line between "calling an LLM" and "running an agent" is fuzzy enough that most teams build for months without agreeing on what they're building. Day 0 fixes that, in two moves: define the term, then place every system on a spectrum.

Once everyone in the room knows what counts as an agent — and what level of agent they're actually building — the rest of the week stops being a vocabulary fight and starts being engineering work.

02

Five levels of agentic.

Most "agents on the market" sit at Level 2 or 3. A few specialized systems reach Level 4 in narrow domains. Level 5 is hypothetical. Knowing the level you're at — and the level you're targeting — kills more debates than any other framework on Day 0.

Level 1

Rule-Based Automation

Fixed rules and workflows. Repetitive tasks like data entry or form processing. Like cruise control in a car.

No adaptability
Full human oversight required
Deterministic by design

Level 2

Intelligent Automation

ML, NLP, and computer vision processing unstructured data. Basic predictions. End-to-end automation, but inside rigid parameters.

More capable than Level 1
Still needs human supervision
Bounded by configured rules

Level 3

Agentic Systems

Plan, reason, generate across modalities. LLMs + memory + reinforcement learning. Customer support, financial analysis in digital domains.

Operates well within predefined boundaries
Struggles with novel/complex situations
Most enterprise agents today live here

Level 4

Semi-Autonomous Agentic Systems

Comparable to self-driving cars in mapped areas. Independently pursue goals, adapt strategies, manage workflows. Still needs domain constraints.

Adjusts based on feedback
Limited and defined domains only
The current frontier of production systems

Level 5

Fully Autonomous Systems

Hypothetical. Understands any goal, develops strategies, learns from experience, adapts across domains without human input. General AI.

Value-aligned decisions
Seamless cross-system integration
Not real yet — and possibly never

03

SPAR: the four-beat agent loop.

Once you know what level you're building at, you need a mental model for what an agent actually does. Day 0 uses SPAR — the simplest loop that captures every real agentic system.

The SPAR cycle — every agent runs this loop

S

Sense · Gather information, input, and context. Check what is needed to complete the task.

P

Plan · Think, analyze, map what approach fits the criteria. Outline specific steps to accomplish the goal.

A

Act · Execute the plan — usually requiring coordination across tools, assets, and action sequences in a defined environment.

R

React · Learn from experience. Reflect on results. Did the outcome meet the criteria? Did it satisfy the goal?

The integration of Sense → Plan → Act → React is the fundamental shift away from traditional automation. Linear scripts don't react. Agents do.

Throughout the rest of the week, every advanced topic — multi-agent systems, tool use, planning, evaluation — gets traced back to which beat of SPAR it lives in. That's the reason this framework comes first.

04

A single agent has five components.

Zoom into any agent — single or multi — and you'll find these five organs. Day 0 introduces them; the rest of the week deep-dives each one.

The five-component anatomy

Component	What it does	Where the rest of the week goes
Profile & Persona	Who is this agent? What role does it play? What rubric or grounding defines its voice?	Day 0 covers profile generation: human-crafted vs LLM-generated vs data-generated.
Action & Tool Use	What can the agent do? Which APIs, scripts, knowledge bases, and external systems can it reach?	Tool Use deep-dive (slides 66-96). RAISE framework, the Detective's Dilemma, tool overload.
Knowledge & Memory	What does the agent retain beyond the immediate chat? Other agent conversations, API instructions, domain knowledge.	Embeddings, RAG, knowledge graphs — covered later in the week.
Reasoning & Evaluation	Zero-shot, few-shot, chain-of-thought, tree-of-thought. Plus self-consistency and LLM-as-judge for evaluation.	Reasoning + benchmarking sessions later in the week.
Planning & Feedback	Single-path (chain-of-thought) vs multi-path (tree-of-thought). Planning with vs without human feedback.	Planning gets its own deep-dive. Feedback threads through Privacy/Safety/Ethics.

05

The core agent cycle.

SPAR is the abstract loop. The core agent cycle is what it looks like when you actually instrument it with software components.

1

Perception

The agent receives and interprets incoming requests — text, voice, API calls — and extracts user intent.
2

Reasoning

It analyzes the collected information, identifies patterns, and formulates a plan. Evaluates options and seeks clarification when needed.
3

Action

The agent executes the plan: retrieves data, generates a response, triggers external scripts, calls tools.
4

Observing & Learning

It assesses results, refines its approach for future tasks, and logs new knowledge or mistakes — feeding the loop back into supervised, unsupervised, or reinforced learning.

06

Tool use, taught through a crime scene.

The longest section of Day 0 — about 30 slides — is on tool use. The teaching frame is "The Detective's Dilemma": you're a detective with too many tools, the wrong tools, or no tools at all. Sound like your AI agent project?

RAISE Framework

The four parts of an agent's tool ecosystem.

Controller — the dialogue + LLM core that decides what to do next
Working Memory — system prompt, task instructions, conversation history, scratchpad
Tool Pool — databases, scripting, interpreters, knowledge bases, external AI tools
Example Pool — <Q, A> pairs the agent can retrieve from when planning

The Tool Use Lifecycle

From request to result.

Query arrives → Controller parses
Retrieve relevant examples from Example Pool
Plan actions, write to Working Memory
Execute against the Tool Pool, observe results
Loop until the goal is met or escalation triggered

Mo Tools, Mo Problems

The minimalism principle.

Avoid tool overload. Each tool added increases the agent's choice-set exponentially.
How agents see tools. Tools are not menus — they're descriptions the LLM has to understand.
Tool resilience. Tools fail. Plan for failure modes from day one.
Bridge tooling. Sometimes you need a tool to call a tool to call a tool. Sometimes you shouldn't.

07

What goes wrong (and why).

Day 0 names the failure modes early so the rest of the week can focus on countermeasures. Eight categories show up over and over in real production systems.

Challenge	What it looks like
Technical Barriers	Programming expertise limits adoption. Fragmented architectures hinder scaling.
Trust & Transparency	Decision visibility is limited. Why did the agent do that? Often unanswerable.
Data & Model Dependency	Flawed data propagates errors through every downstream agent action.
Coordination Complexity	Multi-agent collaboration bottlenecks become increasingly difficult as you scale.
Non-Determinism	Unpredictability causes cascading errors. Same input, different output.
Limited Customization	Rigid templates limit adaptation to specific business contexts.
Integration & Scalability	Plugging into existing enterprise systems is harder than the demos suggest.
Ethical Risks	Autonomy introduces trust issues. Who's responsible when the agent acts wrong?

08

What we tell teams on Day 0.

Day 0 closes with concrete advice: ten best practices distilled from production deployments, plus a tour of the platform landscape teams will actually pick from.

The ten Day 0 best practices

Build discipline

Foundations

Start simple. MVP-first, basic planning, no premature complexity
Clear success criteria. Define specific goals upfront
Constrained environments. Develop in controlled settings to manage non-determinism
Leverage existing tools. Reuse, don't reinvent

Operating posture

Production

Robust orchestration. Strong management for agent collaboration
Performance optimization. Real-time efficiency matters
Continuous improvement. Feedback loops for refinement
Security & control. Limit web access, enforce auth

Trust posture

Ethics

Ethical AI solutions. Align with standards, preserve human dignity
Trust + transparency. Decision logs, evidence trails
Closes the Day 0 loop — sets up the Privacy/Safety/Ethics deep-dive later in the week

The platform landscape — what teams will actually pick from

Platform	Strength	Watch-out
LangChain	Flexible LLM workflows, modular, large community.	Developer-focused; higher technical barrier.
CrewAI	Multi-agent collaboration with task-based roles. Code + visual.	Effective for "crews" but can be opinionated.
AutoGPT	Low-code, drag-and-drop visual editor for continuous agents.	Can be challenging to set up reliably.
SuperAgent	Open-source framework + cloud platform, optimized for fast iteration.	Developer-centric; lacks visual builder.
MetaGPT	Simulates a "development team" to generate full-stack prototypes.	Niche focus on software development specifically.
CAMEL	Communication and negotiation between agents for adaptive decisions.	Primarily research-grade.

09

What Day 0 sets up.

Day 0 isn't about building anything. It's about arriving on Day 1 with the same vocabulary, the same mental model, and the same definition of "agent" as everyone else in the room.

From here the program goes deeper across the three live days of the Citizens AI Academy Track C (September 2025). Day 1 is Intro to Agents + Reinventing Banking. Day 2 is Tool Use + Reasoning. Day 3 is Memory + Planning + the A.G.E.N.T design framework. Each day builds on the SPAR cycle and the five-component anatomy you just learned.

Ready to run Day 0 with your team?

The full deck — all 97 slides, including diagrams, agenda, the SPAR walkthrough, the Detective's Dilemma narrative, and the platform landscape — is available for download. The same content has been delivered live to Citizens; reach out to discuss running it for your team.

Download the Day 0 deck ↓ Talk to Mo →

#04 · 4.a · Citizens AI Academy · Track C · Day 1 · September 2025

From prompts
to agency.

Day 1 is where the cohort moves from "calling an LLM" to "running an agent." Three sessions in the morning — Intro to Agents, Understanding Agents, Reinventing Banking with Agents — close out with a live KYC multi-agent demo on Accenture's AI Refinery. Then the Pod runs Hypersprint #1 against the real Citizens backlog.

107 slides · taught live 3 sessions in the morning 1 KYC demo · AI Refinery 1 Hypersprint vs Pod backlog

Curator Mo Nomeli · CAAI Global Lead AI Learning & Emerging Tech · Source: Citizens AI Academy · Track C · Day 1 · September 2025

01

"Find me the best mortgage."

Day 1 opens with a banking scenario that lands harder than the generic "book a vacation" example. You are a customer. You want a mortgage on the house at 123 Main St. Lowest rate. Close in 25 days. The bank's digital assistant builds you a perfect plan in seconds — partner lender, rate sheet, document checklist, timeline.

Then reality hits. The promotional rate expired yesterday. Your "verified funds" sit behind a 3-day settlement period. The recommended insurer doesn't cover your flood zone. Now the customer does the real work — manually hunting for new rates, scrambling to liquidate, finding a different insurer. The plan looked perfect because it never had to operate in the real world.

That's the gap Day 1 names: between generative AI (which gives you a plan) and agentic AI (which can execute the plan, react when reality doesn't match, and finish the job). For a Citizens cohort, this isn't theoretical — it's the difference between a chatbot that sounds smart and an agent that actually closes the loan.

02

Agentic AI vs traditional AI vs chatbots.

Day 1 makes the team draw the lines clearly. Otherwise the rest of the week becomes a vocabulary fight.

Three categories — what each one actually does

Category	What it does	Banking example
Traditional AI / ML	Single prediction or classification. Stateless. Same input → same output.	A fraud-scoring model that returns a 0–1 risk score on a transaction.
Chatbot (GenAI)	Generates text. Can converse. No memory across sessions. No actions in external systems.	A customer-service bot that answers FAQ but can't actually unlock your account.
Agentic AI	Generates a plan, executes it via tools, observes results, adapts. Has goals, memory, and the ability to act in external systems.	An onboarding agent that pulls KYC docs, validates them, runs sanctions screening, and only escalates the edge cases to a human.

03

SPAR — the anchor, taught again.

SPAR is taught on Day 0 as the foundations. Day 1 brings it back as the working frame for the rest of the week. Every later concept — Tool Use, Reasoning, Memory, Planning, Multi-Agent — maps back to one or more SPAR phases.

SPAR · the four-phase agent loop

S

Sense · gather information, input, context. Check what's needed to complete the task.

P

Plan · think, analyze, map an approach. Outline specific steps to accomplish the goal.

A

Act · execute. Coordinate across tools, assets, action sequences in a defined environment.

R

React · learn from experience. Reflect on results. Did the outcome meet the criteria?

The integration of Sense → Plan → Act → React is the fundamental shift away from traditional automation. Linear scripts don't react. Agents do.

04

Five levels of agentic — placed on a banking map.

The Agentic Progression Framework runs Levels 1 through 5. Most production banking systems live at Level 2 or 3. Knowing which level you're targeting kills more debates than any other framework on Day 1.

Level 1

Rule-Based Automation

Fixed rules and workflows. Repetitive tasks like data entry, form processing. Like cruise control.

No adaptability
Full human oversight
Banking parallel: if/else fraud rules

Level 2

Intelligent Automation

ML, NLP, computer vision processing unstructured data. Basic predictions inside rigid parameters.

More capable than Level 1
Still needs human supervision
Banking parallel: document classification on KYC

Level 3

Agentic Systems

Plan, reason, generate across modalities. LLMs + memory + reinforcement learning.

Operates well within predefined boundaries
Struggles with novel situations
Most enterprise agents today live here

Level 4

Semi-Autonomous

Comparable to self-driving cars in mapped areas. Independently pursue goals, adapt strategies, manage workflows.

Adjusts based on feedback
Limited and defined domains only
The current frontier of production

Level 5

Fully Autonomous

Hypothetical. Understands any goal, develops strategies, learns from experience, adapts across domains without human input.

Value-aligned decisions
Seamless cross-system integration
Not real yet — and possibly never

05

Three ways agents collaborate.

The afternoon "Understanding Agents" session adds a frame for what comes later in the week: how multiple agents work together. Three patterns — each with a banking equivalent.

Pattern 1

Centralized

One orchestrator agent at the top, all decisions and routing flow through it. Specialists below execute.

Easy to reason about
Single point of failure
Banking parallel: a Loan Origination Manager calling out to credit-check, valuation, and KYC sub-agents

Pattern 2

Decentralized

Peer agents communicate directly. No top-down router. Coordination via shared protocol or message bus.

More resilient
Harder to audit
Banking parallel: peer fraud-detection agents sharing flags across regions

Pattern 3

Hierarchical

A tree. Top-level coordinator, sub-orchestrators, leaf specialists. Decisions cascade through tiers.

Scales to complex workflows
More moving parts to test
Banking parallel: regulatory reporting where region → product → entity all roll up

Open standards matter here. The protocols (MCP, A2A) that let these patterns work without each agent inventing its own dialect are taught later in the program.

06

Reinventing banking — where agents land.

The afternoon track maps where agentic AI actually lands in financial services. Six functions, each with a value proposition the cohort can take back to their Pod.

Function	Where the agent lives	Value delivered
Sales & Service (Banking)	Quick access to product info, contextual recommendations, account servicing.	Greater efficiency · faster response · increased accuracy.
Client Servicing (Capital Markets)	Real-time insights and recommendations across investment strategies.	Enhanced client satisfaction · competitive advantage.
Fraud Detection (Payments)	Pre-emptive fraud detection across channels.	Improved fraud protection · enhanced customer experience.
Claims (Insurance)	Claims-processing automation, document collection.	Improved workflows · streamlined documents.
Risk & Underwriting	Effective underwriting, proactive risk assessment vs reactive remediation.	Reduced risk · better data protection · faster processing.
Technology Development	Streamlined software development, code generation, test scaffolding.	Improved workflow · increased efficiency · shorter dev cycles.

07

The KYC and AML deep dive.

Two banking workflows get the deep treatment on Day 1: Anti-Money Laundering screening and Know-Your-Customer onboarding. Both are high-volume, high-stakes, and well-suited to a Level-3 agentic system.

AML / Sanctions

Revolutionizing alert adjudication

Automate high-volume sanctions, PEP, and adverse-media alert screening 24/7 with high consistency
Agents adjudicate initial alerts, mimicking expert analysts' escalation decisions
Generative AI inside agents drafts initial SAR narratives, aiding investigators
Manages rising alert volumes without proportional staff increases

KYC / KYB

Streamlining customer onboarding

Automate data gathering and verification from diverse sources during onboarding
Intelligent document processing extracts and validates info for due diligence
Deeper risk insights by analyzing complex ownership structures, multi-source screening
Continuous, agent-driven monitoring ensures ongoing compliance and timely risk reassessment

08

The road-ahead reality check.

Day 1 doesn't close on hype. It closes on the operational risks the cohort needs to keep front of mind for the rest of the week.

Risk	What it looks like in banking
Data, talent, integration	Most production agentic systems stall on data quality, scarce ML/AI talent, or integration with legacy core-banking systems — not on model capability.
Regulatory horizon	Banking regulators expect explainability, decision-trail audits, and clear human accountability. Agents that can't show their work fail audit.
Trust & transparency	Why did the agent decide that? If the answer is "because the LLM said so," you have a problem. Decision logs are non-negotiable.
Ethical & operational	Bias propagation in credit decisions. Hallucinated SAR narratives. Customers with no clear path to dispute an agent's decision.
Job impact	Agents augment investigators and analysts more than they replace them. The Day 1 framing: "agents handle the volume; humans handle the judgment."

09

What Day 1 sets up.

By the end of Day 1, the cohort has a shared vocabulary (agentic vs GenAI vs traditional ML), a shared frame (SPAR), a shared map (the 5 levels, the 3 collaboration patterns), and a shared business case (KYC and AML, demoed live).

From here the bootcamp goes deeper into each capability. Day 2 attacks tool use and reasoning. Day 3 attacks memory and planning. Each day builds on the SPAR cycle the team locked in today.

Ready to run Day 1 with your team?

The full deck — all 107 slides, including the mortgage hook, the SPAR walkthrough, the 5-level framework, the 3 collaboration patterns, and the AI Refinery KYC multi-agent demo — is available for download. The same content was delivered live to the Citizens cohort in September 2025; reach out to discuss running it for yours.

Download the Day 1 deck ↓ Talk to Mo →

#04 · 4.a · Citizens AI Academy · Track C · Day 2 · September 2025

Tools.
And the power of pause.

Day 2 is two deep dives. Tool Use with Agents in the morning — the Detective's Dilemma, the RAISE framework, "Mo Tools, Mo Problems" minimalism, progressive tool access. Reasoning with Agents in the afternoon — fast vs slow thinking, LLMs vs LRMs, multi-agent reasoning, metacognitive awareness. Hypersprint #2 begins after lunch.

91 slides · taught live 2 deep dives · tools + reasoning 1 RAISE framework · operationalized 1 Hypersprint #2 launch

Curator Mo Nomeli · CAAI Global Lead AI Learning & Emerging Tech · Source: Citizens AI Academy · Track C · Day 2 · September 2025

01

Why tools matter — the building blocks of action.

Day 2 opens by tying tools back to the agentic levels from Day 1. Level 1 is a switch statement. Level 2 introduces criteria and decision-making about which tool to call. Level 3 is where the agent actually orchestrates multiple tools — figuring out the order, handling dependencies, making the calls.

The frame: tools are the bridge between abstract goals and tangible outcomes. An agent without tools is a chatbot with goals and no hands. An agent with tools can move money, file SARs, update CRM records, send compliance notifications. Tools are what turn "could" into "did."

And the limit: an agent is bounded by its understanding of the tools' capabilities, when to use them, and how to use them effectively. This is why Day 2 spends a third of its time on tool design — because tool design is agent design.

02

The Detective's Dilemma — taught with banking.

Day 2's central narrative is "the Detective's Dilemma." Picture a banking representative preparing an enhanced-due-diligence reply for a KYC review. The LLM has been trained on Citizens' policies. It outlines internal procedures. It drafts a template response. It explains itself.

And then it stops. Because outlining procedures is not the same as performing them. The agent needs tools — to actually pull the income docs, run sanctions screening, log the case, generate the SAR. Without tools, the LLM is a detective who knows the case backwards and forwards but can't open the evidence locker.

03

"Mo Tools, Mo Problems" — the access paradox.

More tools = more capability. More tools = more failure modes. Day 2 names this paradox directly so the cohort doesn't fall into it.

Take a hypothetical agent with three well-described tools, each with high resilience and detailed descriptions:

The ability to send emails
The ability to query a customer-service database (with access controls scoped to that customer's history)
A connection to the data lake to populate prioritized issues

In theory: the agent can find novel issues in customer-service calls and notify the right authority. Any foreseeable problems? Yes — many. The agent could email the wrong recipient. It could surface a false positive that triggers an investigation. It could inadvertently expose customer data through an over-broad query. Each new tool added increases the failure surface multiplicatively, not additively.

04

The RAISE framework — operationalized.

Day 2 spends real time inside RAISE — the framework that defines an agent's tool ecosystem. Built on top of the ReAct method (Reason + Act in a loop), RAISE adds a memory mechanism that mirrors human short-term + long-term memory.

Component 1

Controller

The dialogue + LLM core. Decides what to do next based on the current task plan and the contents of working memory.

Reads the prompt + history
Generates the next action
Parses tool outputs into observations

Component 2

Working Memory

Short-term scratchpad for the current task. System prompt, task instruction, conversation history, retrieved examples, task trajectory.

Resets per task (or per session)
Bounded by context window
Where the agent's "thinking out loud" lives

Component 3

Tool Pool

Databases, scripting interpreters, knowledge bases, external AI services — the things the agent can actually call.

Each tool has an input/output spec
Each tool has a description the LLM reads
Tool errors flow back as observations

Component 4

Example Pool

A library of past <Q, A> pairs the agent can retrieve from when planning. The agent's long-term reference.

Retrieved on prompt
Injected into working memory
The "I've seen this before" mechanism

RAISE in action — the agentic loop

1

Query arrives → Controller parses intent and writes task plan to Working Memory.

2

Retrieve relevant examples from the Example Pool → injected into Working Memory.

3

Plan actions, write thought to scratchpad, execute against the Tool Pool.

4

Observe results, update Working Memory, loop until goal met or escalation triggered.

RAISE is the operating model for Level-3 banking agents. The Day 2 lab finds the "tool internal monologue" in the running code — making the loop visible.

05

Tools fail. Plan for it.

Day 2 ends the tool track with the operational reality: tools fail. APIs go down. Data is stale. Calls time out. The agent has to be designed for resilience from day one, not as an afterthought.

Strategy 1

Tool Resilience

Build retry logic, fallback paths, and graceful degradation into every tool wrapper. An agent with a flaky API should know to wait, retry, or escalate — not silently fail.

Strategy 2

Progressive Tool Access

Don't give a new agent the keys to everything on day one. Start with read-only access. Then read-write to a sandbox. Then read-write to production with human approval. Then unattended.

Strategy 3

Test, test, test

Adversarial scenarios. Tool-failure simulations. Edge cases. Production agents that have never failed in testing will fail in production. Better to fail in the lab.

06

Reasoning — fast and slow.

The afternoon shifts from "what tools" to "how the agent thinks." Day 2 leans on Daniel Kahneman's two-systems framing, which makes the architectural choice tangible.

System	How it operates	Banking analog
System 1 — Fast	Quick, automatic, pattern-matched. Little effort. The "snap judgment" mode.	Real-time fraud rules — millisecond decisions on transaction approval.
System 2 — Slow	Deliberate, reasoned, multi-step. Like planning a chess move. Higher latency, higher accuracy on novel problems.	Multi-step fraud-pattern investigation across an account history; SAR drafting.

The Day 2 lesson: combine both. Fast checks for routine cases (low latency, high consistency). Slow reasoning for edge cases (high latency, accepted because the case warranted it). Imagine rerouting a $1.2M pharmaceutical shipment to avoid a storm — only to cross routes that violate international transport regulations. That's a System 1 mistake. The Day 2 framing for banking: "think carefully, deeply, and reason thoroughly" — but only when the case earns it.

07

LLMs vs LRMs — the power of pause.

Day 2 introduces Large Reasoning Models as a distinct category from Large Language Models. Both look the same from the API, but they're trained differently and behave differently.

Characteristic	Large Language Models (LLMs)	Large Reasoning Models (LRMs)
Training Data	Vast unstructured text corpora.	Structured data + explicit reasoning frameworks.
Reasoning Depth	Surface-level, statistical pattern-matching.	Causal relationships, systematic analysis.
Adaptability	Generalizes broadly across language tasks.	Specializes narrowly in technical / logic-heavy domains.
Key Strength	Translation, summarization, dialogue.	Math, coding, multi-step decision-making.
Output Type	Probabilistic text outputs.	Deterministic logical conclusions.

The compute model is also different. LLMs got better via train-time compute scaling — more data, more parameters. That curve is hitting limits (finite data, finite compute). LRMs scale via test-time compute — letting the model think longer at inference, exploring more reasoning paths. The "power of pause" is the model spending more inference tokens on hard problems.

08

Many small reasoners beat one big one.

The Day 2 reasoning track ends with a counterintuitive finding from recent research: collaborative debate frameworks of smaller models can exceed the reasoning capacity of a single large LLM — at a fraction of the cost.

Single Reasoner

Scale test-time compute

Give one strong reasoner more inference tokens. Use prompts that elicit deep thinking ("what factors might make this recommendation unreliable?"). Strong baseline.

Debate

Two reasoners, one truth

Even with smaller language models (SLMs), debate frameworks can exceed LLM performance at a 14x cost factor. Diverse perspectives challenge each model's reasoning.

Multi-Agent

Many small + diverse

Smaller, more diverse models with reasoning capabilities. Scale wide instead of scale up. Individually limited; collectively they surpass each other as a team operating at different "thinking" speeds.

Metacognitive awareness is the new horizon. LRMs are starting to surface their own uncertainty — "progress is being made but we need to reconcile these discrepancies." Recognizing uncertainty is the prerequisite for Human-in-the-Loop escalation. When the agent can flag its own confusion, the human review path has a clear trigger. That's the holy grail of explainability and observability rolled into one.

09

What Day 2 sets up.

By the end of Day 2, the cohort has both the action layer (tools, RAISE, progressive access, resilience) and the thinking layer (LLMs, LRMs, fast/slow, multi-agent reasoning) for what they're going to build.

Day 3 brings memory and planning — what the agent knows and how it decides what to do next. The team will need both in their Hypersprint #2 work.

Ready to run Day 2 with your team?

The full deck — all 91 slides, including the Detective's Dilemma, the RAISE framework, "Mo Tools, Mo Problems," progressive tool access, the LLM/LRM comparison, and the multi-agent debate research — is available for download.

Download the Day 2 deck ↓ Talk to Mo →

#04 · 4.a · Citizens AI Academy · Track C · Day 3 · September 2025

Memory.
Planning.
A.G.E.N.T

Day 3 is the structural day. Morning: Memory in Agents — the three layers, context windows, long-term storage, feedback loops. Afternoon: Planning Agentic Workflows — when to use agents, when not to, the Three Circles of Opportunity, and the A.G.E.N.T design framework the cohort will use for every agent they build.

122 slides · taught live 3 memory layers 5 A.G.E.N.T components 3 Circles of Opportunity

Curator Mo Nomeli · CAAI Global Lead AI Learning & Emerging Tech · Source: Citizens AI Academy · Track C · Day 3 · September 2025

01

Memory isn't recording — it's reconstruction.

Day 3 opens with a thought experiment. Think back to a fond memory. What were the sounds? The smells? The conversations in the background? Who was there?

And then the trick: are you remembering the event itself, or your last retelling of it? Most "memories" are actually reconstructions — built from fragments, refined each time you recall them. Memory isn't a recording. It's a story we keep rewriting.

That's the framing for the agent's memory architecture. An agent's memory isn't a transcript of everything it has seen. It's a curated, structured, prioritized representation of what mattered. The Day 3 task: design that curation deliberately, because if you don't, the LLM's context window will do it for you — badly.

02

The three layers of agent memory.

Day 3's core memory model has three layers. Each one solves a different problem; together they make agents that actually learn.

Layer 1

Short-Term Memory

The agent's working scratchpad. Recent interactions, current task context. Ensures contextual continuity within a single session.

Lives in the LLM's context window
Bounded by token limits
Resets between sessions

Layer 2

Long-Term Memory

Persistent storage beyond the session. User preferences, past interactions, workflows, domain-specific knowledge.

Vector stores, knowledge graphs, relational DBs
Retrieved on demand into working memory
Where the agent gets continuity

Layer 3

Feedback Loops

The mechanism that keeps memory useful over time. Refines both short-term and long-term memory, prunes stale info, reinforces what works.

Human-in-the-loop ratings
Outcome-based reinforcement
Memory consolidation: turning experience into knowledge

03

Short-term memory — the context window.

Day 3's short-term-memory section uses a concrete metaphor: picture yourself at a busy intersection in London. The cars are documents. Should you pay attention to pedestrians, red buses, or taxis? Multiple databases are firing queries into the context window at once. Did the model focus on what its scope was? What if it missed the queen walking by?

Real-world impact of short-term memory choices

Decision	What it controls	Banking-stakes failure mode
Context window size	How much text the model can process at once. Newer models (Llama 4, GPT-5) support millions of tokens.	Larger windows can impact performance — model attention degrades. Stuffing more in isn't always better.
Token management	Which tokens to keep, which to summarize, which to evict from the active context.	Critical KYC document evicted to make room for chitchat → due-diligence error.
Landmark events	Tagged moments in the conversation the agent must remember regardless of token pressure.	Customer's stated risk tolerance gets buried in transcript noise → agent recommends an unsuitable product.
Attention mechanisms	How the model weights different parts of the context when generating output.	Recency bias overwhelms historical context → recent transactions dominate fraud assessment.

The framing the cohort takes home: guide short-term memory toward better outcomes. Highlight important info using repetition, clear statements, or explicit tags like <<IMPORTANT>>. In project management, key milestones, decisions, and challenges should be clearly noted without unnecessary detail. The agent reads what you tell it to read — engineer the prompt accordingly.

04

Long-term memory — and why banking needs it.

Day 3 makes the business case for long-term memory with a concrete banking failure pattern: customer uses airline Wi-Fi when logging into the banking portal. The portal flags as unusual login activity from unsecure Wi-Fi. The account is locked. The customer calls and walks through a long process to unlock.

With long-term memory? The agent remembers this customer travels for work, has been to airports 47 times this year, and uses unsecure Wi-Fi in 31% of sessions without incident. The flag never fires. The customer never calls.

Why LTM matters for banking processes

Customer Outcomes

20-30%

Higher customer satisfaction through personalized, natural interactions
Customer interactions build on past experiences
Advisors understand preferences and solve issues smoothly

Operational Quality

50%+

Reduction in error rates (per businesses adopting LTM-enabled AI)
Build data on processes
Longitudinal study of interactions, pain-points, friction

Where Current LLMs Fall Short

Today

Each session is amnesia by default
No native preference recall
Context window ≠ long-term memory

05

Designing long-term memory — five steps.

Day 3 walks the cohort through a five-step build process for LTM. By the end, every Pod has a vocabulary for talking about how their agent remembers things.

1

Select a framework

LangGraph for graph-based memory and orchestration. CrewAI for memory inside multi-agent crews. LangChain for episodic, semantic, and procedural memory modules. LlamaIndex for knowledge-base management.
2

Define memory requirements

What needs to persist? What can be reconstructed on demand? What's transient? Categorize as events, facts, or how-to memories.
3

Build retrieval mechanisms

Vector search (Pinecone) for semantic retrieval. Relational stores for structured data. Graphs (Neo4j) for relationship-heavy queries. Tag everything for explicit retrieval paths.
4

Implement memory consolidation

How does experience become knowledge? Summarization, landmark tagging, periodic distillation. Without consolidation, your LTM becomes a write-only log.
5

Integrate memory with agent reasoning

Memory only helps if the agent uses it. Wire the retrieval calls into the reasoning loop. Make the agent's memory visible in its scratchpad.

06

Feedback loops — and the SAFELOOP discipline.

Day 3 closes its memory section with feedback loops — the third memory layer, and the most operationally risky. LLMs can over-optimize specific metrics through feedback loops, missing the broader context. Without discipline, feedback loops cause behavior drift.

The Day 3 mnemonic — feedback with human oversight — spells out the discipline:

Letter	Practice	Why it matters
S — Supervision	Human oversight prevents unintended outcomes.	Without it, the agent optimizes for the wrong proxy.
A — Alignment	Loops should enhance capabilities while staying ethical.	Performance gains that violate policy are losses.
F — Foresight	Anticipate risks and design carefully.	Most feedback-loop failures are foreseeable.
E — Examination	Regular audits ensure accuracy and catch behavior drift.	Drift is gradual; audits are how you catch it.
L — Limits	Guard against over-optimization of narrow metrics.	Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.
O — Oversight	Vigilant monitoring is non-negotiable.	Production isn't lab. Real users break things lab tests miss.
O — Outcomes	Measure success against broad goals, not just metrics.	Customer satisfaction beats response-time-99th-percentile.
P — Practice	Real-world success requires disciplined implementation.	Discipline is the differentiator.

07

When to use agents — and when not to.

The afternoon shifts from memory to planning. Before the cohort designs an agent, they need to know whether the use case deserves one. The Day 3 framework: The Three Circles of Agentic Opportunity.

Circle 1

Effort — is it worth it?

Practical, straightforward process
Team is ready and willing to adapt
You can start small and scale up
Potential benefits justify investment
Implement without disrupting core operations

Circle 2

Feasibility — can it be done?

Tasks follow clear, consistent rules and repeatable steps
Data and processes are organized and accessible
AI can produce reliable, verifiable outcomes before human review

Circle 3

High Impact — will it matter?

Automating tasks boosts efficiency and frees up skilled workers
Prioritize repetitive, time-consuming tasks like data entry and reporting
Automation should align with strategic goals, not just convenience

The sweet spot is the intersection of all three circles — high-value, feasible, efficient to automate, and the kind of task teams frequently complain about. Day 3's "Agentic AI Prioritization Metric" is a 2x2 the cohort actually votes on:

Quadrant	What it is	What to do
High Impact, Low Complexity	Quick Wins.	Your ideal agentic opportunity. Build this first.
High Impact, High Complexity	Strategic Projects.	Future opportunities requiring careful planning.
Low Impact, Low Complexity	Low Priority.	Nice-to-have agents. Defer.
Low Impact, High Complexity	Avoid.	Not worth the effort.

08

A.G.E.N.T — the design framework.

The capstone of Day 3 is the A.G.E.N.T framework — the design checklist every Citizens Pod will run on every agent they build for the rest of the week (and beyond). Five components, five questions.

Component	Key Question	Key Elements	Actionable Steps
A — Agent Identity	Who is the agent?	Purpose, role, scope.	Craft a clear mission. Outline responsibilities and limits. Align design with goals.
G — Gear & Brain	What powers the agent?	AI model, tools, knowledge sources.	Select a model balancing performance + cost. Integrate the right tools/APIs. Build accurate knowledge sources.
E — Execution & Workflow	How does the agent work?	Input/output, workflow design, triggers, automation.	Define data formats. Map workflows. Set triggers to launch actions.
N — Navigation & Rules	How does the agent decide?	Processing rules, safety mechanisms, transparency.	Filtering and prioritization rules. Rate limits, circuit breakers, escalation paths. Decision logs.
T — Testing & Trust	How do we improve and scale?	Real-world testing, feedback, monitoring, scalability.	Run real-world scenarios. Collect feedback and track performance. Plan for growth.

09

What Day 3 sets up.

By the end of Day 3, the cohort has a memory architecture (3 layers + 5 LTM steps + SAFELOOP discipline), a prioritization model (3 circles + 2x2 quadrants), and a design framework (A.G.E.N.T) — everything they need to scope, design, and trust an agent end-to-end.

Day 3 closes the curriculum arc taught live to the September 2025 Citizens cohort. Days 4 and 5 of the Academy continued with Multi-Agent Orchestration, Scaling, Evaluation, Guardrails, and the Agentic Case Study — covered later in the bootcamp series as those modules are written up.

Ready to run Day 3 with your team?

The full deck — all 122 slides, including the 3-layer memory model, the 5-step LTM build, the SAFELOOP discipline, the Three Circles of Opportunity, the prioritization 2x2, and the complete A.G.E.N.T framework — is available for download.

Download the Day 3 deck ↓ Talk to Mo →

#04 · 4.a · Citizens Spotlight · Human-in-the-Lead Training · May 2025

Five days.
One Citizens cohort.
Humans in the lead.

Human-in-the-Lead Training — a live, multi-day agentic AI program delivered for Citizens, built on a simple premise: humans stay in command of the agents, not the other way around. Four modules: Day 0 — Intro to Agents (the May 2025 foundations preview), then the three live days of the September 2025 Citizens AI Academy · Track C — Banking Reinvention, Tool Use & Reasoning, Memory & Planning. Pick a day. Read what was actually taught.

4 modules · Day 0 + Days 1–3 4 modules · all written up 417 slides · across 4 modules 1 Citizens cohort · Track C

Curator Mo Nomeli · CAAI Global Lead AI Learning & Emerging Tech · Source: Human-in-the-Lead Training · Citizens · 5-day curriculum · May 2025

Pick a day

Foundations → Deep dives → Capstone

01

What "Day 0" actually means.

Most agentic AI training jumps straight to "build something." That's the wrong starting point. Day 0 is the day before the building starts — when the team agrees on what an agent is, what level of autonomy they're targeting, and what mental model they'll use for the next four days.

If Day 0 lands, every later day compounds on it. If Day 0 is skipped, every later day re-litigates the same vocabulary fights — and the curriculum slows to a crawl. Hence: Day 0 first. Always.

Days 1, 2, and 3 take the foundations and go deep. Day 1 is Intro to Agents + Reinventing Banking with Agents (with a live KYC multi-agent demo on AI Refinery). Day 2 is Tool Use + Reasoning (RAISE, "Mo Tools Mo Problems," LRMs and the power of pause). Day 3 is Memory + Planning + the A.G.E.N.T design framework. Same curriculum, taught live to the Citizens Track C cohort in September 2025.

#06 · AI Refinery 101 · By Accenture

Stop Googling.
Start shipping.

Every team building agents has the same problem: scattered docs, partner-by-partner learning curves, and a brand-new agent harness re-invented every quarter. AI Refinery™ by Accenture is the platform we built to make that problem go away — one place to develop and execute AI multi-agent solutions, with the agents, models, memory, governance, safety, and APIs already wired together. This is the 101.

12 utility agents 12 huddle partners 8 model types 10 API surfaces

Source AI Refinery 101 · Accenture AI Refinery SDK · sdk.airefinery.accenture.com · Live documentation

01

The engineer's problem.

If you've shipped an agent in the last twelve months, you know the drill. Pick a model. Wire a vector store. Bolt on a tool-calling layer. Wrap it in something that looks like memory. Add guardrails. Add evals. Add an orchestrator. Hope it doesn't break. Then watch the next team start over from scratch.

The market gives you ingredients. What you actually want is a kitchen.

That's what AI Refinery is. It's a platform — not a framework, not a wrapper, not a "starter kit" — for developing and executing AI multi-agent solutions. Three things it's designed to help you do, straight from the docs:

Adopt and customize large language models (LLMs) to meet specific business needs.
Integrate generative AI across various enterprise functions using a robust AI stack.
Foster continuous innovation with minimal human intervention.

Seamless integration. Ongoing advancements. The platform isn't trying to be every framework. It's trying to be the substrate that the rest of your agentic stack builds on. One reference. One environment. One toolkit your team actually uses.

02

The four pillars.

Everything in AI Refinery hangs off four load-bearing capabilities. Get these right and the rest follows.

Fig 1. The four pillars. Together they form the substrate every agentic application built on AI Refinery rides on top of.

Pillar 1

Flexible Agentic Teams

Enable agents to autonomously perform tasks
Make decisions and interact with other agents and systems
Composable teams — not isolated agents

Pillar 2

Comprehensive Model Catalog

LLMs, VLLMs, rerankers, and more
Choose models to power your agents
Available through agentic workflow or direct API calls

Pillar 3

Scalable Distiller Framework

Designed to streamline complex workflows
Orchestrates various agents handling different tasks
The connective tissue between everything else

Pillar 4

Agent Memory

Retain context across interactions
Personalize interactions per user
Provide coherent responses over time

03

Twelve utility agents. Ready to deploy.

Built-in utility agents are the workhorses — engineered to streamline tasks like Retrieval-Augmented Generation (RAG), data analytics, and image generation. Ready-to-deploy. Configure with YAML. Deploy with minimal Python. Use one or chain them inside an orchestrator to build a multi-agent solution.

Agent	What it does
A2A Agent	Supports the integration of agents that are exposed over the Agent2Agent (A2A) protocol — for seamless communication and collaboration.
Analytics Agent	Streamlines data analysis tasks for insightful decision-making.
Author Agent	Enhances writing processes with AI-driven content creation.
Critical Thinker Agent	Analyzes conversations to identify issues and provide insights.
Deep Research Agent	Handles complex user queries through multi-step, structured research to produce comprehensive, citation-supported reports.
Image Generation Agent	Creates high-quality images (both text-to-image and image-to-image).
Image Understanding Agent	Analyzes and interprets visual data for deeper insights.
MCP Agent	Integrates Model Context Protocol (MCP) support for dynamic tool discovery and invocation via MCP servers.
Planning Agent	Designs realistic plans by analyzing user interactions and goals.
Research Agent	Handles complex queries using RAG via web search and vector search methods.
Search Agent	Answers queries by searching the internet, specifically using Google.
Tool Use Agent	Interacts with external tools to perform tasks and deliver results.

Configuration is intentionally minimal. Below is the actual sample from the docs — a project that wires up the SearchAgent to perform web searches and respond to user queries.

YAML · project config# configure your utility agents in this list utility_agents: - agent_class: SearchAgent # The class of the agent agent_name: "Search Agent" # A name that you choose orchestrator: agent_list: # list the configured agents here - agent_name: "Search Agent"

Python · deploy & queryimport asyncio import os from air import DistillerClient from dotenv import load_dotenv load_dotenv() # loads API_KEY from .env api_key = str(os.getenv("API_KEY")) async def search_demo(): distiller_client = DistillerClient(api_key=api_key) distiller_client.create_project( config_path="example.yaml", project="example" ) async with distiller_client( project="example", uuid="test_user" ) as dc: responses = await dc.query( query="Who won the FIFA world cup 2022?" ) async for response in responses: print(response['content']) if __name__ == "__main__": asyncio.run(search_demo())

The example demonstrates a single agent. Configure additional agents under utility_agents and include them in orchestrator.agent_list to develop a multi-agent solution.

04

Three super agents. For when one agent isn't enough.

Super Agents are engineered to handle complex tasks by orchestrating multiple agents — creating dynamic and powerful collaborations. Three of them ship with the SDK.

Super Agent · 1

Base Super Agent

Decomposes a complex task into several subtasks, assigning each to the appropriate agents.

Dynamic decomposition — the agent decides who does what
Best for open-ended, exploratory workflows

Super Agent · 2

Flow Super Agent

Executes a deterministic workflow configured by the user among agents.

You define the steps · the platform runs them
Best when the path is known and reliability matters more than flexibility

Super Agent · 3

Evaluation Super Agent

Systematically assesses the performance of utility agents based on predefined metrics and sample queries — a structured approach to improving agent performance.

Treats agent quality as something measurable
Generates the feedback loop for continuous improvement

05

The Trusted Agent Huddle.

Twelve utility agents and three super agents would already be a strong roster. But the platform doesn't ask you to choose between AI Refinery and the rest of your stack. The Trusted Agent Huddle brings third-party agents into the same orchestration fabric — a roster of 12 partners whose agents you can call alongside the built-ins.

Partner agent	Where it runs
Amazon Bedrock Agent	Hosted on AWS — uses the reasoning of foundation models, APIs, and data to break down user requests, gather information, and complete tasks.
Azure AI Agent	Cloud-hosted on Microsoft Azure — interprets queries, invokes tools, executes tasks, and returns results.
CB Insights Agent	Hosted on the CB Insights market intelligence platform — verified market intelligence, company profiles, deal information, business analytics.
Databricks Agent	Hosted on Databricks — uses Databricks Genie so business teams interact with their data in natural language.
Google Vertex Agent	Hosted on Google Cloud Platform — leverages Google's foundation models, search, and conversational AI to automate tasks and personalize interactions.
Pega Agent	Hosted on Pega Platform — analyzes business workflows in real time, generates context-aware answers using enterprise knowledge to streamline issue resolution.
SAP Agent	Hosted on SAP — automates workflows, analyzes real-time business data, assists in financial operations, delivers contextual responses.
Salesforce Agent	Hosted on Salesforce — routes cases, provides order details, extends databases, responds to queries.
ServiceNow Agent	Hosted on ServiceNow — workflow automation, intelligent support, decision-making enhancement, user experience improvement.
Snowflake Agent	Hosted on Snowflake — business teams interact with their data through natural language and analyze data intuitively.
Wolfram Agent	Hosted on Wolfram Alpha — advanced computations, visualizations, scientific and mathematical queries, knowledge-based data retrieval.
Writer AI Agent	From Writer.com — generates, refines, and structures content using integrated tools and customizable guidelines.

06

The model catalog. Eight types. One choice point.

The model catalog offers a wide range of AI solutions for text and image processing — accessible through the agentic workflow or directly via API calls. Eight model types currently shipped, each with named providers and specific models from the catalog.

Type 1

LLMs & VLMs

For text and image input processing
mistralai · Mistral-7B-Instruct-v0.3 · Mistral-Small-3.1-24B-Instruct-2503
openai · gpt-oss-20b · gpt-oss-120b
Qwen · Qwen3-32B · Qwen3-VL-32B-Instruct
deepseek-ai · Deepseek-r1-distill-qwen-32b

Type 2

Embedding Models

For embedding textual data
intfloat · e5-mistral-7b-instruct
intfloat · multilingual-e5-large
Qwen · Qwen3-Embedding-0.6B

Type 3

Compressors

For prompt compression
microsoft · llmlingua-2-bert-base-multilingual-cased-meetingbank

Type 4

Rerankers

For optimizing search result rankings
Reorders retrieved documents by query relevance

Type 5

Diffusers

For image generation tasks
black-forest-labs · FLUX.1-schnell

Type 6

Segmentation Models

For high-quality image segmentation

Type 7

Text-to-Speech (TTS)

For converting text to speech
Azure · AI-Speech

Type 8

Automatic Speech Recognition (ASR)

For converting speech to text
Azure · AI-Transcription

07

Safety, by default.

AI Refinery prioritizes safety — offering key features to ensure ethical and secure interactions. Two safety features ship today, each crucial for maintaining privacy and promoting responsible AI usage across applications.

Safety · 1

PII Masking

Safeguards personally identifiable information by masking sensitive data — like emails and phone numbers — before they reach backend systems or AI agents.

Configurable — define what counts as PII for your context
Reversible — original values are recoverable when authorized
Toggleable — turn it on or off per workflow
Aligns with global data protection standards

Safety · 2

Responsible AI (RAI)

Applies safety and policy rules to user queries handled by Large Language Models. Ships with default rules. Welcomes custom ones.

Default rules filter illegal, harmful, and discriminatory content
Allows users to create custom rules for specific needs
Ensures ethical AI operations

08

Four advanced features that pay for themselves.

These are the capabilities that move you past prototype-grade. Shared memory. Prompt compression. Reranking. Self-reflection. Each one solves a problem you'd otherwise solve manually — over and over.

Feature · 1

Agents' Shared Memory

Lets multiple AI agents access and utilize common memory resources — enhancing collaboration for more coherent and contextually aware responses.

Chat History Module: stores and retrieves chat conversations efficiently — agents maintain context across interactions
Relevant Chat History Module: fetches and summarizes the most pertinent past conversations, focusing on key insights and themes
Variable Memory Module: manages key-value pairs for storing and updating user-specific data — for personalization and continuity

Feature · 2

Prompt Compression

Reduces the size of input prompts while retaining essential information — enabling faster, more cost-effective processing.

Streamlines content from top-ranked documents
Enhances efficiency in generating comprehensive responses
Translation: smaller bills, same answer quality.

Feature · 3

Reranking

Improves the relevance of retrieved documents by reordering them based on their pertinence to the query.

Prioritizes the most relevant information first
Ensures the agent provides precise, meaningful responses
The difference between "found it" and "found something close"

Feature · 4

Self-Reflection

Enables Utility Agents to iteratively refine responses by evaluating and regenerating them until they meet quality standards.

Ensures responses are correct and relevant
Strategies include selecting the best attempt or aggregating information for the final output
Quality as a process, not a wish

09

Ten APIs. One platform.

The AI Refinery platform offers a comprehensive suite of APIs to enhance AI application development — from generating text responses to utilizing machine learning models. Each API focuses on a specific area to meet diverse project needs.

Fig 2. The 10 API areas. Distiller (highlighted) is the orchestration entry point — every other API is a primitive your agents can call directly. Realtime Distiller and Physical AI are the streaming and embodied-AI extensions.

API	What it gives you
Audio	Tools for audio processing and analysis, including speech recognition.
Chat Completion	Generates responses using LLMs supported by AI Refinery.
Distiller	Enables agentic project creation and access to other AI Refinery features.
Realtime Distiller	Streaming variant of Distiller for realtime agent workflows.
Embeddings	Creates the embedding of textual data using embedding models supported by AI Refinery.
Images	Provides image generation and segmentation capabilities.
Knowledge	Offers knowledge extraction and knowledge graph functionalities.
Models	Access the list of models currently supported by AI Refinery.
Moderations	Evaluates whether the input contains any potentially harmful content.
Physical AI (preview)	Provides advanced tools for video-based understanding, simulation, and synthesis of the physical world.
Training	Enables customization of AI models with personal data through training capabilities.
Observability	Enables querying logs, metrics, and traces for monitoring and debugging AIRefinery applications.

10

The bottom line.

Stop Googling. Start shipping. AI Refinery™ by Accenture isn't asking you to learn a new partner — it's asking you to stop relearning the same patterns every quarter. 12 utility agents ready to deploy. 3 super agents for orchestration. 12 trusted partner integrations via the Trusted Agent Huddle. 8 model types in the catalog. 10 API surfaces. 2 safety features — PII masking and Responsible AI. 4 advanced features — shared memory, prompt compression, reranking, self-reflection. All wired together.

The platform's three design intents from the docs: adopt and customize LLMs to meet specific business needs, integrate generative AI across enterprise functions using a robust AI stack, and foster continuous innovation with minimal human intervention. Each one is a problem most teams solve in private. AI Refinery solves them once, in shared infrastructure, so your team can focus on what's actually different about your use case.

The harness is built. Bring your agents.

Get started.

The full SDK documentation is live — including quickstarts, project guidelines, tutorials for every utility agent, multi-agent workflow patterns, the agent library, the model catalog, and the complete API reference. Generate API keys, install the SDK, and ship your first project today.

Open AI Refinery 101 docs ↗ Generate API keys ↗ Open the SDK on GitHub ↗

#08 · AI Everywhere

Where the practice
puts AI to work.
Seven fronts.

Accenture's Reinvention Services brings the full breadth of the firm to bear on every client problem — organized into seven Reinvention Partner areas that map to how clients actually think about their business. Pick a front. Each one is its own playbook for embedding data and AI at scale, and each one is being assembled now. Cybersecurity. Digital Core. Finance. Industry & Enterprise. Song. Supply Chain & Engineering. Talent.

7 partner areas 1 reinvention thesis Coming Soon

Pick your front

Reinvention Partners · seven areas of the practice

#08 · 8.a · Cybersecurity

Cyber-resilience.
Value through
trust.

The Cybersecurity Reinvention Partner reinvents how enterprises defend, protect, and grow value through trust — building defenses, protecting enterprises, managing risk, and enabling emerging technologies. This chapter is in build. The full playbook will cover the AI & data layer of cyber: agentic SOCs, identity for non-human actors, model-and-data security patterns, and the partner stack underneath.

Coming Soon Reinvention Partner · 8.a

#08 · 8.b · Digital Core

The digital
foundations,
reinvented.

The Digital Core Reinvention Partner reinvents the foundations every enterprise runs on — technology strategy and architecture, data and AI, modernizing and managing applications, infrastructure, data, and cloud. This chapter is in build. The full playbook will cover the architecture patterns, the modernization plays, and the AI-native operating model that ties them together.

Coming Soon Reinvention Partner · 8.b 1 sub-chapter live

Inside Digital Core

Sub-chapters of the Digital Core playbook

#08 · 8.b.i · Digital Core · Enterprise Architecture

The architecture
beneath the
architecture.

Enterprise Architecture is the connective tissue of Digital Core — the patterns, principles, and decisions that determine whether AI lands as a product, a platform, or a pile of pilots. This sub-chapter is in build. It will cover the EA reference patterns we use, the decision frameworks behind them, and the partner ecosystem that supports each layer.

Coming Soon Sub-chapter · 8.b.i

#08 · 8.c · Finance

Financial
performance,
reinvented.

The Finance Reinvention Partner reinvents financial performance by supporting the CFO agenda — driving best-in-class performance and delivering insights and benchmarking across the enterprise. This chapter is in build. The full playbook will cover AI in close-and-consolidate, predictive forecasting, working-capital optimization, and the data foundations underneath.

Coming Soon Reinvention Partner · 8.c

#08 · 8.d · Industry & Enterprise

Core value chains.
End-to-end.

The Industry & Enterprise Reinvention Partner reinvents core industry value chains and drives end-to-end, cross-functional reinvention to deliver growth and long-term value. This chapter is in build. The full playbook will cover the industry-specific AI patterns we deploy, the cross-functional decision frameworks, and where the highest-value reinventions are landing today.

Coming Soon Reinvention Partner · 8.d

#08 · 8.e · Song

How clients
grow.

Song reinvents how clients grow — bringing together customer growth strategy, marketing, sales, service, commerce, design, digital products, data, and AI to create customer-led growth. This chapter is in build. The full playbook will cover agentic CX, generative creative, conversational commerce, and the data foundations that make personalization at scale possible.

Coming Soon Reinvention Partner · 8.e

#08 · 8.f · Supply Chain & Engineering

Across the
product and asset
lifecycle.

The Supply Chain & Engineering Reinvention Partner helps clients leverage AI and digital technologies across product and asset lifecycles to build competitive advantage. This chapter is in build. The full playbook will cover digital twin patterns, agentic supply planning, generative engineering, and the partner stack across PLM, MES, and ERP.

Coming Soon Reinvention Partner · 8.f

#08 · 8.g · Talent

How people
and organizations
work.

The Talent Reinvention Partner reinvents how people and organizations work — delivering leadership, talent, operating models, and change to accelerate the workforce agenda. This chapter is in build. The full playbook will cover human-AI collaboration patterns, agent-as-coworker operating models, the change-management frameworks underneath, and the skills architecture we deploy.

Coming Soon Reinvention Partner · 8.g

The 8 in the Repo

The quick takeaways from this issue

AI benchmark results keep improving — but do they translate to enterprise value?

AI companies are hungry for more training data. Defunct startups are in their sights.

We need to talk to clients about: who actually owns ROI?

What's new, what's not, and why it matters

AI is acing its benchmark exams. Does that translate to business value?

Slack chats, Jira tickets and email archives are commanding attention at startup fire sales

The Prompt

In case you missed it

Adobe levels up its AI efforts in Creative Cloud

Check out our PoVs on MCP and Google's TurboQuant

Five platforms. One workload. The real spend.

Which model for which dollar?

Same model. Five front doors.

The story begins with a misconception.

First, a fair fight.

Two ways to buy the same outcome.

Cloud Native Services

Independent Data Platforms

Now the receipts.

Cloud Native Data Platform Services

Independent Data Platform — deployed on AWS

The plot twist.

So how do you actually decide?

Strategic Positioning

Platform Archetype

Validate Commercials

The three archetypes, at a glance

The bottom line.

Ready to map this against your estate?

The leaderboard tells you only half the story.

First, a fair fight — at scale.

The Frontier Curve.

Today's leaderboard, by capability.

A crowded summit.

A truly jagged frontier.

Now connect the score to the receipt.

When the cheapest option is also a serious option

Gemini 3 Flash

Kimi K2.5

Managed API or self-host? The math has an answer.

Case A — Phi-4 (small model, by Microsoft)

Case B — DeepSeek v3.2 (large model)

From dashboard to deployed agent.

Connect the agent

Brief it like an analyst

Get a defensible cost model

What makes this work

The bottom line.

Ready to map the frontier against your workload?

The story begins with a misconception.

First, name the doors.

Two ways to buy the same model.

Hyperscaler-Operated

Anthropic-Operated

Now the receipts.

Archetype A · Hyperscaler-Operated

Archetype B · Anthropic-Operated

The plot twist.

The compliance king with a search problem.

Built for builders, missing the long run.

The newest partnership — and the boundary problem.

Where the native feature set actually lives.

So how do you actually decide?

Lead with governance posture.

Then layer in feature ambition.

Build hybrid by design — not by accident.

The three patterns, at a glance

A footnote on M365 — because someone on the call will ask.

The bottom line.

Ready to map this against your estate?

Pick your door

Why "logical architecture" is too vague for AI.

The blueprint, applied at Fortune-15 scale.

Same blueprint, six platforms — same nine gaps.

Security isn't a layer. It's a zone.

Frameworks are easy to write. Hard to ship.

The story begins with a misnomer.

The nine viewpoints, named.