The Cloud Plumbing, Security, and Business Model Stories Behind the Agent Stack
Anthropic shipped a bigger brain. AWS shipped identity resilience and runtime controls. Microsoft shipped AI-native security. And the business model debate got loud at exactly the wrong time.
Last week I broke down the agent platform layer — OpenAI Frontier and Atlas, the Codex App Server protocol, Prism, Claude’s propose-verify-approve pattern, ServiceNow distribution, and Chrome going agentic with Gemini 3. (Read that post here if you missed it.)
The throughline: we’re moving from models to agents to agent platforms, and the moat is context, runtime, and distribution.
This week I want to go one layer down.
Because platforms don’t float. They sit on top of models, cloud infrastructure, security tooling, and business models — and all four of those shifted between February 2–11, 2026, in ways that matter for anyone building or buying agents in production.
Here’s what landed, why it matters, and what I’d actually do about it.
Claude Opus 4.6: what 1M tokens actually changes (and what it doesn’t)
Anthropic shipped Claude Opus 4.6, its strongest “agentic” model yet, with a 1M token context window now available in beta. The headline number gets attention, but the real story is what this does — and doesn’t — change for how you design retrieval and reasoning in agent workloads.
Where long context is a genuine unlock
If your workload is “one large, bounded corpus,” you can now try loading the whole thing into context and reasoning over it directly. No chunking. No retrieval pipeline. No re-ranking. Just the model and the material.
This matters for specific use cases:
Codebase reasoning. An entire repo in context means the model can trace dependencies, understand architectural decisions, and generate changes that are consistent with patterns it can actually see — not patterns it inferred from retrieval snippets.
Incident investigation. Feeding a complete timeline — logs, alerts, runbook excerpts, Slack threads, post-mortems — into one context window lets the model correlate signals that would be fragmented across RAG chunks.
Contract and regulatory analysis. Cross-referencing terms, definitions, obligations, and exceptions across a full document set without worrying about whether your retrieval pipeline surfaced the right clause.
Where long context doesn’t replace RAG
If your workload is “infinite enterprise sprawl” — thousands of documents across dozens of systems with different owners, permissions, freshness requirements, and classification levels — a 1M token window doesn’t solve your problem. You still need retrieval. You still need permissions. You still need a semantic layer that knows what’s current and what’s stale.
Context windows don’t solve governance. They don’t solve freshness. They don’t solve multi-tenancy. Don’t let the headline number distract from the architectural work that still needs to happen.
The cross-cloud angle matters more than people think
Opus 4.6 isn’t locked to one vendor. It’s available in Amazon Bedrock and Microsoft Foundry (Azure). If you’re evaluating agent-capable models and you need cross-cloud availability — because your infrastructure spans AWS and Azure, or because procurement won’t sign off on a single-vendor dependency — this simplifies the conversation. You can run the same model wherever your workloads already live.
The practical experiment
Take a real workload — a repo your team works in, a set of docs your team references daily — and run it through Opus 4.6 in “long-context-first” mode. Then compare the quality, latency, and cost against your existing RAG pipeline for the same queries. Let the data tell you where the tradeoff lands for your specific use case, instead of guessing.
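If you want a starting point, here’s a minimal sketch of that experiment using the Anthropic Python SDK. The model ID is a placeholder, and the 1M window is in beta, so check the current docs for the exact identifier and any opt-in flag before running it.

```python
# A minimal "long-context-first" experiment: load a bounded corpus into a
# single prompt and ask the same questions you'd normally route through RAG.
# Requires `pip install anthropic`; the model ID is a placeholder, and the
# 1M-token window is a beta feature that may need an opt-in flag.
import pathlib
import time

import anthropic

MODEL_ID = "claude-opus-4-6"  # placeholder; confirm the exact ID for your account


def load_corpus(root: str, suffixes=(".py", ".md")) -> str:
    """Concatenate a bounded document set (e.g. a repo) into one string."""
    parts = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"\n===== {path} =====\n{path.read_text(errors='ignore')}")
    return "".join(parts)


def ask_long_context(corpus: str, question: str) -> tuple[str, float]:
    client = anthropic.Anthropic()
    start = time.time()
    response = client.messages.create(
        model=MODEL_ID,
        max_tokens=2048,
        system="Answer strictly from the provided corpus. Cite file paths.",
        messages=[{"role": "user", "content": f"{corpus}\n\nQuestion: {question}"}],
    )
    return response.content[0].text, time.time() - start


corpus = load_corpus("path/to/repo")
answer, latency = ask_long_context(corpus, "Where is retry logic implemented, and is it consistent?")
print(f"{latency:.1f}s\n{answer}")
# Record answer quality, latency, and token cost, then run the same question
# through your existing RAG pipeline and compare.
```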
AWS: the runtime and identity layers that make agents survivable
While the platform-layer headlines went to Frontier and Atlas, AWS quietly shipped the kind of infrastructure changes that determine whether agents actually work in production. Two runtime updates and two foundational infrastructure releases.
Bedrock server-side tool use
Bedrock added server-side tool use and extended prompt caching in the Responses API. This is the “make it controllable” layer.
Previously, most agent tool-use implementations were client-side orchestration — your code called the model, parsed the tool request, executed it, and sent the result back. That works in a demo. It breaks in production when you need consistent security boundaries, audit trails, and cost controls.
Server-side tool use means the execution happens inside AWS’s security perimeter — IAM policies, VPC boundaries, CloudTrail logging — with the guardrails you’d expect. Extended prompt caching means repeated context (system prompts, shared documents, conversation history) doesn’t get re-processed on every call, which directly impacts cost and latency for multi-turn agent workflows.
If you’re building agents on Bedrock, this is the shift from “model capability” to “operational capability.” It’s what makes tool use shippable.
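For contrast, here’s roughly what the client-side loop looks like today using the Bedrock Converse API via boto3. The model ID and the lookup_order tool are illustrative placeholders, not part of the new release.

```python
# The client-side orchestration loop that server-side tool use replaces:
# your code parses the tool request, executes it, and feeds the result back.
# Sketch only; model ID and the lookup_order tool are placeholders.
import boto3

client = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-opus-4-6"  # placeholder

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "lookup_order",
            "description": "Fetch an order record by ID.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            }},
        }
    }]
}


def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}  # stand-in for a real backend call


messages = [{"role": "user", "content": [{"text": "What's the status of order 4521?"}]}]

while True:
    response = client.converse(modelId=MODEL_ID, messages=messages, toolConfig=tool_config)
    output = response["output"]["message"]
    messages.append(output)
    if response["stopReason"] != "tool_use":
        break
    # Execute every tool request ourselves and return the results to the model.
    results = []
    for block in output["content"]:
        if "toolUse" in block:
            tool = block["toolUse"]
            results.append({"toolResult": {
                "toolUseId": tool["toolUseId"],
                "content": [{"json": lookup_order(**tool["input"])}],
            }})
    messages.append({"role": "user", "content": results})

print(output["content"][0]["text"])
```

Server-side tool use moves the body of that loop — and its IAM, VPC, and CloudTrail story — out of your application code and into Bedrock.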
IAM Identity Center multi-Region replication
This one doesn’t sound exciting, but it’s quietly one of the most important AWS releases this quarter.
Identity is a Tier-0 dependency. Everything downstream — console access, CLI sessions, service roles, federated access, SSO — depends on IAM Identity Center being available. Until now, it was single-Region. If that Region had an issue, your identity plane was impaired.
AWS now lets you replicate IAM Identity Center from a primary Region to additional Regions, including identities, permission sets, and account assignments.
Why this matters beyond availability: data residency. Some regulatory frameworks require that identity and access data reside in specific geographies. Multi-Region replication gives you the ability to place identity data where your compliance requirements demand — without building a parallel identity system.
If you’ve ever sat in a BCDR review where someone said “nothing works if identity is down,” this is that conversation getting resolved.
What to do now: Start planning your replication topology. Identify your KMS key strategy for the secondary Regions. Define failover access patterns. Test before you need it. This isn’t a “set and forget” feature — it requires deliberate design around Region selection, replication lag tolerance, and operational runbooks.
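As a concrete first step, you can inventory what a replication plan has to cover using the existing sso-admin APIs in boto3. This sketch only counts permission sets and account assignments; the new replication configuration itself isn’t shown here.

```python
# Inventory what a replication plan has to cover: permission sets and account
# assignments in your IAM Identity Center instance. Uses existing boto3
# sso-admin and organizations APIs; replication config is not shown.
import boto3

sso = boto3.client("sso-admin")
org = boto3.client("organizations")

instance_arn = sso.list_instances()["Instances"][0]["InstanceArn"]

permission_sets = []
for page in sso.get_paginator("list_permission_sets").paginate(InstanceArn=instance_arn):
    permission_sets.extend(page["PermissionSets"])

accounts = []
for page in org.get_paginator("list_accounts").paginate():
    accounts.extend(a["Id"] for a in page["Accounts"])

total_assignments = 0
for ps_arn in permission_sets:
    for account_id in accounts:
        # First page only; paginate here too for large estates.
        page = sso.list_account_assignments(
            InstanceArn=instance_arn, AccountId=account_id, PermissionSetArn=ps_arn
        )
        total_assignments += len(page["AccountAssignments"])

print(f"{len(permission_sets)} permission sets, {total_assignments} account "
      f"assignments across {len(accounts)} accounts to replicate.")
```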
Security group “Related resources” tab
Small console enhancement, outsized impact for anyone managing a large AWS estate. The EC2/VPC console now shows which resources — ENIs, instances, load balancers, Lambda functions — are associated with a given security group.
Before this, deleting or modifying a security group was a “hope nothing breaks” exercise unless you had custom tooling to map dependencies. Now you can see the blast radius before you make the change.
Integrate this into your change management workflow. Especially before deletions — require a check of the related resources tab as part of your change request documentation. It’s a small step that prevents expensive mistakes.
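If you also want that check in code, for pipelines or change-request tooling, filtering network interfaces by group ID gets you most of the way. Here’s a boto3 sketch; note it won’t catch references from other security groups’ rules, so treat it as a complement to the console view, not a replacement.

```python
# Approximate the "Related resources" view in code: list every network
# interface attached to a security group before you touch it. ENIs cover
# instances, load balancers, Lambda-in-VPC, and most other attachments.
import boto3


def blast_radius(security_group_id: str) -> list[dict]:
    ec2 = boto3.client("ec2")
    findings = []
    for page in ec2.get_paginator("describe_network_interfaces").paginate(
        Filters=[{"Name": "group-id", "Values": [security_group_id]}]
    ):
        for eni in page["NetworkInterfaces"]:
            findings.append({
                "eni": eni["NetworkInterfaceId"],
                "type": eni.get("InterfaceType", "interface"),
                "description": eni.get("Description", ""),
                "attached_to": eni.get("Attachment", {}).get("InstanceId", "n/a"),
            })
    return findings


for item in blast_radius("sg-0123456789abcdef0"):
    print(item)
# An empty list is a good sign; a long one means the change request needs review.
```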
Microsoft: AI-native security is becoming a first-class agent concern
Microsoft shipped two security-related updates this cycle that are worth reading together.
AI-powered incident prioritization is now in public preview in Defender. It uses machine learning to help SOC analysts cut through alert noise and focus on the incidents most likely to be real and impactful. If your SOC is drowning in false positives — and statistically, it probably is — this is worth evaluating against your current triage metrics: mean time to acknowledge, false positive rate, analyst fatigue.
Expanded Defender coverage for Foundry-hosted agents means Microsoft is extending its security tooling to cover agent workloads specifically. This is Microsoft positioning security as a first-class concern for agent deployments, not something you bolt on after the fact.
The timing is deliberate. As agent platforms ship (Frontier, Foundry, Bedrock), the attack surface expands. Agents that can execute code, query databases, and take actions on behalf of users are a fundamentally different security problem than a chatbot answering questions. Microsoft is building the security layer to match.
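If you do run the Defender preview side by side with your current queue, agree on the baseline numbers first. Here’s a minimal sketch of the two metrics mentioned above, assuming a hypothetical export of incident records; adapt the field names to whatever your SIEM actually produces.

```python
# Baseline metrics for a side-by-side triage comparison. Assumes a hypothetical
# export of incident records with created/acknowledged timestamps and a final
# analyst verdict; run it once per queue and compare.
from datetime import datetime
from statistics import median

incidents = [
    {"created": "2026-02-03T08:00:00", "acknowledged": "2026-02-03T08:42:00", "verdict": "false_positive"},
    {"created": "2026-02-03T09:10:00", "acknowledged": "2026-02-03T09:18:00", "verdict": "true_positive"},
    # ... one record per incident, for both the current queue and the AI-prioritized queue
]

FMT = "%Y-%m-%dT%H:%M:%S"


def minutes_between(start: str, end: str) -> float:
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds() / 60


mtta = median(minutes_between(i["created"], i["acknowledged"]) for i in incidents)
fp_rate = sum(i["verdict"] == "false_positive" for i in incidents) / len(incidents)

print(f"Median time to acknowledge: {mtta:.0f} min, false positive rate: {fp_rate:.0%}")
```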
Anthropic’s 0-day research: a signal worth taking seriously
Separately from the Opus 4.6 release, Anthropic is explicitly researching the risk of LLM-discovered 0-days — previously unknown vulnerabilities found by advanced models — and publishing its findings.
This is a model builder acknowledging that “agentic capability” and “security capability” are two sides of the same coin.
Here’s my honest take: I’m not sure most organizations are ready for the speed at which capable agents can become unintentional security researchers. An agent tasked with “find a way to make this API call work” could, in theory, discover and exploit a vulnerability in the process. The security posture around agent workloads needs to assume mistakes will happen and build containment accordingly — not just for malicious actors, but for well-intentioned automation that wanders into dangerous territory.
This is why the governance layer I talked about last week matters so much. Agent identity, scoped permissions, audit trails, and the ability to revoke access without breaking other workflows — these aren’t nice-to-haves. They’re the difference between an agent that helps and an agent that becomes a liability.
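To make that concrete, here’s a generic sketch of the containment pattern: a per-agent tool allowlist, an audit record for every attempted call, and a revocation switch that doesn’t touch other agents. It’s illustrative only, not any vendor’s product; in production you’d back it with your IAM and logging stack.

```python
# A generic containment pattern for agent tool calls: per-agent allowlists,
# an audit record for every attempt, and per-agent revocation. Illustrative
# only; real deployments would enforce this with IAM and centralized logging.
import json
import time


class AgentGuard:
    def __init__(self, agent_id: str, allowed_tools: set[str], tools: dict):
        self.agent_id = agent_id
        self.allowed_tools = allowed_tools
        self.tools = tools  # name -> callable
        self.revoked = False

    def call(self, tool_name: str, **kwargs):
        allowed = not self.revoked and tool_name in self.allowed_tools
        # Audit every attempt, allowed or not, before anything executes.
        print(json.dumps({
            "ts": time.time(), "agent": self.agent_id,
            "tool": tool_name, "args": kwargs, "allowed": allowed,
        }))
        if not allowed:
            raise PermissionError(f"{self.agent_id} may not call {tool_name}")
        return self.tools[tool_name](**kwargs)

    def revoke(self):
        """Cut off this agent without touching any other workflow."""
        self.revoked = True


guard = AgentGuard(
    agent_id="incident-summarizer",
    allowed_tools={"read_logs"},
    tools={"read_logs": lambda service: f"last 100 lines for {service}",
           "restart_service": lambda service: f"restarted {service}"},
)
print(guard.call("read_logs", service="checkout"))
try:
    guard.call("restart_service", service="checkout")
except PermissionError as err:
    print(f"blocked: {err}")  # the attempt is already in the audit trail
```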
Google GEAR: investing in the developer enablement layer
Google Cloud launched GEAR — a structured skills path for building and deploying agents using Google’s Agent Development Kit (ADK). It includes labs, credits, and a progression path, housed inside the Google Developer Program.
While OpenAI and Anthropic are leading on the platform and model layers, Google is investing in developer enablement — making it easier to get started building agents on its stack. Different strategy, complementary signal. The market is moving fast enough that developer adoption velocity matters as much as raw platform capability.
If you have teams evaluating Google’s agent tooling, GEAR is worth pointing them toward as a structured on-ramp.
Business model whiplash: ads vs. subscriptions (and why enterprise should pay attention)
Two announcements landed almost back-to-back, and the contrast is stark.
OpenAI started testing ads in ChatGPT for logged-in adult users on the Free and Go tiers in the US. The Plus, Pro, Business, Enterprise, and Education tiers are not affected. The ads are described as “relevant ads within conversations” — the implementation details and data-handling specifics are still emerging.
Anthropic used Super Bowl visibility to position Claude as explicitly ad-free — framing the absence of advertising as a trust and alignment feature, not just a business model choice.
For enterprise buyers, this isn’t about moral judgment. It’s about trust boundaries and the questions your security, compliance, and procurement teams are going to ask:
Data handling. What user data flows into the ad-targeting pipeline? Even if your org is on an enterprise tier, does the existence of an ad-supported tier change how the underlying model is trained or tuned?
Response integrity. How do you prove that responses in the paid tiers are completely uninfluenced by commercial relationships in the ad-supported tiers?
Vendor risk narrative. When your CISO asks “is our AI vendor also an advertising company?” — what’s your answer, and does it change your risk posture?
My read: this is going to matter most in regulated industries — healthcare, financial services, government — where the perception of data mixing or commercial influence can be as damaging as the reality. Expect this to become a procurement checklist item within the next two quarters.
Databricks: follow the money (agents will eat your data platform bill)
Databricks raising ~$5B at a ~$134B valuation, with AI products crossing ~$1.4B in annualized revenue, is a signal worth reading carefully.
The thesis is straightforward: agents don’t just “think.” They query, join, filter, write, summarize, re-query, materialize, and do it again — often in loops. Every agentic workflow that touches structured data is a workload on your data platform. Every multi-step reasoning chain that needs fresh data is a set of warehouse queries. Every agent that “monitors” something is a recurring compute job.
Databricks is calling itself an “AI beneficiary” because it’s sitting on the metered layer where agent work becomes billable compute. That’s not speculation — it’s already showing up in their revenue numbers.
The FinOps implication is real. If your organization is deploying agents that interact with data platforms — Databricks, Snowflake, BigQuery, Redshift — you need usage budgets and alerts in place before the agents are in production. “Helpful automation” has a way of becoming a surprise bill when nobody set a ceiling on how many queries an agent could run per hour.
Set the budgets. Set the alerts. Have the conversation with your data platform team about what “agent-driven usage” looks like in their billing model. Do it this week, not after the first invoice lands.
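If you need a starting point for that ceiling, here’s a platform-agnostic sketch: hourly query and spend caps per agent, with an alert before the hard stop. The limits and the alert hook are placeholders; wire them into whatever your FinOps tooling already uses.

```python
# A platform-agnostic ceiling for agent-driven data platform usage: cap
# queries and estimated spend per agent per hour, alert at 80%, stop at 100%.
import time
from collections import defaultdict

HOUR = 3600
LIMITS = {"queries_per_hour": 200, "dollars_per_hour": 25.0}  # illustrative caps


class AgentBudget:
    def __init__(self):
        self.window_start = defaultdict(lambda: time.time())
        self.queries = defaultdict(int)
        self.spend = defaultdict(float)

    def charge(self, agent_id: str, estimated_cost: float):
        now = time.time()
        if now - self.window_start[agent_id] > HOUR:  # roll the hourly window
            self.window_start[agent_id] = now
            self.queries[agent_id] = 0
            self.spend[agent_id] = 0.0
        self.queries[agent_id] += 1
        self.spend[agent_id] += estimated_cost
        if self.spend[agent_id] > 0.8 * LIMITS["dollars_per_hour"]:
            print(f"ALERT: {agent_id} at 80% of hourly spend budget")  # page someone here
        if (self.queries[agent_id] > LIMITS["queries_per_hour"]
                or self.spend[agent_id] > LIMITS["dollars_per_hour"]):
            raise RuntimeError(f"{agent_id} exceeded its hourly budget; pausing the workflow")


budget = AgentBudget()
budget.charge("pipeline-monitor", estimated_cost=0.4)  # call before each warehouse query
```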
What I’d do this week — a practical 10-step plan
Pick the items that match where you are:
1. Run a “long-context first” experiment with Opus 4.6. Take a real repo or document set your team actually uses. Load it into a 1M token context. Run the same queries you’d run against your RAG pipeline. Compare quality, latency, and cost. Let the data decide.
2. Evaluate Bedrock server-side tool use for your agent workloads. If you’re building agents on AWS, test server-side tool execution against your current client-side orchestration. Measure the difference in security posture, auditability, and operational complexity.
3. Plan IAM Identity Center multi-Region replication. Identify target Regions. Plan your KMS key strategy. Define failover access patterns. Test before you need it.
4. Use the security group dependency view in your change workflow. Make “check related resources” a standard step before security group modifications or deletions. Small habit, big risk reduction.
5. SOC teams: evaluate Defender’s AI prioritization preview. Run it in parallel with your current triage process. Measure against MTTA, false positive rate, and analyst workload. See if it actually reduces noise or just moves it around.
6. Review your agent security posture against the 0-day research signal. Ask: if an agent tasked with a legitimate workflow accidentally discovered a vulnerability, would your containment model catch it? If the answer is “I don’t know,” that’s the priority.
7. Set agent-aware FinOps budgets. Assume agent-driven warehouse and API usage is coming. Set budgets, alerts, and per-agent usage caps before the workloads are live.
8. Point your teams at Google GEAR if they’re evaluating Google’s agent stack. Structured learning paths beat ad-hoc exploration for teams that need to ramp quickly.
9. Start your “Trust FAQ” document. Ads, data handling, model choices, logging, retention, response integrity. Get ahead of the questions your security and compliance teams are going to ask — because they are going to ask.
10. Re-read last week’s post and map your agent boundary diagram. Data sources → tools → actions → approval points → rollback mechanisms. If you can’t draw it on a whiteboard, you’re not ready to ship it. (Here’s the post.)
Last week, the agent stack became a product. This week, the foundation underneath it got stronger — better models, more resilient identity, controlled runtimes, AI-native security, and a business model debate that enterprise can’t afford to ignore. The organizations that move deliberately on both layers — platform and foundation — are the ones that will actually ship agents that last.
See you next week.
— Darin

