This Week in AI + Cloud: Your Experience Is the Advantage, Not the Liability
AI + Cloud — Week of February 28, 2026
The Bottom Line (No Jargon Edition)
If you only read one section, read this. Here’s what happened this week in plain English:
Your experience is your superpower, not your weakness. AI tools can write code and generate content fast, but they don’t know what “good” looks like in your field. If you’ve been doing your job for 10, 20, or 30 years, you have exactly the judgment AI lacks. Don’t be afraid of it — learn to use it. You’ll be more valuable, not less.
Three Chinese companies got caught copying one of the biggest AI models. They ran millions of fake conversations with Anthropic’s Claude to steal its intelligence. Nobody broke in — they just used the product at massive scale through fake accounts. It’s like photocopying an entire library one page at a time. The takeaway: protecting AI isn’t just about locks and firewalls anymore.
A popular AI scorecard turned out to be broken. The test that companies used to prove their AI could write code? Over half the test cases were flawed, and the AI models had basically memorized the answers. So those impressive scores you keep seeing? Take them with a grain of salt. Ask: was this a private test, or could the AI have studied the answer key?
The big three clouds (AWS, Azure, Google) all made it easier to build with AI this week. AWS lets AI agents take real actions now (not just chat). Azure added Anthropic’s best model to its data platform. Google kept simplifying its tools so you spend less time configuring and more time building.
OpenAI is no longer exclusive to Microsoft. Their agent-building platform is coming to Amazon’s cloud too. That means you’ll have more choices about where to run AI tools — and that’s good for everyone.
The thread connecting all of it: In a world flooded with AI-generated everything, the real value is in knowing what’s actually good, what’s actually true, and what actually works. That’s human judgment. And it’s not going anywhere.
The Take That Started the Week
This week I wrote something that hit a nerve: the comfort zone has a cost. You just don’t see the invoice until it’s too late.
I talk to engineers every week who are genuinely afraid AI is going to replace them. And I get it — the headlines are engineered to scare you. But after watching 30 years of tech shifts play out, I can tell you: fear is the wrong operating system for what’s actually happening.
Virtualization was supposed to replace sysadmins. Cloud was going to eliminate infrastructure teams. DevOps would make ops engineers obsolete. I was there for all three. None of it played out the way the fearmongers predicted. What actually happened: the roles changed, the people who adapted early thrived, and the ones who froze got left behind. AI is the same pattern. Faster timeline, same playbook.
Here’s the part I really want to land with this audience:
If you have 10, 20, or 30 years in your field, you don’t have a disadvantage. You have a massive advantage that most people haven’t recognized yet. AI generates, but it doesn’t judge. It produces output, but it doesn’t know what good looks like in your domain. You do. That judgment — knowing which output is right, which approach fits, which edge cases will bite you in production — that’s the part AI can’t replicate. And it’s the part that only comes from years of doing the work.
A junior engineer with AI is fast but unfiltered. A senior engineer with AI is a force multiplier. Your experience isn’t the thing being replaced — it’s the thing that makes AI actually useful. You just have to harness it.
I laid out three paths I’m watching emerge: the Orchestrator (manage agents, define outcomes), the Systems Builder (build the infrastructure agents run on), and the Domain Translator (combine deep industry expertise with AI tools to build things nobody else can). None of them require you to be an AI researcher. All of them require you to start.
Also This Week: Two Stories That Should Change How You Evaluate AI
Anthropic caught three Chinese AI labs distilling Claude at industrial scale.
DeepSeek ran 150,000+ exchanges targeting reasoning. Moonshot AI ran 3.4 million+ targeting tool use, coding, and computer vision. MiniMax, the largest operation, ran 13 million+ exchanges focused on agentic coding. Total: 16 million+ exchanges across 24,000 fraudulent accounts operating through commercial proxy services.
Nobody hacked anything. They used the API exactly as designed, at massive scale, through fake identities. The attack surface was the product itself.
Anthropic’s response included behavioral fingerprinting classifiers, strengthened verification, and countermeasures at the product, API, and model levels. But the bigger takeaway isn’t about Anthropic’s defenses. It’s that the AI moat isn’t the model — it’s the control system around it. Export controls on chips don’t work when knowledge flows out through the API. This pattern will play out across every major lab.
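To make “photocopying the library one page at a time” concrete, here’s a minimal sketch of what API-based distillation looks like. Everything in it is hypothetical: the client object and its complete() method stand in for any commercial LLM API, and this is not Anthropic’s interface or any lab’s actual pipeline.

```python
import json

def harvest_pairs(client, prompts, out_path="distill_pairs.jsonl"):
    # Hypothetical `client` with a `complete()` method stands in for
    # any commercial LLM API; this is not a real provider interface.
    with open(out_path, "a") as f:
        for prompt in prompts:
            answer = client.complete(prompt)  # an ordinary API call, used as designed
            # Each (prompt, response) pair becomes one supervised
            # fine-tuning example for a copycat "student" model.
            f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```

Run that loop 16 million times through 24,000 accounts and you have a fine-tuning dataset that transfers a large slice of the teacher’s behavior. That’s why the defense has to live in the control layer: verification, rate limits, and classifiers that spot harvesting patterns.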
OpenAI published why they stopped using SWE-bench Verified.
They audited 27.6% of the dataset. Of those, 59.4% had flawed test cases that rejected correct code. 35.5% enforced implementation details never mentioned in the task. Worse: every frontier model they tested could reproduce the original human-written solutions verbatim. The models had memorized the answers. Scores climbed from 74.9% to 80.9% in six months. The capability didn’t improve — the benchmark got gamed.
Classic Goodhart’s Law applied to AI: when a measure becomes a target, it ceases to be a good measure.
OpenAI now recommends SWE-bench Pro and built their own private benchmark called GDPVal. The shift to private evaluation is the real signal. If someone shows you a benchmark score from a public dataset, the first question should be: is the test private? If not, you may be measuring memorization, not capability.
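If you want a first-pass sanity check of your own, one crude heuristic is to compare a model’s output against the canonical human solution and flag near-verbatim matches. To be clear, this is a minimal sketch of the idea, not OpenAI’s audit methodology:

```python
import difflib

def memorization_score(generated: str, canonical: str) -> float:
    """Token-level similarity between a model's patch and the original
    human-written fix. Ratios near 1.0 suggest the model reproduced a
    memorized solution rather than solving the task."""
    matcher = difflib.SequenceMatcher(None, generated.split(), canonical.split())
    return matcher.ratio()

# Flag anything suspiciously close to verbatim, e.g.:
# if memorization_score(model_patch, repo_patch) > 0.95: exclude the task
```

It won’t catch paraphrased memorization, but it’s enough to tell you whether a public score deserves your trust.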
Cloud Roundup: Late February 2026
AWS had a quieter week by recent standards, but one update matters.
Amazon Bedrock now supports server-side tool execution via AgentCore: secure actions like web search and database updates run inside the Bedrock environment rather than in code you host. If you’re building AI agents on AWS, this is the piece that lets agents actually do things without you managing the tool execution infrastructure yourself (the sketch below shows what that saves you). Also: the EKS Node Monitoring Agent went open source (community contributions welcome), and Deadline Cloud added task chunking for better rendering throughput.
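Here’s a generic sketch of the client-side tool loop most agent code runs today. The client.chat call and the reply shape are hypothetical, not the actual Bedrock or AgentCore API:

```python
def agent_loop(client, messages, tools):
    # Client-side tool execution: your code owns this loop, so you also
    # own the auth, sandboxing, retries, and audit logging around every
    # tool call. (`client.chat` and the reply shape are hypothetical.)
    while True:
        reply = client.chat(messages=messages, tools=tools)
        if not reply.tool_calls:
            return reply.text                       # model answered directly
        for call in reply.tool_calls:
            result = tools[call.name](**call.args)  # you execute the tool
            messages.append(
                {"role": "tool", "name": call.name, "content": result}
            )
```

Server-side execution moves that inner loop into the provider’s environment: you send the request, the platform runs the tools under its own controls, and you get the final answer back.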
Azure landed a notable model addition.
Claude Opus 4.6 is now on Azure Databricks (as of Feb 26), and Serverless Workspaces for Databricks hit GA. The Databricks play is interesting: Azure is positioning itself as the neutral platform where you can access any model through the analytics stack, not just through the core AI services.
On the operational side, WAF Default Ruleset 2.2 is now the standard for Application Gateway, so update your configs. Also flagged: the DHE cipher suite retirement hits Azure Front Door and CDN on April 1. Start planning now if you’re affected.
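If you’re not sure whether any of your endpoints still accept DHE, a quick handshake probe settles it. A minimal sketch using Python’s standard library; the hostname is a placeholder, and since DHE suites only exist up through TLS 1.2, the probe caps the protocol version there:

```python
import socket
import ssl

def negotiates_dhe(host: str, port: int = 443):
    # Attempt a handshake offering only DHE cipher suites. Returns the
    # negotiated cipher tuple if the server accepts DHE, or None if it
    # refuses the handshake.
    ctx = ssl.create_default_context()
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2  # DHE lives in TLS <= 1.2
    ctx.set_ciphers("DHE")                        # offer nothing but DHE
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.cipher()  # e.g. ('DHE-RSA-AES256-GCM-SHA384', 'TLSv1.2', 256)
    except (ssl.SSLError, OSError):
        return None

# print(negotiates_dhe("www.example.com"))  # placeholder hostname
```

Keep in mind the probe tells you what a server accepts today; the real migration risk is legacy clients that can’t negotiate anything except DHE.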
GCP focused on operational improvements.
AlloyDB now integrates with Database Center for prioritized health monitoring, with one-click navigation to recommended fixes. Composer deployments now generate Airflow v3-compatible DAGs, which gives the Airflow v2 end-of-life migration a clear path (minimal example below). API Hub added a specification-boost feature, now in preview, that automatically improves documentation quality.
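On the Composer point: the practical move is to write DAGs in the TaskFlow style, which is the form that ports cleanly to Airflow 3. A minimal illustrative example, not Composer-specific:

```python
import pendulum
from airflow.decorators import dag, task

@dag(
    schedule="@daily",  # Airflow 3 uses `schedule=`; the legacy `schedule_interval=` is gone
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    catchup=False,
)
def nightly_load():
    @task
    def extract() -> dict:
        return {"rows": 42}

    @task
    def load(payload: dict) -> None:
        print(f"loaded {payload['rows']} rows")

    load(extract())

nightly_load()
```

If your DAGs already look like this, the v2-to-v3 migration is mostly a version bump; if they lean on legacy operators and removed context variables, now is the time to start.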
Google’s pattern continues: reduce friction, improve defaults, make the platform disappear so teams focus on building.
AI Model Roundup: Late February 2026
OpenAI made a strategic move that’s bigger than any model release: Frontier is coming to AWS.
OpenAI’s no-code agent platform — build, deploy, and manage AI agents — will be hosted on Amazon’s infrastructure alongside Azure. This is the first real crack in the Microsoft-OpenAI exclusivity narrative. Microsoft still retains exclusive IP rights, but the compute layer is diversifying. For practitioners, this means your cloud choice may stop being an AI provider choice. That’s a good thing.
Also: $285B valuation after a $1B Thrive Capital investment, and multi-year alliances with BCG, McKinsey, Accenture, and Capgemini for enterprise adoption. OpenAI is building the consulting channel. The enterprise sales motion is accelerating.
Anthropic released RSP 3.0 on February 24 — updated safety protocols addressing misalignment risks. Government deployments were confirmed in classified environments, with restrictions on firms linked to foreign adversaries. And of course, the distillation attacks disclosure dominated the conversation (covered above).
The pattern I’m seeing from Anthropic this month: security and trust as competitive differentiators. While other labs race on capabilities, Anthropic is racing on the control layer. That’s consistent with their positioning from day one — and the distillation disclosure is evidence that the threats they’ve been planning for are now real.
Google AI shipped Gemini 3.1 Flash image generation with real-time web knowledge and consistent character appearance. Android task automation expanded to multi-step actions through Uber, Lyft, and DoorDash. Lyria 3 now generates 30-second music tracks from text prompts.
The consumer play is aggressive. Google is embedding Gemini into every surface — phone, browser, workspace, photos. The practitioner signal: if you’re building on Google’s stack, the AI primitives are showing up everywhere. The question isn’t “should we use AI” — it’s “which layer do we integrate with.”
The Pattern I’m Watching
Three themes collided this week, and they’re all connected.
Theme 1: The benchmark trust crisis. SWE-bench Verified just became the poster child for Goodhart’s Law in AI. Public benchmarks are getting gamed. Labs are shifting to private evaluation. The implication: we’re entering a period where you can’t compare AI tools by published scores alone. Hands-on evaluation is the only evaluation that counts.
Theme 2: The model security arms race. Anthropic’s distillation disclosure proves that AI capabilities are now a target — not just the infrastructure that runs them. The moat isn’t the model. It’s the detection, verification, and control systems around it. Every lab will face this. The ones who invested in security early will have the advantage.
Theme 3: Experience as competitive advantage. In a world where AI handles the generation and junior-level execution, the premium shifts to judgment. Knowing what good looks like. Knowing which edge cases matter. Knowing when the AI’s output looks right but isn’t. That’s 10, 20, 30 years of pattern recognition — and it’s exactly what AI can’t replicate.
These three themes are the same theme. In a world flooded with generated output — code, benchmarks, model capabilities, content — the value moves to judgment, verification, and trust. The people and organizations that can separate signal from noise will win.
That’s been true for every tech cycle I’ve watched. It’s just never been this obvious.
What’s your take — are you seeing the same pattern in your world?
Hit reply and tell me. I read every response.
— Darin
Weekly AI and cloud breakdowns from someone who’s been in the game since the early days of the internet. No ads. No filler. Just the signal.

