Fixing Claude with Claude: Anthropic reports on AI site reliability engineering

QCon London A member of Anthropic's AI reliability engineering team spoke at QCon London on why Claude excels at finding issues but still makes a poor substitute for a site reliability engineer (SRE): it constantly mistakes correlation for causation.

Alex Palcuie was formerly an SRE for Google Cloud Platform. "My job is keeping Claude up," Palcuie said, adding: "I've been using LLMs for actual incident response." Since January, he's been reaching for Claude before looking at other monitoring tools.

Alex Palcuie speaks at QCon London 2026

His team is busy. "Claude goes down more often than any of us would like. Earlier today, I was involved in an incident, even if I'm at a conference."

Is Palcuie automating himself out of a job? No, he said. "It would be hypocritical to say that Claude fixes everything. My team exists, we're hiring for many positions, this should show you that no, it doesn't work."

However, he said "many of us would not be surprised" if it did work in future, and his talk demonstrated that AI is already helpful.

Reflecting on his career in incident response, Palcuie said that having engineers on call is "a tax on humans because our systems are not good enough to look after themselves." He spoke of the stress of being on call: "Your phone buzzes, there's half a second where you go from asleep, to incident commander mode... then at 9:00 am you show up at work and have to look professional and presentable."

Incident response, he said, can loosely be broken down into a loop of four phases, known as the OODA loop: observe, orient, decide, act.

AI, he said, is fantastic for the observation part. "It reads the logs at the speed of I/O, it doesn't get bored, this at scale is something no human can match."

He recounted a real incident when, on New Year's Eve, Claude Opus 4.5 was returning HTTP 500 errors. "I open Claude Code and ask it to have a look." The AI wrote a SQL query and "within seconds it has the answer, an unhandled exception in the image processing class." It posted the Python stack trace, but "it doesn't stop there." Claude identified the failing requests, checked the accounts that sent them, and found 200 accounts "all sending 22 images at the same time." That looked suspicious. Claude dug further and found 4,000 accounts, all created at the same time and most sitting dormant. The AI said: "Stop looking at the 500s, this is fraud."

Without AI, "I would have marked this as a bug, I would not have paged account abuse," Palcuie said.

His next anecdote was less positive. LLM inference relies on a key-value (KV) cache for performance: the attention keys and values already computed for a sequence are stored so they do not have to be recomputed for every new token. "This KV cache can be gigabytes in size, it's really easy to break it, it's finicky, it's fragile." When it breaks, it forces a lot of extra compute, and monitoring shows many more requests.
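To see why a broken cache shows up in dashboards as extra load, consider a toy model; this is an assumption for illustration, not Anthropic's serving stack.

```python
# Toy model, not Anthropic's stack: each sequence caches the keys/values
# it has already computed, so a new token normally costs one unit of work.
cache: dict[str, list[str]] = {}  # sequence id -> cached K/V per token

def decode_step(seq_id: str, tokens: list[str]) -> int:
    """Return how many tokens' worth of K/V had to be (re)computed."""
    cached = cache.get(seq_id, [])
    missing = tokens[len(cached):]   # only the uncached suffix needs work
    cache[seq_id] = cached + missing
    return len(missing)

print(decode_step("req-1", ["a"]))            # 1: warm path
print(decode_step("req-1", ["a", "b"]))       # 1: only the new token
cache.clear()                                  # the "finicky, fragile" failure
print(decode_step("req-1", ["a", "b", "c"]))  # 3: full recomputation
```

Compute per request jumps while genuine user traffic stays flat, which is exactly the signature Palcuie described next.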

"Every single time, I would ask Claude, what happened here? Claude would say, request volume increase, this is a capacity problem, you need to add more servers."

The problem, he said, is that Claude "will get wrong correlation versus causation." It's like a new joiner on the team: they will think "oh, it's a capacity problem, when actually you lost your cache."

"This is why we can't trust LLMs for incident response," said Palcuie. The problem is its inability to "step back and start discerning between causation and correlation... For us humans, it is hard as well."

When Claude is asked to produce a postmortem report, it delivers "an 80 percent story that's pretty, it's readable and convincing," said Palcuie, but "it's really bad at root causes." Claude says "this was the thing, and we all know it is not one thing. It's not one root cause... It was never the rollout. It was never the code change. It was all the processes in the company that allowed the incident. And Claude doesn't know the history of your system, especially if your system has been there for ten years."

It is important, said Palcuie, to have SREs who "have been burnt before... they have the scar tissue." He worried about what happens if AI is used more: "will we have our skills atrophy?" – a concern that parallels what software developers often express about having AI write most of the code.

The Jevons Paradox, said Palcuie, is "the favorite paradox in the AI industry. It's when technological improvements increase the efficiency of resource use, but the resulting lower cost causes consumption to rise rather than fall."

In the case of software, "it's easier to write software, so we write much more of it, so the complexity goes up and not down, which means things break in more interesting ways, which means more incidents, more on call... all the improvements in the tooling will be cancelled by this ever-growing complexity."
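Toy numbers make the effect concrete; the figures below are purely illustrative assumptions, not data from the talk.

```python
# Illustrative Jevons arithmetic: a 2x efficiency gain, but demand for
# software triples because it is cheaper, so total effort still rises.
effort_per_feature = 1.0
features = 100

effort_after = effort_per_feature / 2   # tooling gets twice as efficient
features_after = features * 3           # cheaper software -> much more of it

print(effort_per_feature * features)    # 100.0 units of effort before
print(effort_after * features_after)    # 150.0 units after, despite the gain
```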

Maybe, said Palcuie, AI agents can simplify and manage the complexity, maybe "do what we've collectively learned in our industry, but that's a big if."

He ended on a positive note, saying: "The models are the worst today that they'll ever be."

The overall message, though, was not to leave SRE to AI, but to keep training reliability engineers, because they will be needed in future.