
When AI Needs Course Correction: Recognising Flailing vs. Working

14 min read · Lewis Rogal

📚 Part of a Learning Journey

This post is part of the Working Effectively With AI pathway.


I searched my name on Google.

My site came up. First result. That felt good.

Then I saw the icon. A generic grey globe. That felt less good.

My wife - who serves as aesthetic director for this project - took one look and said it could be better. She's working on proper branded assets now. For now, I've got temporary "LR" initials in the site colours. Functional, not final.

But the globe icon was just the visible reminder of something I'd completely missed: production isn't build-once-and-forget.

The Dependabot Wake-Up Call

Dependabot was already on my list. We use it at work - automated dependency updates, security alerts, pull requests for version bumps. I knew I wanted it for the blog.

I set it up. Weekly checks. Maximum five open PRs at once. Group related packages together.
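That setup translates into a small config file in the repo. This is a sketch of what mine roughly looks like - the ecosystem, directory, and group patterns here are illustrative assumptions, not the actual file:

```yaml
# .github/dependabot.yml - weekly checks, capped open PRs, grouped packages
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 5
    groups:
      react-stack:
        patterns:
          - "react*"
          - "next*"
```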

Then Dependabot got to work.

Five pull requests appeared. All major version upgrades. All with breaking changes.

The site was less than five days old.

I wasn't expecting this at all. The blog was built with what Claude Code understood as the "latest" tech available when I started. But AI training data lags reality. Claude built on the latest versions it knew about - React 18, Next.js 14, Vitest 1. Meanwhile, React 19, Next.js 16, and Vitest 4 had already shipped.

This has nothing to do with the pace of JavaScript ecosystem changes; it's about what the models are trained on. I'm working on an experimental R&D project; I asked Claude to build on the latest tech, and it built on the latest it understood.

Five major upgrades:

  • React 18 → 19
  • Next.js 14 → 16
  • ESLint 8 → 9
  • Vitest 1 → 4
  • Tailwind 3 → 4

I used Claude Code - not my normal partner for exploring - to investigate the PRs, reviewing the actual code changes. Then I switched to Claude Chat to work out a plan for handling them: a dedicated branch per upgrade, clear separation, working through them systematically.
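The branch-per-upgrade plan is simple enough to sketch. This runs in a throwaway repo; the branch names are illustrative, not the ones I actually used:

```shell
# Sketch of the branch-per-upgrade workflow in a disposable repo
cd "$(mktemp -d)"
git init -q upgrade-demo && cd upgrade-demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "baseline"

# One branch per major upgrade keeps each set of breaking changes isolated
for pkg in react-19 nextjs-16 eslint-9 vitest-4 tailwind-4; do
  git branch "upgrade/$pkg"
done

git branch --list 'upgrade/*'
```

Each branch gets merged back only once its upgrade builds cleanly and the tests pass, so a stuck upgrade (like Vitest, below) never blocks the others.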

Back to Claude Code for execution.

Making Space to Learn

My team's been experimenting with Dependabot for our work projects. It's noisy. Pull requests pile up. The signal-to-noise ratio feels off when you're managing 25 integrations across an ERP implementation. Dependabot at that scale isn't just five PRs - it's potentially dozens across the portfolio, and the poor signal-to-noise ratio becomes a real organisational problem, not just an annoyance. When you're leading a team through a major systems transformation, you need practices that scale, not just good intentions.

Before asking the team to figure out how to handle this, I needed to understand it myself.

The blog became a learning ground. No business delivery pressure. No stakeholders waiting. No project timelines. Just me, five dependency upgrades, and the space to learn what "noisy" actually means in practice.

I was improving my skills in an environment where I could afford to make mistakes. I hoped this would help the team eventually, but I didn't know it would. I just knew that if I was going to help build practices around this, I had to understand the problem firsthand.

The Smooth Ones

React 19, Next.js 16, ESLint 9, Tailwind 4 - these went smoothly. Claude Code had the knowledge, the documentation existed, the patterns were clear.

This is what AI working looks like: clear problems with clear fixes, and incremental progress through related changes. TypeScript caught breaking changes immediately. Automated migration tools handled most updates. A couple of iterations to get things right. Steady forward movement.

Then I hit Vitest.

When AI Starts Guessing

Vitest 4 came out only 10 weeks earlier. Major version bump. 98 test failures.

Claude Code said: "There are 98 tests to update. This will be tedious. Let me try a Python script."

The script failed.

"Ah, let me try something in PowerShell."

That failed too.

"Let me try another route."

It was clearly struggling and guessing what to do next.

This wasn't normal iteration. This was flailing.

I'd seen this pattern before - jumping between completely different approaches, Python scripts, PowerShell commands, different frameworks, different languages. When AI starts pivoting radically between approaches, something's wrong with the context.

Something felt off, so I opened a web browser - old school, no AI - and searched for Vitest 4's release date. It was 10 weeks old. That was the clue: too recent for the model's training data, which was probably missing the migration guides.

This wasn't AI solving the problem; this was me stepping outside AI's limited context to identify what was actually wrong.

I found the Vitest repository on GitHub and gave Claude Code the direct link.

Code processed the repo. Then it flew through the upgrade.

What changed? I can't say exactly, but I suspect it found the migration guides. Real examples of breaking changes and how to handle them. Concrete patterns instead of speculation.

The lesson: when AI starts jumping between radically different approaches, it needs better context. Give it authoritative sources rather than letting it guess.

From Personal Pattern to Team Practice

The Vitest experience stuck with me. That pattern - AI jumping between radically different approaches - wasn't just useful for my blog upgrades. It was something my team needed to recognise.

Our VP of Digital Technology introduced Claude Code to the organisation. I'd been building on that foundation with the European team - two developers, an analyst, and me. All learning together. The analyst and I aren't developers; we're semi-technical people trying to get production-quality output from AI tools.

After the Vitest upgrade, I started noticing the "flailing vs working" pattern in our work. I pair-worked with the analyst who'd written one of our existing apps. We were bringing it up to standards - his app, his domain knowledge, but my experience with prompting and testing. We worked in tandem. He'd test manually, make tweaks, run the new tests I'd added. I'd watch for the flailing pattern and intervene when Claude started guessing.

It became clear we needed something systematic.

I used Claude Code to analyse our recent repositories and draft standards documentation. Just had it look at what we'd already built, and extract the patterns. Then our senior dev and I reviewed and refined it heavily. It's my idea, but I'm not the technical authority here. The senior dev's expertise shapes what "good" actually means.

Then we put it in a repo. Now every team member using Claude just clones it locally and references it. All new projects use these standards. We've gone back and reworked old projects to meet them. The measurable outcome: pull requests from our experienced developers now need minor tweaks instead of major rework.

I've even recommended it to some of our American teammates. They're trying it out.

The standards repo solves the problem I had with Vitest, at scale: instead of AI guessing what "good" looks like for our projects, it has concrete examples. Instead of me teaching the pattern to each person individually, the system teaches it.

The practices are starting to take hold beyond just following the standards. Last week one of our developers noticed a gap in the repo and suggested improvements without being prompted or asked. That's the signal that this is becoming team capability, not just an initiative.

One dependency upgrade taught me a pattern. Now my team is working on a system that's evolving through shared ownership.

Trust But Verify

There was another moment where things looked fine but weren't.

After getting all the upgrades working, Claude Code said everything was ready to push to the repo. Tests passing, builds successful, all good.

I stopped to spin up a local instance anyway. Trust but verify.

The local version was super slow to load. A message in the bottom left of the browser said "compiling."

This was interesting. Claude Code had told me everything was fine. The tests had passed. But something was clearly wrong.

I gave this information to Code, and it identified the problem immediately: parent directory config chaos.

I had config files in both C:\Projects and C:\Projects\lewis-rogal-site. Tailwind was trying to resolve in the parent directory, where no node_modules existed. Hundreds of resolution errors were flooding the terminal; the dev server was technically working, but compilation was painfully slow.

Claude Code moved the duplicate config files to .bak. Problem solved immediately.
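The fix itself is a one-liner pattern: rename the stray configs so the bundler stops resolving them. This sketch uses illustrative POSIX paths rather than my actual C:\Projects layout:

```shell
# Sketch of the fix: stray config files in the parent directory get
# renamed to .bak so only the project's own configs are resolved.
cd "$(mktemp -d)"
mkdir -p projects/site
touch projects/tailwind.config.js projects/postcss.config.js  # strays in the parent
touch projects/site/tailwind.config.js                        # the real one

for f in projects/*.config.js; do
  mv "$f" "$f.bak"
done

ls projects
```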

But here's the thing: I only caught this because I tested manually. Claude Code had been confident everything was fine. The automated tests saw nothing wrong. The builds passed.

It was only by actually using the site locally that I discovered the performance issue.

This reinforced something I'd learned from earlier mistakes with cardboard muffins and reward hacking: AI optimises for looking complete. Tests passing and builds succeeding look complete, but that doesn't always mean it works the way you need it to.

I was in a hurry at this point. What I thought would be a quick one-hour job had hit 90 minutes, and I needed to wrap things up. But I'm glad I took the time to verify rather than just trusting Code's assessment.

Building Verification Into Team Culture

The config chaos I caught by manually testing reinforced something I'd been thinking about for the team.

"Tests pass" isn't enough - especially when you have semi-technical team members, people who can prompt effectively and understand logic but aren't career developers. The analyst on my team builds impressive prototypes using Claude Code - high-end user interfaces that get genuine reactions when demoed - but our developers catch things in PR reviews. The gap between "impressive demo" and "production-ready code" isn't always obvious to AI, or to the person prompting it.

I'm not the one catching these issues; our devs are. But I'm building the practices that help us all catch what matters.

We've got an informal agreement now: if you're touching code, address the related Dependabot PRs. It's not formalised. It might have come from a team discussion, might have been my idea - honestly, I'm not sure. But it exists, and it's working.

We've talked about filtering Dependabot PRs to prioritise vulnerabilities since that's our primary concern. Haven't implemented anything yet. But the conversation's happening. The team's thinking systematically about how to handle the noise.

The verification habit - spin up a local instance, test manually, don't just trust the green checkmarks - that's becoming team culture. Not because I mandate it, but because we're all learning when AI optimises for looking complete versus actually working.

The Pattern: Flailing vs. Working

After the Vitest experience and the config chaos, a pattern emerged that's become core to how we teach AI usage on the team:

AI working normally looks like:

  • Clear problems with clear fixes
  • Incremental progress through related changes
  • A couple of iterations to get it right
  • Steady forward movement

AI flailing looks like:

  • Jumping between completely different approaches (Python → PowerShell → other language)
  • Multiple iterations getting stuck on the same problem
  • Solutions getting overly complex compared to the plan you worked through in exploratory mode
  • A sense that it's guessing at the answer rather than executing on known patterns

When you see radical pivots between approaches, that's the signal. It's not iterating towards a solution. It's searching for one. Pause, give it better context, point it at authoritative sources.

What This Means for Teams

Five dependency upgrades on a personal blog taught me more about leading AI adoption than any amount of theoretical planning would have.

The learning happened because I had space to experiment without business pressure. No delivery deadlines, and no stakeholders asking for status updates. Just the freedom to watch AI flail with Vitest and figure out why. To catch config chaos by manually testing when I was already running late. To recognise patterns without having to justify the time spent.

That personal learning became systematic practice. The standards repo now gives our entire team - developers and semi-technical people alike - concrete examples of what "good" looks like. The analyst who builds impressive prototypes has clearer guidance on production standards. The developers spend less time on major PR rework and more time on meaningful review.

Some of this material helps with coding. Some helps with how I prompt Claude for drafting posts or communications. Anyone on the team who's interested can learn from it.

The "flailing vs working" pattern isn't just about dependency upgrades. It shows up when we're building integrations for the ERP implementation. When someone's been working with Claude for 30 minutes and making no progress, I can ask: "Is it iterating normally or jumping between radically different approaches?" Usually they recognise it immediately. Then we pause, find better context, and point the AI at authoritative sources.

This is what building team capability looks like. Not mandating practices from the top down, and not being the technical authority who reviews every line of code. But creating systems that help people recognise patterns, making it easier to get quality output, while establishing habits that catch what matters.

When leading AI adoption, understanding the problem firsthand matters. Not to be the hero who figured it out, but to understand the problem your team will face. The Dependabot noise, the training data lag, the gap between "tests pass" and "actually works." The moment when AI stops iterating and starts guessing.

You can't build practices to mitigate problems you don't understand firsthand.

The blog gave me that understanding. The team gets the practices. The work gets better. Next step: formalising the Dependabot filtering we've been discussing. The conversation's happening, and the practices are emerging. That's how you build capability - not through mandates, but through shared learning.

What I've Learned

Automation surfaces work you'd forget. Dependabot brought five major upgrades into view rather than letting them accumulate silently. I wasn't expecting any of this - but it needed doing.

Some upgrades are smooth, some require intervention. React and Next.js went cleanly. Vitest needed the repository link. Tailwind mostly automated but needed manual fixes. You can't predict which will be which until you try.

AI training data lags reality. "Build on latest tech" gets you the latest the model knows about. For cutting-edge projects, expect to upgrade immediately after building. This isn't a bug; it's just how training data works.

Trust but verify always applies. Tests passing doesn't mean it works the way you need. Builds succeeding doesn't mean performance is good. Spin up a local instance. Click through manually. Use the thing you built.

Authoritative sources matter more than ever. When AI starts flailing, don't let it keep guessing. Find the official repository. Find the migration guides. Find the concrete examples. Point the model at something credible rather than letting it speculate based on incomplete training data.

Personal learning can become team capability. The blog upgrades taught me patterns. The standards repo teaches the team. Making space to learn without business pressure gives you what you need to build practices. Then you build systems that help everyone recognise it.


Technical Details:

  • Upgrades: React 18→19, Next.js 14→16, ESLint 8→9, Vitest 1→4, Tailwind 3→4
  • Total Time: ~2 hours, including a mid-session dash to the shower as I was running late!
  • Tests: 586 → 615 (all passing)
  • Vulnerabilities: 6 moderate → 0
  • Pattern Recognition: Radical approach pivots = needs better context
  • Team Impact: Standards repo in production, PRs need minor tweaks vs major rework

What's Next: The blog now has RSS feeds, social sharing optimisation, and a completely modern dependency stack. My wife's working on proper branded assets to replace the temporary "LR" initials. The site's functional. It'll iterate towards aesthetic.

That's the pattern: ship functional, iterate aesthetic. It's the same principle that got the site deployed in 16 hours. The same principle keeping it running now. It's the principle building team capability systematically.


This post is part of a series on building with AI:

📚 This post is part of a learning pathway

Working Effectively With AI

Learning where AI works, where it doesn't, and how to build production-ready tools at team scale

View the full pathway →
