I've been building and deploying AI systems for over 12 years. So when a new wave of AI-powered developer tools arrived, I wasn't excited — I was sceptical. I'd seen too many "game-changing" tools that added friction rather than removed it. What follows is what actually happened when I started using them seriously.
The tools I tested
Over the past several months I've worked with Claude Code (Anthropic's CLI), GitHub Copilot, Cursor, and various LLM APIs accessed directly. My day-to-day spans Python ML pipelines, data engineering, computer vision prototyping, and infrastructure-as-code. That range turned out to matter a lot in how I evaluated these tools.
What actually changed
The biggest shift wasn't speed — it was where my time goes. Before, a significant chunk of engineering time was spent on mechanical tasks: boilerplate, remembering exact API signatures, writing repetitive test scaffolding. That work hasn't disappeared, but I now spend a fraction of the time on it.
What that freed up was more time for the part of the work that actually requires my expertise: problem framing, architecture decisions, interpreting results, and knowing when a model output is plausible versus subtly wrong. That's a genuinely valuable shift.
Where I trust AI assistance — and where I don't
I trust AI coding tools heavily for:
- First drafts of utility code I'd otherwise look up (data loaders, argument parsers, config schemas)
- Translating between languages — Python to TypeScript, SQL to pandas, and so on
- Writing unit tests for functions I've already written and understand
- Explaining unfamiliar codebases or library APIs quickly
- Refactoring for clarity when I already know what the code should do
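The translation case is the one I lean on most. Here's a sketch of the kind of SQL-to-pandas draft I mean — the `orders` schema, the `shipped_totals` name, and the threshold are all invented for illustration:

```python
import pandas as pd

# SQL I might hand to an assistant:
#   SELECT customer_id, SUM(amount) AS total
#   FROM orders
#   WHERE status = 'shipped'
#   GROUP BY customer_id
#   HAVING SUM(amount) > 100;

def shipped_totals(orders: pd.DataFrame) -> pd.DataFrame:
    """Pandas equivalent of the SQL above."""
    shipped = orders[orders["status"] == "shipped"]
    totals = (
        shipped.groupby("customer_id", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total"})
    )
    return totals[totals["total"] > 100]
```

This is exactly the category where review is cheap: I know what the SQL does, so checking the pandas version takes seconds.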
I do not trust them for:
- ML model architecture decisions — the suggestions are often reasonable but not optimal for the specific domain constraint I'm working under
- Numerical code where subtle floating-point or off-by-one errors compound into silent failures
- Security-critical code — I always review this independently
- Anything touching nuclear, safety-critical, or compliance-governed systems
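The numerical point deserves a concrete illustration. This toy example is mine, not taken from any generated code: the one-pass variance formula E[x²] − E[x]² is algebraically correct, and exactly the kind of thing an assistant will happily produce, but it collapses when values share a large offset.

```python
def naive_variance(xs):
    # E[x^2] - E[x]^2: algebraically correct, numerically fragile.
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean

def two_pass_variance(xs):
    # Subtract the mean first, so the offset cancels before squaring.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]
# True variance of (4, 7, 13, 16) is 22.5; shifting by 1e9 shouldn't change it.
# The naive version loses the answer entirely in the cancellation of two
# ~1e18 quantities; the two-pass version recovers it.
```

Both functions look equally reasonable in review. Only domain knowledge — knowing the inputs carry a large offset — tells you the first one fails silently.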
Claude Code specifically
I started using Claude Code because I wanted something that could operate on whole codebases, not just the snippet in front of the cursor. For research projects where I have multiple interconnected modules, the ability to ask questions that span files — "where does this data transformation happen?" or "what calls this function and with what assumptions?" — is genuinely useful.
The thing that surprised me most: it has changed how I start new projects. I now spend more time up-front writing a clear problem description and specifying constraints, because that framing produces much better AI-assisted output downstream. In a way, it's made me a more disciplined engineer.
The honest downsides
Confidence without accuracy is the core risk. These tools produce plausible-sounding code even when it's wrong. For junior engineers, that's a real danger — you need to know enough to catch the errors. I've seen cases on my team where someone accepted a generated solution that was logically coherent but violated a domain assumption we had never documented explicitly.
Context loss is the other one. Long sessions degrade. If I'm in a complex debugging session and the context window fills with dead-end explorations, I've learned to restart with a fresh, focused prompt rather than continuing to pile on.
My actual workflow now
For a typical ML pipeline development task:
- Problem framing: I write a brief spec myself — input/output shape, constraints, performance targets. No AI here.
- Scaffolding: AI generates the skeleton. I review it structurally before any line-level reading.
- Core logic: I write the parts that require domain knowledge (feature engineering, loss function choices, calibration). AI writes the surrounding infrastructure.
- Tests: I describe what I expect the code to do; AI generates the test cases. I add edge cases it missed.
- Review: I read all generated code before committing. No exceptions.
The bottom line
AI coding tools are genuinely useful — more useful than I expected, less transformative than the hype suggests. They're tools, not colleagues. The value you get from them scales directly with the quality of your own understanding.
For experienced engineers who know their domain, they extend your reach without substituting your judgement. For people still building that domain knowledge, the risk is that they make you feel more capable than you are. Know which situation you're in.
I'll write more specifically about using these tools in industrial AI contexts — where the stakes are higher and the data is messier — in a future post.