AI Saved Me Time, But at What Cost?
Hacker News
The author explores the trade-offs of using AI coding tools, acknowledging the time savings but questioning what might be lost in the process of shifting from coding to reviewing.
AI coding tools have moved from autocomplete to autonomous agents that navigate codebases and make decisions. My social media feeds are full of developers shipping features in hours using multi-agent workflows, claiming massive productivity gains.
I'm both excited about the speed and concerned about what we might be losing.
Thanks for reading Tech Room 101! Subscribe for free to receive new posts and support my work.
My experience looks different from the viral demos, which usually highlight the exciting parts. So I decided to test it myself. When someone says they built an app in 3 hours, what are they actually measuring? The typing? The initial scaffold? A production-ready system with real customer value and few bugs?
I wanted to find out.
I took a reasonably complex feature and gave it to an AI agent with very clear requirements. I spent time making the requirements precise, reviewing them, and making sure they were unambiguous (I used spec-kit for this; it’s a very helpful tool for iterating on an idea and forming a detailed plan with a set of user stories and tasks, though it did devour my tokens). I also specified the coding style and provided examples of other repositories to follow.
Then I watched the agent work. It created files, wrote tests, generated CI/CD YAML files, wrote Terraform configurations. It looked productive. It looked like progress.
By the time I hit my token quota, I had... a lot of code. But here’s the thing: it wasn’t working code. The deployment didn’t work. The tests existed but had issues. And when I started looking through what it generated, I found multiple decisions that made no sense. For example:
Why is there no SSL certificate configured here?
Why is this implementation so defensive about error handling in places that don’t matter?
Why did it create versioning for an S3 bucket that’s just storing temporary data?
Why did it add null checks for parameters that can never be null?
Why did it implement a retry mechanism with exponential backoff for polling?
Why did it implement polling in the first place, when the requirements specified how the user would access that data?
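To make that last pattern concrete, here is a hypothetical Python sketch of the kind of over-engineering I mean. None of this is the actual generated code; the `client`, `get_report`, and `report_id` names are invented for illustration, but the shape is the same: a simple, single-shot read wrapped in polling with exponential backoff, plus a null check the call site can never trigger.

```python
import time

def fetch_report_agent_style(client, report_id, max_attempts=5):
    """What the agent generated: defensive polling nobody asked for."""
    if report_id is None:  # unnecessary: callers always pass a valid id
        raise ValueError("report_id must not be None")
    delay = 1.0
    for _ in range(max_attempts):
        result = client.get_report(report_id)
        if result is not None:
            return result
        time.sleep(delay)
        delay *= 2  # exponential backoff for data that is already there
    raise TimeoutError(f"report {report_id} not available")

def fetch_report(client, report_id):
    """What the requirements actually called for: one direct read."""
    return client.get_report(report_id)
```

Both functions return the same thing on the happy path; the first just carries extra states, extra failure modes, and extra code to review.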
These weren’t things I asked for. In fact, I had specifically outlined in my requirements what not to do. I had given examples of the style I wanted. But the AI generated what looked like “best practice” code based on its training data, regardless of whether those practices actually made sense for my specific context.
Then I ran into a cascade of issues, where fixing one issue created another, and so on. So I spent the next few hours going through everything. Fixing configurations. Removing unnecessary complexity. Getting things to actually deploy and run. Understanding a system that something else had designed.
This part rarely makes it into the viral posts.
Here’s what I learned from this experience: the AI did save time. Absolutely. Looking back a few years ago, implementing this same feature would have taken me a few days. Now it took a few hours of review and fixes to get it working. That’s a real improvement.
But here’s what I’m thinking about: it’s likely I didn’t catch all the issues the AI created.
Think about it. In real life, when we review another person’s code, do we catch everything? No. Even with thorough code review and testing, issues slip through. Edge cases we didn’t consider. Subtle bugs that only appear under specific conditions. Technical debt that seems fine now but causes problems later.
So if I’m reviewing AI-generated code in a few hours (code that would have taken me days to write), what am I missing?
I caught the obvious stuff. But what about the subtle things? The decisions that seem fine now but will cause problems later?
This is the new reality, I think. We’re trading speed for a different kind of uncertainty. Not necessarily worse, just different.
After spending hours fixing the AI’s output, I had code that worked. But here’s the question: do I know this codebase?
I read an article a few weeks ago about “epistemic debt”: not the technical debt of messy code or a deliberately scrappy implementation, but the gap between what AI generates and what we, as an organisation (or team), actually understand. The idea stuck with me: you can have clean, working code and still not comprehend the decisions embedded in it.
I didn’t write the code. I reviewed it, modified it, and debugged it. But there are still parts where I’m not entirely sure what the code is doing, just that the scaffolded app is working. When something breaks in production, will I be able to debug it quickly? Or will I spend extra time trying to understand decisions I didn’t make and might not fully agree with?
This is different from using a library where I don’t know the internals. With a library, I understand the interface and the contract. With AI-generated code, I’m looking at implementation details that are supposedly “my” code but feel like someone else wrote them, which they technically did.
If one agent generated code that required hours of cleanup, what happens when four agents are working simultaneously? How do you catch the conflicts, the over-engineering, the subtle misalignments? And more importantly: what are you missing? If I'm uncertain about what I missed reviewing one agent's output, how do you possibly catch everything from four agents working at once?
The most extreme version of this I've seen is Steve Yegge’s “Gas Town” (here’s the article) about his multi-agent orchestration system running many Claude Code instances simultaneously. What he describes is remarkable. And confusing. And maybe a little alarming.
Yegge calls it “100% vibe coded” and admits he’s “never seen the code, and never care to.” Some work gets lost. Bugs get fixed 2 or 3 times by different agents, and you pick the winner later. Designs go missing and need redoing. You won’t like Gas Town if you ever have to think, even for a moment, about where money comes from.
To be fair, he's not suggesting everyone should work this way. He explicitly warns against using Gas Town unless you're already managing 10+ agents daily.
Based on Yegge’s description, it seems suited for contexts where failures are cheap to recover from, speed matters more than perfection, you can afford to lose some work, money isn’t a constraint, and understanding every detail of the implementation isn’t critical. For example, developer tools, rapid prototyping, internal process automation, exploratory data analysis.
Those seem like valid use cases.
Healthcare systems where bugs can literally kill people? Financial systems managing people’s money? Government systems handling sensitive citizen data? The data breach at Manage My Health last year still leaves people anxious today. Would any of these stakeholders be comfortable knowing their critical systems were managed with an approach where “some work gets lost” and bugs get fixed “2 or 3 times”?
I don’t think that’s acceptable for production systems that businesses and people depend on.
To be honest, when I first read the article, I thought it was satire - a dystopian tale, like a software engineering version of 1984 or a Black Mirror episode. But I think he’s serious.
I’m skeptical that “vibe coding”, where you never look at the code, scales beyond that narrow context; maybe that will change in a year, when LLMs are significantly better and spec Markdown files become the next level of abstraction we use instead of code.
Maybe I'm wrong. I haven't built my own Gas Town, so maybe I don't fully understand what Yegge's discovered. Maybe I'm just at Stage 5 (CLI, single agent, YOLO mode) of his evolution chart and still have a way to go.
But the question isn't whether multi-agent workflows are powerful; they clearly are. It's about boundaries: what kinds of systems can we responsibly build this way, and what kinds still require the careful oversight we've always needed? I don't have those answers yet, but I think we need to be asking these questions, not just celebrating the speed.
AI coding tools genuinely save time. But they introduce a new kind of uncertainty - not from bad code, but from incomplete understanding.
The fundamentals still matter:
Understanding what you’re building and why
Making thoughtful architectural decisions
Reviewing code carefully (even though you won’t catch everything)
Testing thoroughly
Building systems that are resilient to the issues you’ll inevitably miss
I’m keen to hear your experiences. What does your review process look like? How are you handling the uncertainty? Let’s figure this out together.