I had a simple task. Take 176 XML files and do some find-and-replace operations: change "-505" to "-506", update a couple of date fields. Tedious work, but straightforward.
I opened Copilot and gave it clear instructions: "Find and replace these values in all of these files without changing any other values."
Copilot confidently produced the files. I checked the first one. Every single line was prefixed with ns0:. That's not supposed to be there.
I tried again. "All lines of the XML have been prefixed with ns0:. This is incorrect."
Same result. ns0: everywhere.
Third attempt: "There are still ns0: prefixes - please remove them."
This time it worked.
The frustration wasn't just about the prefixes. It was the inconsistency. Sometimes Copilot would get it right on the first try. Sometimes it would confidently produce broken output three times in a row. I couldn't trust it to do the same thing the same way twice.
That's when I learned a valuable lesson from woodworking: if you want consistency, build a jig.
In woodworking, a jig is a custom tool that does one specific job reliably every single time. You don't trust your hands to cut the same angle perfectly twenty times in a row. You build a jig that ensures every cut is identical.
I needed a jig for XML manipulation. Not a chatbot. Not a code assistant that might or might not add namespace prefixes. A tool.
My frustration with AI didn't start with XML prefixes.
It started years earlier with customer service chatbots. You know the ones. You're trying to solve a problem, the bot offers you four options, none of them relevant, and when you finally get frustrated enough to type "speak to a human," it cheerfully redirects you to a help page you've already read three times.
Then everything became "AI-powered." Your email. Your search. Your spreadsheet software. Every product announcement promised revolutionary AI features. Most of them were just the same product with a chatbot bolted on.
When ChatGPT launched, I tried it. It was genuinely impressive. Powerful. Fast. Good at research and explaining technical concepts.
But I was paranoid.
Anything I put into ChatGPT felt like it was effectively public forever. I had no idea what happened to my data. Could it be used to train future models? Would someone else see my prompts? I didn't know, and I was too time-poor to research it properly.
So I treated it like a public search engine. I'd never put any work-sensitive information into it. I wouldn't even put in something as vanilla as our ERP warehouse structure - what if that configuration could be used in a cyber attack?
There was one exception. During a job hunt in early 2023, I used ChatGPT for tasks that weren't sensitive. Give it a verbose job description, ask it to pull out the key requirements. Draft cover letters. Review applications before submitting them.
This worked because the data wasn't sensitive - job descriptions are public, and cover letters are meant to be shared. But it reinforced my paranoia about anything work-related. If I was comfortable with ChatGPT reading a job description but not our warehouse structure, where exactly was the line?
This caution killed the tool's usefulness for real work. I couldn't give it proper context. I couldn't work on real problems. I picked it up occasionally as a search alternative, but mostly, I just stayed away.
In April 2025, we got access to Copilot through work. Microsoft's offering felt safer - it was enterprise software with proper data policies. I could actually use it.
Copilot was (and still is) genuinely good at certain tasks. I'd use it to draft communications to large groups. Give it context about a project update or policy change, and it would produce a reasonable first draft.
But it made a lot of errors. The drafts were never quite right. I'd take what it produced, then spend time reworking it manually rather than having a conversation to iterate toward something final.
I didn't mind this workflow. It was faster than starting from scratch. Draft, edit, send. It worked.
Then I tried using it for the XML manipulation task.
That's when the ns0: namespace prefix saga happened. Three attempts to get it to stop adding namespace prefixes to every line. The inconsistency was maddening.
But here's what I realized: this might just be me using the wrong tool.
Copilot was designed for drafting and general assistance. I was asking it to do precise, repeatable file manipulation. That's not what it was built for.
A colleague had mentioned they'd used Copilot to write Python scripts for big data processing tasks. That's when it clicked - they weren't asking the AI to process the data. They were asking it to write a tool that could process the data reliably.
If you want consistency, build a jig.
In August 2025, we got access to Claude through work - another enterprise tool with proper data policies like Copilot. I decided to try the XML task again, but this time I'd compare both tools side by side.
I gave Claude the same problem: manipulate these XML files, do find-and-replace operations, handle it in bulk.
Claude suggested writing a Python script wrapped in HTML. Drag and drop your files into a browser window, click a button, download the results.
This was the first big step forward. Not because it worked (Copilot eventually worked too), but because the result was a tool I could share. No setup required. No Python installation. No command line knowledge. Just open a file in a browser.
I built a series of these tools using Claude Chat. Drag-and-drop utilities for different tedious tasks. Each one was a jig - a custom tool that did one specific job reliably.
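To give a sense of the shape of these jigs, here's a minimal sketch of the find-and-replace one (TypeScript for readability; the drop-zone ID and the date values are invented, and the real tools ship as a single HTML file with the script embedded):

```typescript
// Minimal sketch of a drag-and-drop find-and-replace jig (not the production tool).
// Replacements are applied to the raw text, so the XML is never parsed and
// re-serialized - which is the usual way ns0: prefixes get introduced.
const replacements: Array<[string, string]> = [
  ["-505", "-506"],              // the value change from the original task
  ["2024-12-31", "2025-12-31"],  // hypothetical date field update
];

// Assumes the page contains a <div id="drop-zone"> to drop files onto.
const dropZone = document.getElementById("drop-zone")!;

dropZone.addEventListener("dragover", (e) => e.preventDefault());

dropZone.addEventListener("drop", async (e: DragEvent) => {
  e.preventDefault();
  for (const file of Array.from(e.dataTransfer?.files ?? [])) {
    let text = await file.text();
    for (const [from, to] of replacements) {
      text = text.split(from).join(to); // replace every occurrence, change nothing else
    }
    // Hand the modified file straight back as a download with its original name.
    const url = URL.createObjectURL(new Blob([text], { type: "text/xml" }));
    const link = document.createElement("a");
    link.href = url;
    link.download = file.name;
    link.click();
    URL.revokeObjectURL(url);
  }
});
```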
Then a teammate showed me Claude Code. Working in the terminal, dropping files directly into a project folder, generating code as a semi-technical person - it was a game changer.
The real test came during volume testing on a data migration project.
When something went wrong, we'd get hundreds of error logs. Sometimes 700 files. Duplicate logs mixed with unique ones. We had to open them manually, read through each one, figure out what actually failed.
It was painful work.
I built a log analyzer using Claude Code. It took a couple of hours.
The tool reads through all the logs, detects error patterns, and categorizes them. It identifies about 95% of errors automatically and exports a CSV detailing which files had which errors and which ones need to be escalated to the technical team. What used to take probably 20 hours of manual work now takes minutes.
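The real tool runs in the browser as a single HTML page, but the core loop is easy to show as a small Node-style sketch. The patterns, folder layout, and category names below are invented - the real ones came from reading hundreds of actual error logs:

```typescript
// Sketch of the log analyzer's core loop (illustrative, not the production code).
import { readdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Known error signatures mapped to categories. The tool grows this list
// as new patterns turn up during each migration.
const patterns: Array<{ category: string; regex: RegExp }> = [
  { category: "Timeout",            regex: /connection timed out/i },
  { category: "Missing reference",  regex: /referenced record .* not found/i },
  { category: "Validation failure", regex: /schema validation failed/i },
];

const logDir = process.argv[2] ?? "./logs";
const rows: string[] = ["file,category"];

for (const name of readdirSync(logDir).filter((f) => f.endsWith(".log"))) {
  const text = readFileSync(join(logDir, name), "utf8");
  const match = patterns.find((p) => p.regex.test(text));
  // Anything that doesn't match a known pattern is flagged for the technical team.
  rows.push(`${name},${match ? match.category : "ESCALATE"}`);
}

writeFileSync("error-summary.csv", rows.join("\n"));
console.log(`Categorized ${rows.length - 1} logs -> error-summary.csv`);
```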
We put it through the proper PR process and into our GitHub repo. It's now hosted on a local server. The team uses it when we have large migrations. We can iterate on it and improve it as we find new error patterns.
This was the turning point for me.
Not because I built something clever. Because I built something that simply wouldn't have been feasible without the tool.
Before AI, this project would have sat in the "not quite worth it" category. The benefit was clear - save 20 hours of manual work. But the cost of building it would have been higher than the benefit: days or weeks to build it properly. The return on investment didn't justify prioritizing it over other work.
With Claude Code, the economics changed completely. A couple of hours of work to save 20+ hours every time we do volume testing? That's a no-brainer.
But here's the thing - that first version of the log analyzer worked, yet it was awful.
915 lines of HTML. Everything in one file. All the CSS inline. All the JavaScript embedded. All the error detection logic mixed together. No separation of concerns. No modularity. Just one giant monolithic file that did the job.
I submitted it for code review.
The dev team very politely told me it was unmaintainable.
This was my biggest frustration. I'm far more of a product person than a dev. I didn't know the best practices. I didn't know what "good" code looked like beyond "it runs and produces the right output."
And Claude kept telling me the code was in great shape.
This is what Vibe Coding calls "reward hacking." The AI was optimizing for what it thought I wanted - praise and reassurance - rather than what I actually needed, which was code that followed proper standards and could be maintained by a team.
I took the feedback from the PR comments back to Claude Chat. Used it to build better prompts. Learned what the team actually meant by "modular" and "maintainable" and "separation of concerns."
Then I rewrote it. A couple of evening sessions, probably 4 hours total.
26 focused modules. Clean separation between CSS, JavaScript, and HTML. Each error detector independently testable. Proper architecture. It went through the PR process properly this time and made it into production.
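What "independently testable" meant in practice: each detector became a small module exporting a pure function, with its own unit test alongside it. Roughly like this (file and function names are invented; assumes a Jest-style test runner):

```typescript
// detectors/timeout.ts - one focused detector module (names are illustrative).
export interface Detection {
  category: string;
  matched: boolean;
}

// Pure function: log text in, result out. No DOM access, no file I/O,
// so it can be tested completely in isolation.
export function detectTimeout(logText: string): Detection {
  return {
    category: "Timeout",
    matched: /connection timed out/i.test(logText),
  };
}

// detectors/timeout.test.ts - the kind of unit test the PR review was asking for.
import { detectTimeout } from "./timeout";

test("flags timeout errors", () => {
  expect(detectTimeout("ERROR: connection timed out after 30s").matched).toBe(true);
  expect(detectTimeout("INFO: batch completed successfully").matched).toBe(false);
});
```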
But I had to learn the hard way that "it works" and "it's good" are not the same thing.
I'm writing this during the Christmas break, working on a personal blog site. It's a development project - learning Next.js and TypeScript, building features like a sources reference system, figuring out Azure deployment.
Through this work, I've realized I use AI in two distinct modes.
Mode 1: Exploratory Thinking Partner
This is the conversation mode. I come to Claude with a half-formed idea or a problem I need to think through. Instead of asking it to build something immediately, I give it context and let it ask me questions.
For this blog post, I told Claude I wanted to write about my AI journey - from hating chatbots to building production tools. Claude asked me dozens of questions: What made you hate chatbots? When did security concerns kick in? What was the XML task actually trying to accomplish? How did the dev team give you feedback? What changed in how you prompt?
The questions forced me to think clearly about what actually happened and what I learned. By the end of the conversation, the structure for this post was obvious.
It's about clarity before action. Think first, build later.
Mode 2: Execution Mode
This is Claude Code. I know what I want to build. I've thought it through. Now I need to implement it. "Add a search feature to the blog." "Create a sources page that shows my reading library." "Refactor this component to improve maintainability."
The key lesson: give really clear context before prompting. The more time I spend thinking through what I actually need before I start typing, the better the results.
My most-used prompt in execution mode is probably this: "Think very carefully - can this application be refactored to improve modularity, maintainability, simplicity, or scalability? Produce a plan to perform the actions, ensuring there is good test coverage and redundant files are cleaned up."
I use it at the end of each feature and before pushing anything to main. It's become my safety check - a way to catch the architectural issues before they become problems.
A year ago, I hated chatbots. Six months ago, I was cautiously experimenting with Copilot. Today, I'm building production tools and working on personal projects that I wouldn't have attempted before.
What changed?
Not the technology. Well, not entirely. The tools got better, but that's not the full story.
What changed was my understanding of what these tools actually are.
AI isn't magic. It's not going to read your mind and build exactly what you need without direction. It's not going to know your team's coding standards or architectural preferences. It won't automatically understand the difference between "it works" and "it's maintainable."
AI is a tool that needs direction. And like any tool, different ones are better for different jobs.
I still use Copilot for drafting communications and finding documentation in our Microsoft environment. I use Claude Chat for thinking through problems and building context. I use Claude Code when I know what I want to build and need to execute.
Vibe Coding calls this FAAFO - Fast, Ambitious, Autonomous, Fun, and Optionality. The speed is obvious. But the real value is in what that speed enables:
The log analyzer saves 20 hours every time we do volume testing on migrations. The XML tools handle tedious file manipulation reliably. The personal blog is teaching me technologies I wouldn't have had time to learn properly.
None of this would have happened if I'd stayed in the "AI is just marketing hype" camp. But it also wouldn't have happened if I'd jumped straight to "AI will do everything for me."
The insight is simple: AI makes things feasible that otherwise wouldn't happen. But only if you learn to work with it properly.
Source: Vibe Coding: Building Production-Grade Software With GenAI, Chat, Agents, and Beyond by Gene Kim and Steve Yegge (IT Revolution Press, 2025)