Refactoring in the Trenches

I recently wrapped up an engagement with a customer where we partnered with their engineering team to refactor a critical piece of their stack: a complex media processing pipeline. For confidentiality, I won’t disclose the customer’s name or the specifics of the project; I’ll simply refer to them as “the customer” and “the project”. The point of this post is to share my experience and the lessons learned from using Agentic AI on a large-scale refactoring project.

To tackle this, we brought Agentic AI, specifically Antigravity, into our workflow. The experience was an incredible look into the future of software engineering. It proved that while AI is undeniably a force multiplier, the narrative that it will completely replace seasoned engineers misses the mark.

Here are six key takeaways and observations from our time in the trenches with Agentic AI.

1. Demystifying the Legacy Codebase 🔦

Before you can refactor, you have to understand what already exists. Any engineer who has worked on a legacy system knows the pain of tracing execution paths through years of undocumented logic and patched-together workarounds.

We used the AI agents to map the existing architecture, reverse-engineer the logic, and explain dense, monolithic functions. It acted like an interactive Rosetta Stone, turning legacy code into plain English. This dramatically reduced the cognitive load and accelerated the team’s understanding of the system behavior before a single line of new code was written.

The team ran Gemini CLI against the old codebase and asked it to create Critical User Journeys (CUJs), Production Readiness Documents (PRDs), and test plans for the existing code. This helped us understand the system and identify the most critical user journeys to refactor.
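As a rough sketch, this workflow amounts to running non-interactive prompts against the repo. The prompts and output paths below are illustrative, not the exact ones used on the project, and assume the `gemini` CLI is installed and invoked from the repository root:

```shell
# Run from the root of the legacy repo so the agent can read the code.
# Prompts are illustrative; tailor them to your codebase.
gemini -p "Analyze this codebase and document the Critical User Journeys (CUJs)." > docs/cujs.md
gemini -p "Draft a Production Readiness Document (PRD) for this service." > docs/prd.md
gemini -p "Propose a test plan that covers the existing behavior of this pipeline." > docs/test-plan.md
```

Checking the generated documents into the repo gave the humans a shared, reviewable baseline before any refactoring started.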

2. Prototyping at Light Speed ⚡

Once we understood the old codebase, one of the most impressive wins was our ability to test out different architectures rapidly. Using Antigravity, we were able to build out the new pipeline in three different modern programming languages simultaneously.

This allowed us to benchmark performance, compare the implementations side-by-side, and make a data-driven decision on the final tech stack. Doing this manually, with engineers writing three distinct implementations from scratch, would have taken weeks, if not months. With Agentic AI, we had three working prototypes in one day.

3. The AI “Silver Bullet” is a Myth 🛑

As powerful as it was for reading and generating application code, Agentic AI is not a silver bullet. The biggest friction point we observed was at the intersection of code and infrastructure.

Agents are fantastic at staying within the boundaries of a codebase, but once that code needs to interact with the real-world complexities of cloud environments, networks, and IAM, the AI struggles. You absolutely still need experienced engineers to bridge that gap.

4. Permissions and Platform Shifts Require Human Eyes 👁️‍🗨️

A perfect example of this friction happened during deployment. Midway through, we decided to shift the deployment from one container platform to another.

Immediately, we hit a wall of convoluted permission issues and access blocks. The agent couldn’t untangle the mess. Fortunately, the team consisted of seasoned engineers who had the operational scars to quickly spot the root causes and clear them. Without their expertise, we would have been stuck in an endless loop of trial and error.

One personal observation: I often found myself fixing errors manually instead of prompting the AI to fix them. Not because the AI couldn’t, but because it was frequently faster to fix the issue myself than to explain the context to the AI. This was especially true for infrastructure-related issues.

5. Technical Debt is Still a Human Decision 🧠

If you ask an AI agent a specific question about an implementation, it will give you a straightforward, localized answer. What it won’t do is look at the big picture.

Decisions about technical debt, architectural trade-offs, and hosting platforms were driven entirely by human discussions. AI lacks the business context and foresight to weigh the long-term cost of a technical shortcut against immediate project deadlines. The macro-level strategy remains firmly in the hands of the engineering team.

A good example I suspect many people will run into is choosing an always-on architecture (traditional servers running continuously behind endpoints) versus an event-driven one. While the choice might sometimes seem obvious, factors like cost, day-2 operations, and business requirements are hard for agents to weigh.

6. Integration is as Tricky as Ever 🧩

Agents excel at building highly functional, isolated components. But as any software engineer knows, building the pieces is only half the battle; stitching them together is where things break.

Integration remains incredibly tricky when relying on AI-generated components. The biggest safeguard we had? Robust integration tests. If you are leaning heavily on Agentic AI to write your components, your testing strategy needs to be flawless to ensure those components play nicely together in a unified system.
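To make that concrete, here is a minimal sketch of what testing the seam between two AI-generated stages can look like. The component names and data contract are hypothetical, not from the project; the point is that the test exercises the boundary, not each piece in isolation:

```python
# Two stand-ins for AI-generated pipeline stages. The dict "contract"
# between them (codec/status fields) is a hypothetical example.

def transcode(media: dict) -> dict:
    """Hypothetical transcoding stage: tags the media as transcoded."""
    return {**media, "codec": "h264", "status": "transcoded"}

def publish(media: dict) -> dict:
    """Hypothetical publishing stage: depends on transcode()'s output."""
    if media.get("status") != "transcoded":
        raise ValueError("publish() received media that was not transcoded")
    return {**media, "status": "published"}

def test_pipeline_contract():
    # The integration test runs the stages together, so a change to
    # either component's contract breaks the build, not production.
    result = publish(transcode({"id": "clip-1"}))
    assert result["status"] == "published"
    assert result["codec"] == "h264"

test_pipeline_contract()
print("pipeline contract holds")
```

Each component passed its own unit tests on our project too; it was tests like this, spanning the seams, that caught the real breakage.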

The Long-Term Horizon: Redesigning Infrastructure for Agents 🏗️

Looking beyond this single engagement, it’s clear that the industry is heading toward a massive paradigm shift. There is an enormous amount of hype about Agentic AI when it comes to coding, but the reality is that software engineering goes far beyond just writing syntax; it’s about how that code lives, breathes, and interacts with its environment.

Because of this, the role of the experienced engineer isn’t disappearing; it is shifting toward building AI infrastructure.

Today, there are multiple players in the market offering out-of-the-box AI capabilities. But the future points toward a hybrid model where companies consume these services while also hosting and managing their own internal agent ecosystems. This introduces an entirely new class of enterprise requirements:

  • Agent Skills Management: Registries and catalogs to manage what specific agents are allowed and trained to do.
  • Agent Compliance Testing: Frameworks to ensure autonomous agents adhere to corporate and regulatory standards before they act.
  • Machine-to-Machine Security: Establishing strict IAM and security boundaries for non-human actors.

The fundamental truth is this: Today’s infrastructure was designed to work with humans. Tomorrow’s infrastructure will have to be rebuilt to work natively with agents.

The Takeaway

Agentic AI elevates engineering teams. It handles the heavy lifting of legacy translation, boilerplate generation, and rapid prototyping, allowing humans to move faster. However, the role of the Senior Engineer is more critical than ever—shifting from purely writing code to acting as an architect, an integrator, and the builder of the infrastructure that will power the next generation of autonomous agents.