Sunday, August 3, 2025

Users lie. Don’t fall for it.


When conducting customer interviews, one of the classic errors I see product managers (PMs) make is to ask customers what they want.  I cannot tell you how many times I’ve been in a feature validation meeting and the PM leading the session asks, “Would you use this feature if it were in the product?”


I’m sorry, but that’s a really bad question to ask.  Don’t do that.


Here’s the deal.  The users don’t really know.  They also have agendas.  They also forget things.  They also lie sometimes.


Many moons ago, I was working on a strategic infrastructure product.  As part of this product, we were trying to decide what features to add in what order.  Normal PM work.  One of the features we were thinking about was a disaster recovery (DR) feature. The team responsible for backup and DR was really excited about this feature.  They did a survey of existing customers that asked, “If we had this DR feature, would you use it?”  Not exactly that question, but pretty close.  50% of our install base said, “Yes, we would use it.”


Here’s the problem though.  We already knew that our market wasn’t spending money on DR.  We had revenue numbers that showed that our existing DR product wasn’t selling well.  This set off alarm bells on my team.  A huge argument ensued.  We spent MONTHS debating this.  Finally, we decided to build the feature.


The feature fell flat on its face.  Nobody wanted it.  We really struggled to get even one customer deployed.  Months went by before we finally did some proper research.  It turned out that all these customers knew they should do DR, but they didn’t.  Auditors told them they should do it, their exec leadership said they should do it, but when it came down to actual planning, they just didn’t make time and it fell off the work plan.  Thus, any DR solution has to address this critical time and staffing issue or it won’t get adopted.


Sigh.  Back to the drawing board. 


So, how do you learn from my costly mistake?


Customer needs vs. wants


You need to get away from stated customer desire.  In the end, it really doesn’t matter what they want.  Hell, I want to look like Hugh Jackman.  I don’t and I won’t.  I need to be healthy and exercise, so that’s what I actually do.  When you talk to a customer, it’s really easy to find out what they want.  They’re usually happy to tell you.  But this is just surface clutter.  The entire point of a customer interview is to dig down and understand customer need.  What is it that they need to be true?  Can you satisfy that need?  Can you make their pain go away?  If so, you have a customer for life.


In the previous example, they said they wanted DR.  Maybe they did, maybe they didn’t.  What they needed was for audit and senior leadership to get off their back.  So, we introduced a feature that automatically made their data resilient to a single data center failure.  Customers loved it.  They did no work and their auditors were happy.  The feature sold amazingly well.


Here are some ways you can dig down into need and away from want:


  • Always ask how it is done now.  How long does this thing take?  How much does it cost?  Who does the work?  Based on our proposed solution, would it be cheaper/faster/better than it is right now?  Knowledge of current state helps you understand your value prop.  You save them five minutes?  Meh.  You save them a million bucks?  Hell, ya.

  • Focus on them, not on you.  Who cares how your product works or what your roadmap is?  Worry about them getting promoted.  What can you do in the product to make this person a rockstar?  How do they get promoted for using your product?  Going into a customer meeting, giving a demo and walking out is a complete waste of PM time.  That should have been done by an SE.  Yes, you have to pay for their time by telling them what they want to know about the roadmap, but no, you’re not there to sell.  Don’t talk about you, your team and all the hard work you’ve been doing.  Ask about them and how they are doing.

  • Talk to the right person.  Do you know who your buyer persona is?  Your user persona?  Is this person one of those two?  If not, why are you talking to them?  Does it matter what someone who doesn’t use your product and doesn’t have the problem you are trying to solve thinks?  No, it doesn’t.  Make sure you know who you are talking to and why.

  • Ask them to show you.  One time when I worked for VMware, I was on site with a German retailer and we were discussing a new feature in a conference room for over an hour.  I just wasn’t getting it.  Finally, I said, “Who does this now?”  Dieter does this now.  “Where is Dieter?”  Dieter works down the hall.  “Can we go see Dieter?”  Yes, right this way.  Man, I learned more in the next 30 minutes than I had learned in a month.  Amazing.  Cherish those opportunities.  

  • Always force them to make a tradeoff.  Never ask, “Do you want A?”  Always say, “If you could have A or B, which would you choose?”  Assuming that B is a good feature, picking A over B means that A is also pretty good.  Of course, this test only works if B is actually good.  So, pick B carefully, just because you love B doesn’t mean that B is a killer feature.  Pick something you’re pretty sure the customer loves.  Make them kill their own children to get A.  If they’re willing to do that, then A is pretty good. 

  • Always ask why.  If a customer tells you something or states a preference, always ask why they have that preference.  So, if a customer says “I really want you to integrate with Jira,” you need to know why they want to integrate with Jira.  Is this about speed?  Cost?  Can you see a sample Jira ticket?  How many times has Jira integration come up this week?  What would happen if we don’t integrate with Jira?  Is there an existing corporate mandate that they must use Jira?  And so on.

  • Always aggregate.  A single customer, even a really big customer, saying that they want something is interesting but ultimately irrelevant.  If you find that a high percentage of customers have the same underlying pain points that you can resolve with feature X, now you have something meaningful.

  • Trust, but verify.  Anything that you’re told about user behavior is subject to misunderstanding, false reporting or other issues.  Don’t guess.  If they say, “We use feature X daily,” go check.  If you don’t have enough reporting in the product to know what features they use, go fix that first.  I have had multiple people tell me that features were highly valued by customers, only to find out in product telemetry that the feature never got used.
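
To make the “go check” step concrete, here is a minimal sketch of that kind of telemetry cross-check.  It assumes you can export usage events with a feature name attached; the feature names and data are made up for illustration.

```python
# Hypothetical cross-check: features customers CLAIM to use vs. what
# product telemetry actually shows.  All names and data are invented.
from collections import Counter

def flag_unused_claims(claimed_features, telemetry_events, min_events=1):
    """Return claimed features that telemetry barely (or never) shows."""
    usage = Counter(event["feature"] for event in telemetry_events)
    return sorted(f for f in claimed_features if usage[f] < min_events)

claims = {"bulk_export", "sso_login", "audit_reports"}  # from interviews
events = [
    {"feature": "sso_login", "account": "acme"},
    {"feature": "sso_login", "account": "globex"},
    # ...thousands of rows in a real export; note zero audit_reports events
]

print(flag_unused_claims(claims, events))  # ['audit_reports', 'bulk_export']
```

Anything a check like this flags is a conversation you need to have with the customer, not a feature you need to build.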


Of course, I’ve seen PM teams with the opposite problem, too: teams that completely ignore customer preferences and needs.  I’ve actually been told, “Nobody wanted the iPhone, but Steve Jobs built it anyway.  This is our iPhone moment.”


No.


You are not Steve Jobs.  You are not inventing the iPhone.  Just no.  


You cannot ignore reality.  If customer need isn’t there, the odds of you making the product successful are super low.  Yes, you could get lucky, but do you want to rely on luck?  There are millions of actual pain points inside customer organizations right now.  Millions.  Just pick one.  Solve the pain point.  Iterate.   This is the only way to consistently build software your customers want and will pay for.  



Wednesday, July 30, 2025

In Defense of the Stack Rank


I’ve been spending some time lately mentoring junior product managers (PMs) and helping them take the next step into more senior positions.  One thing that many of them are really stuck on is the global stack rank.  While everyone seems to understand the value of a backlog these days, remarkably few of the PMs I talk to have a regular stack rank exercise.  I find this distressing.


Let me start with a couple of caveats.  One: teams are different.  There is no magic one-and-only-one-way to build software.  While I really like global stack ranks, they aren’t right for everyone.  Two: I’ve spent the bulk of my career in large enterprise software organizations.  Most of my leadership experience deals with larger teams.  My last major product had over 150 engineers and a dozen PMs.  The advice below stems from this background, so take it into account when reading.  


What exactly is a global stack rank?


A global stack rank is a high-level list of the epics you would like to cover in priority order.  It is similar to a backlog in that it sets overall priority but it is not a backlog because most of the actual work isn’t included.  A global stack rank needs to be super high-level.  Depending on how you manage your work, this may mean including only epics and not stories or tasks.  When I worked on VMware Cloud on AWS, the project was so large that we had hundreds of epics each quarter.  So, we started using Jira initiatives and stack ranked only the initiatives.  Do what works for your product team.


I strongly prefer that the global stack rank be composed of Jira epics or initiatives because that keeps the process grounded in actual work and you can then track team backlogs to this stack rank.
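
If your epics live in Jira, you can pull the current stack rank straight out of the tool.  Here is a minimal sketch using Jira Cloud’s REST search API; the base URL, project key and credentials below are placeholders, not a real deployment.

```python
# Sketch: fetch a project's epics in Jira Rank order (top of stack first).
# JIRA_BASE, AUTH and the project key are hypothetical placeholders.
import requests

JIRA_BASE = "https://yourcompany.atlassian.net"
AUTH = ("pm@yourcompany.com", "your-api-token")

def global_stack_rank(project_key="PROD"):
    resp = requests.get(
        f"{JIRA_BASE}/rest/api/2/search",
        params={
            "jql": f"project = {project_key} AND issuetype = Epic ORDER BY Rank ASC",
            "fields": "summary",
            "maxResults": 100,
        },
        auth=AUTH,
    )
    resp.raise_for_status()
    return [(i["key"], i["fields"]["summary"]) for i in resp.json()["issues"]]

for rank, (key, summary) in enumerate(global_stack_rank(), start=1):
    print(f"{rank:>3}. {key}  {summary}")
```

Publishing the output of something like this on a regular cadence is a cheap way to keep the stack rank visible to every team.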


Why do I need a global stack rank?


The main reason is to help engineering plan.  When you are managing a small project with fewer than 20 engineers, you can usually get everyone in a room and hash things out.  If you only have one scrum team, you will wind up talking about this stuff at sprint planning.  What if you have two scrum teams?  Six?  Twenty?  At some point, you can’t do that anymore.


So, you need some way to clearly communicate priorities and make sure that things like cross-team dependencies are prioritized appropriately.


Let’s say you have three teams.  Team A has a deep backlog of 100 items.  Inside that backlog is tons of work that is critical for them to get done.  They also have one item that team B is waiting for and one item that team C is waiting for.  Team A does their sprint planning and starts to groom their backlog.  How high on the backlog are the items for teams B and C?  Well, they go ask B and C.  Wanna guess what B and C say?  Yeah, they both say their items are extremely critical and top of stack.  Now what?  Well, they go to PM, of course.  PM has to make a trade-off call.  What happens if A, B and C all have different PMs?  Now, they pull in their boss.  Let’s say that team C has a different boss than A and B.  Now what?  Well, now they call in the Senior Director.  That’s me.  Why am I getting pulled into a debate between A, B and C?  It’s because I didn’t create a clear set of priorities for the team that allowed them to self-serve.


How do I set a clear set of priorities?


You guessed it, create a global stack rank.  


How do I create a global stack rank?


This is the very interesting part.  My guess is that if you haven’t done this already, creating a global stack rank will be difficult.  That is because you probably don’t have clear priorities across the team.  You probably don’t have clear decision criteria defined for how features get evaluated and prioritized.  Thus, decisions are made without clearly defined success criteria.  In many cases, it’s HIPPO (Highest Paid Person’s Opinion).  That’s a really poor decision-making process.


What you need to do is step back and look at your larger goals.  Where does the product need to be in a quarter?  In a year?  In three years?  If you are moving fast, having three-year goals may be a stretch, but I would encourage you to set goals at least one year out.  You may be completely wrong about them, but the process of debating and setting those goals will tell you about yourself, your team and your product.  You can always change them if you aren’t happy with the result.


Now that you have an annual goal, you break that down by quarter.  This means that certain things need to be true by the end of each quarter.  A quarter is about 13 weeks; realistically that means you’re doing about six two-week sprints per quarter.  You have six chances to make something true by the end of the quarter.  If you have six things, that’s one per sprint.  Of course, things are not that neat, but you get the idea.  You want to have methodical regular progress.  You need to push things into production to test your theories so you can measure and adjust.  Thus, your stack rank needs to support those things.


Let’s say I had an annual goal of reducing customer churn by 25%.  The PM team does their research and the theory is that there are three main causes of user churn: feature gap to competitors, difficult onboarding process and lack of reporting.  Further analysis is done and it turns out that addressing the feature gap is a massive pile of work.  You break that work down further.  You now have three new features, a new onboarding workflow and a reporting feature.  Five features.  You could just put all the feature gap work at the top, but then you get no work done on reporting which is important although not very sexy.  So, you put your top feature gap at top of stack, then user onboarding, then reporting and then the other two feature gap features.  Of course, it’s more complicated than this, but I’m just giving a rough example.


It turns out that reporting is a complex cross-team initiative.  Like security or tenancy, it touches everyone.  Thus, it creates a huge number of cross-team dependencies.  Without the global stack rank, I can guarantee that all three gap features would get done way before reporting.  Features like that tend to languish because of the cross-team complexity.  However, if reporting is stack ranked above the second and third gap features, you can then review each team’s backlog and ask, “Why is this task related to global stack rank four (your second gap feature) ahead of global stack rank three (reporting)?”
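
That review doesn’t have to be manual.  Here is a sketch of an automated version of the same question: given the global stack rank and a team’s ordered backlog, flag any item sitting ahead of work that outranks it globally.  The epic keys and backlog items are invented for illustration.

```python
# Sketch: find backlog items that are out of order relative to the
# global stack rank.  Epic keys and items below are hypothetical.

def find_inversions(global_rank, team_backlog):
    """global_rank: epic key -> global position (1 = top of stack).
    team_backlog: list of (item, epic_key) in the team's current order.
    Returns (earlier_item, later_item) pairs that violate global order."""
    inversions = []
    for i, (item_a, epic_a) in enumerate(team_backlog):
        for item_b, epic_b in team_backlog[i + 1:]:
            if global_rank[epic_a] > global_rank[epic_b]:
                inversions.append((item_a, item_b))
    return inversions

global_rank = {"GAP-1": 1, "ONBOARD-1": 2, "REPORTING-1": 3, "GAP-2": 4}
team_backlog = [
    ("Add SSO claims mapping", "GAP-2"),         # global rank 4...
    ("Expose usage events API", "REPORTING-1"),  # ...ahead of global rank 3
]

for earlier, later in find_inversions(global_rank, team_backlog):
    print(f"'{earlier}' is ahead of '{later}' but its epic ranks lower globally")
```

An inversion isn’t automatically wrong (see the next section on shipping order), but every one of them should have an explanation.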


Now you are having a meaningful conversation with the scrum team about their backlog with the correct strategic and business context.


Does a global stack rank imply that features ship in that order?


No, certainly not.  Some features are low on the stack rank but very easy to do.  If I see something that my team can do in one sprint, I may just go ahead and knock it off to get it out of the way.  Similarly, if I am blocked by another team, I’m going to go deeper in the stack rank until I get unblocked.


Similarly, let’s say that of all the work in the global stack rank, the very first item that I can work on is global number 100.  Well, that’s my number 1 so I go work on that.  Perhaps I get it done quickly, before global 1 or 2.  That’s OK.  This is about helping teams make good tradeoff decisions, not about forcing them to only work on the stack rank in order.  



Wednesday, July 16, 2025

Three Lies and the Truth: Silicon Valley’s Biggest Lies

 

Living and working in Silicon Valley for most of my adult life has allowed me to view the industry from very close range for almost thirty years.  While in general I am thrilled to be in this business, there are definitely some strange myths that run around the valley.  You hear them repeated from time to time online and I always just shake my head.  I’m not really sure if people believe these things or if they’re just saying them to drive engagement.

Here are my top three Silicon Valley lies.

Lie number three: First mover advantage


You hear all the time that getting to a market first is a massive advantage to the early entrant.  Thus, pressure is placed on teams to get in early and ship anything, as long as they ship early.  This results in some really crappy code out in the wild.


If you look at the history of software categories, it’s actually uncommon for first movers to dominate the category.  Think about things like smartphones (Apple was VERY late to this market) or web search (Google came after Yahoo and AltaVista).  Xerox basically invented the modern computer (GUI, mouse, Ethernet, etc.) but failed to capitalize on it.  In most cases, the first movers don’t win.


For early and emerging markets, there are so many unknowns that it’s just luck if you happen to pick the right product strategy.  I believe in moving quickly, but my focus is always to ship a quality product as quickly as possible.  Thus, my advice is always the same:  Focus on quality and velocity, not on being first to market.  Don’t ship crappy software; your users will forget who was second to market but they’ll remember your horrible bugs and crappy UX.


Lie number two: Innovation leads to success


Coming back to Apple for a moment, there is a myth that Apple has been successful in the market because they are an innovative company.  This leads to conversations about how innovative companies are as if this is actually something that you should care about.  First, Apple isn’t actually that innovative, and innovation is not their key strength as a company.  Apple’s key strengths are a good design ethic and a focus on end-to-end customer problems.  If you think about the iPod, it was iTunes that made it work.  MP3 players were a dime a dozen in those days but the iPod had a great design, looked amazing and had a full end-to-end solution via iTunes.  This made Apple successful.  They then replayed that with the iPhone and the App Store.  It’s about the end-to-end experience.


Similarly, Google derives the vast majority of their revenue from advertising.  However, Google Ads really became dominant after the DoubleClick acquisition.  They also acquired Android and several other key technologies.  Does this mean Google is somehow “less” than other companies who build all their own tech?  No, Google is a money-making machine and a Silicon Valley icon.  It really doesn’t matter if they build or buy their technology. What matters is what they do with it.  Google made good acquisitions and was able to operate at a huge scale, which made them dominant in the market.

Lie number one: Ideas are rare and valuable


People think good ideas are very rare.  They’re not.  As Guillaume De Saint Marc (GSM) of Cisco puts it, “Ideas are cheap and plentiful.  Execution and delivery is rare and hard.”



Almost everyone I talk to in Silicon Valley has a great idea for a startup.  If you hang around after work over beers, it’s common that this is the subject of conversation.  So there’s no lack of ideas.  However, turning your idea into a business is really really really hard.  It also requires an almost maniacal focus to really make things happen.


In fact, being a founder is so amazingly difficult, it is hard for founders to shift gears after their startup succeeds.  Hence the debate about “founder mode” and how long you can keep maniacally focusing on one thing instead of managing your people and running a business.  The very idea that someone who works in a medium-to-large company can suddenly take ownership and operate in “founder mode” is just bizarre to those of us working in these organizations but seems normal to founders and VCs.


For product managers (PMs) like me, our job is primarily to sift through and evaluate ideas.  I have plenty of junior PMs who think that their job is to have great ideas.  That’s not really the case.  If you do your homework you will quickly find out that all the good ideas you need are already there.  All you need to do is listen.


This is why hiring PMs based on their brilliant product ideas is not a great strategy.  You need PMs who can sift through the volume of ideas and focus on the ones that will engage and delight your customer.  Execution on a PM team is really hard and requires tedious research and effort.  The ideas are nice, but they’re not going to make or break your team.  Focus on the customer.  Solve their problems.  Move quickly to deliver high quality code.  Iterate.  If you can do those four things, you will have success as a product leader.  

The truth


The truth is that teams who ship at high velocity and quality win.


That’s it.  Move fast and produce a high quality product.  The odds are that if you can do this, you will be successful in the long term.  You don’t need amazing insights; you don’t need to have Steve Jobs levels of disruption; you don’t need to invent new computer science.  If you are moving fast, mistakes get corrected fast.  If you are shipping with high quality, your customers will stay loyal.  Execution, customer focus, and a high quality product—these are the things that drive success in the market.


Friday, July 4, 2025

AI is the New Outsourcing



One of the truly fascinating things about the tech industry is that despite the industry’s obsession with innovation, you continually see the same pattern happening over and over again.  I refer to this as “watching the movie.”  It’s pretty common for someone to excitedly tell me about a brand new concept in the industry only for me to realize that I’ve seen this movie before.  Sometimes, several times.  I’ll even tell people, “hey, I’ve seen this movie before.”  Because I’m an old guy, I am expected to say stuff like this.  Recently, I’ve realized that I’ve seen the “AI changes everything” movie before.  When I tell people this, they’re usually surprised or dismissive.  However, if you pay attention, you’ll see that AI is falling into very familiar patterns just like other technologies that were supposed to “change everything” but didn’t.


For example, when cloud computing came out, it was explained as this amazing innovation that had never been done before.  However, it’s basically just hosting with a better UI.  We’ve had centralized Hardware-as-a-Service (HaaS) since the mainframe and time-share days.  So, not really new, just a new way to do the same thing.


Today, we have people saying that AI will replace all your engineers—that you should allow AI to take over software development for you.  This should sound familiar.  We used to call this “outsourcing.”


The original premise of outsourcing was that you would take low-value functions from your org and ship them off to a low-wage country.  The idea was to lower your labor costs by taking work that didn’t provide strategic value and sending it offshore.  What actually happened is that companies wound up building entire subsidiaries in places like India.  As a result, wages rose in India and now offshoring to India isn’t as attractive as it used to be.  Now that we’ve been doing outsourcing for quite a while, we know what the issues are:


  • Communication.  It’s really hard to ensure that all the teams involved are talking efficiently and that context is shared clearly between teams.

  • Quality.  Is the outsourced team doing good work?  Who is evaluating the quality of that work?

  • Context.  Is the outsourced team doing the right thing?  How do you adjust their work over time?


Common advice about outsourcing goes something like this:


  • Know what to outsource.  Send out well-defined, lower-risk work; keep strategic work in-house.

  • Write everything down.  Requirements, standards and acceptance criteria must be documented in detail.

  • Start small.  Begin with smaller projects and expand as the relationship proves itself.

  • Integrate the teams.  Pull the outsourced team into your communication and planning tools.

  • Review the work.  Someone on your side has to evaluate quality and give regular feedback.
These points are amazingly similar to what I would tell you about using AI to build your software.  In the end, moving work outside of your core group is always difficult and needs to be managed.  For AI, specifically:


  • Know what to give to AI.  AI is really good at some things like summarization but terrible at things like creative problem solving.  Pick AI tasks appropriately.

  • Context is king.  When you use AI to do work for you, it’s important to have detailed context written down.  Requirements, standards, and preferences matter for AI.

  • Try smaller tasks.  It’s about iteration.  Small teams, small projects and iteration are the key to success.

  • Pull the AI into your existing collaboration environment.  Similar to a junior employee, you want AI to be engaged with your team.  This means that you still need things like Slack, Jira and GitHub.  Integrate your AI solution into your existing toolchain (see the sketch after this list).

  • Coach the AI system to drive results.  While a one-shot prompt makes a good demo, in almost all cases, you will have to sit down and critique the AI’s work and provide feedback. It’s extremely unlikely that the AI will get it right the first or even second time.
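
As a concrete illustration of the toolchain point above, here is a minimal sketch that routes an AI-generated review summary into Slack instead of leaving it stranded in a chat window.  The webhook URL is a placeholder and summarize_with_ai() is a stand-in for whatever model call your team actually uses; Slack incoming webhooks really do accept a simple JSON payload like this.

```python
# Sketch: push AI output into the team's existing tools (Slack here).
# The webhook URL and summarize_with_ai() are hypothetical placeholders.
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def summarize_with_ai(diff_text: str) -> str:
    """Stand-in for your real model call that reviews a code change."""
    return f"AI review: {len(diff_text.splitlines())} changed lines, no blockers found."

def post_review_to_slack(diff_text: str) -> None:
    summary = summarize_with_ai(diff_text)
    resp = requests.post(SLACK_WEBHOOK, json={"text": summary}, timeout=10)
    resp.raise_for_status()  # fail loudly; a silently dropped review helps nobody

post_review_to_slack("--- a/app.py\n+++ b/app.py\n+print('hello')")
```

The same pattern applies to Jira comments or GitHub PR reviews: the AI’s work should land where the team already looks.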


While I don’t believe that AI is up to the task of creating your application from scratch, it does provide an amazing productivity boost to your team.  For this reason, it’s important that you manage AI just like any other outsourced function or team member.  This ensures that you are getting the best out of AI just like you want to get the best out of any team member. 


So, yes.  AI is amazing and new.  However, it is also falling into familiar patterns.  As you watch this movie unfold, think about other similar movies you’ve seen and watch these patterns repeat.


Thursday, June 26, 2025

Sink or swim: Making the call



I’ve been getting requests lately for guidance on how to become a product management leader.  While there are tons of folks out there who talk about how to be a product manager (PM), there are relatively few people talking about PM leadership and the difference between good PM organizations and great PM organizations.


I’ve been in the software business since 1996 and I’ve been a PM for over 12 years including my current job as Senior Director of PM.  When joining a new PM organization, I focus on a couple of things:


  1. Does the organization ship the right thing?  The PM team needs to be focused on the what and the why.  What are we going to build and why?  When I look at the current roadmap, I want to know why they have that roadmap.  Asking the why question will tell you if they have a good decision making process.

  2. Is the organization growing?  I don’t mean adding headcount.  I mean are they learning from their experience?  If they make a mistake, is that mistake recognized, addressed and corrected?  In many floundering organizations you will see failures covered up or blame games being played.  Neither of these things is healthy.

  3. Are they focused on the customer?  When decisions get made, do they focus on internal issues or are they focused on the customer?  In my experience, an amazingly high number of product decisions get made inside the company with little reference to the actual customer.

  4. Are they data driven?  This is related to #3, but when they make decisions, are these decisions backed up by good data?  If you think you know what’s going on but don’t bother to measure it, you’re probably wrong.


So, you’ve joined a new team and you’re not happy with where they are.  Where do you start?


At the top of the list, of course.


In the end, a PM organization is a decision making organization.  Unlike engineering, PMs don’t need to focus on delivery.  Yes, we are involved in delivery, but no, we are not the ones writing the code, creating the marketing copy, etc.  Our job is to make the call.  We sink or swim based on our ability to make good decisions as a team.  We do this in big ways by deciding to take on new products, but we also do this in small ways by tweaking user stories to help get a feature out the door.  The daily decisions involving small trade-offs are at least as important as the big “let’s build this new thing” decisions.


An unhealthy organization makes poor decisions.  Those poor decisions are violently defended because poor organizations punish people for being wrong.  Thus, even if they are wrong, they’ll claim they’re right.  


When my daughter was small, we knew that she was REALLY tired and ready for bed when she loudly proclaimed she was not tired.  The more strenuous her denial, the more likely that she was overtired and should have been put to bed already.  


Same thing with product teams.  Ask any product team, “Why did you make that decision?” and you can tell an amazing amount about them just by the tone of the answer.  Are they defensive?  Bad sign.  Do they have specific evidence?  That’s a great sign.  Are they highly introspective and self-critical?  Even better.


In the vast majority of organizations I have worked for, specific decision criteria were rare.  What I mean by this is that you should always know the basis for a decision.  Those criteria should be openly discussed in advance.  I am amazed at how often people don’t actually know how a decision will be made.  I have sat in countless meetings debating an idea only to find out that nobody agrees on how we will decide.  What happens is that everyone in the room states their opinion.  However, since we don’t know the basis for the decision, all those opinions are meaningless.


Try asking things like, “If we build this feature, what is different six months from now compared to today?”  Or “How will we know next quarter that this was the correct decision today?”


Usually, the team can’t answer questions like this because they haven’t really thought things through, and they don’t really know why they are building what they are building.  It’s your job as a PM leader to focus on these “why” questions.  Why are we doing this?  How do we measure success?  


Here are some quick techniques you can use to help get the team focused on the correct decision criteria:


  1. Focus on measures.  Any claimed benefit must have a measurable result.  If a team member says, “Customers want this feature,” always ask, “How do we measure that desire?”

  2. Focus on outcomes.  Teams can get caught up in things like stack ranks, feature lists, and bugs but it’s the outcome that matters.  Questions like, “What will be different next quarter if we do this?” help drive the team to focus on positive outcomes.  For example, I would expect an answer like “we expect workflow abandonment to drop from 30% to 10% as a result of this change” or something similar.  Specific, customer-focused and measurable is the goal here (see the sketch after this list).

  3. Encourage bad news.  Bad news travels fast in healthy organizations.  If someone brings you bad news, don’t shoot the messenger.  You want to encourage them to come to you.  “Thanks for bringing that up.  It sounds important; let’s get into that detail in our one-on-one” is a great response.  Don’t let the meeting rat-hole but recognize that the bearer of bad news is trying to help.  For example, when customer usage of a feature is super low, that’s bad news and you have to talk about it.  But if your PM messed up and wrote the requirements wrong, you don’t want to discuss their failure in a team meeting.  Corrective feedback to your PM is best in a 1:1 setting. 

  4. Hold them accountable.  I tend to be a “praise publicly, critique privately” kind of manager.  This means that during one-on-ones, you should be discussing outcomes you expect and giving direct feedback when you’re not getting that outcome.  I don’t advise dressing down staff in a large meeting—it tends to make people defensive.  The tone you should set in a group meeting is “how are we going to achieve this goal?”  Focus on the team.

  5. Don’t manage the product.  You are no longer a product manager.  The product managers who work for you need to manage the product.  It’s tempting to just change the stack rank or talk to engineering yourself, but that’s almost always the wrong answer.  Work with the team, give them coaching, but don’t do the work yourself.
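
To make point two concrete, here is a toy illustration of what “workflow abandonment” might look like as an actual measure, computed from usage events.  The event names and data are invented for the example.

```python
# Toy metric: workflow abandonment = started but never completed.
# Event names and user data below are hypothetical.

def abandonment_rate(events):
    """events: list of (user_id, event_name) tuples."""
    started = {u for u, e in events if e == "workflow_started"}
    completed = {u for u, e in events if e == "workflow_completed"}
    if not started:
        return 0.0
    return len(started - completed) / len(started)

baseline = [("u1", "workflow_started"), ("u2", "workflow_started"),
            ("u3", "workflow_started"), ("u1", "workflow_completed")]

print(f"{abandonment_rate(baseline):.0%}")  # 67% -- now you have a number to move
```

Once the measure exists, “did this feature work?” stops being an opinion and becomes a before-and-after comparison.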


In the end, it’s your job as a PM leader to build a good team and let them do their job.  If the team is making good decisions, you have succeeded in your most important task as a PM leader.


Saturday, June 14, 2025

Flying Without a Net: Requirements to Code (via Codex)

 


In my previous blog post about Jira automation and Claude, I created a sample application and was able to investigate the code on GitHub to compare it to the PRD that I wrote.  Based on that analysis, I had Claude write several Jira epics.  One of them was pretty generic about implementing a user authentication and profile system.  Because this functionality is required for almost any application you might write, I decided to start there.


Here is a screenshot of the epic that Claude wrote:



Note that this epic is HUGE.  If a junior PM had written this epic, I would advise them to trim it down a bit.  Separate things like Google OAuth from things like RBAC, for example.  This could be done by making smaller child stories against this epic or by breaking it into multiple epics.


However, for the purpose of this test, I just passed the output of Claude into Codex.  Codex is the relatively new GenAI-based coding tool from OpenAI.  Here is the summary Codex produced after attempting to implement the epic:

Codex purports to be really good at building code directly from requirements.  In the demos they give, you can just toss complex requirements at Codex and it will do the heavy lifting.  Notice that tossing Codex a random GitHub repo may also mean that it cannot run tests because it doesn’t understand the code base.  In this example, lint failed because of a missing dependency.  Codex didn’t detect and fix that automatically, nor did it suggest a way to fix it.


In addition, as you can see from the output in the example, Codex only focused on part of the epic.  It’s addressing just the unsafe handling of API keys and such, which Lovable hard-coded into the source code.  To be fair, this is a pretty important issue and definitely should be addressed in the code as soon as possible, but it’s a very small subset of the epic.  Codex didn’t come back to me and say, “Hey, this epic is way too big, please make it smaller.”


Again and again we see this failure mode in GenAI systems.  They are enthusiastic but not experienced.  If you compare this to people, a very junior dev might just follow instructions, not knowing how bad things are or how epics should be written.  A more senior dev would go back to PM and tell us that we need to break this work down into smaller chunks.  A principal-level dev would just fix the epic themselves and tell us that they fixed it.


Please note that I’m looking at this from the product management perspective.  I won’t evaluate the quality of the code coming out of these systems.  I’m just investigating how functional they are, just as I would evaluate any eng partner I’m working with.


In the end, a feature team only needs two things to be successful: quality and velocity.


If you are delivering on epics at a very high quality and doing so very quickly, almost any other problem can be addressed by PM.  Assuming PM is doing their job, this means that we are building the correct features and that the product is solving problems for the target persona.  The same thing goes for AI.  We know that GenAI-based systems like Codex are much faster than traditional coding methods, but are they executing at high quality?


So far, the answer is no.  They require close human supervision to make them work correctly.  


Going back to our junior employee example, this shouldn’t be surprising.  If you hired a dozen new college grads and let them loose on your code base, what do you think would happen?  Yes, chaos would ensue.  At the moment, the same is true for AI-based toolchains.  You can get them to do an amazing amount of work for you but you do need to supervise them and monitor their progress to ensure quality work.


GenAI: Eager, fast, well educated.  Not experienced or self-critical.


Why Autonomy Doesn’t Matter (Yet)



As I’ve discussed before, I am not terribly concerned about how autonomous my AI agents are.  Most of what you read online focuses on the autonomy aspect of Agentic AI and I really think that’s the wrong approach.


My background is in enterprise-class software. Specifically, enterprise infrastructure.  I’ve been working on enterprise-grade automation since 1996.  In the end, an agent is simply an automation platform.  You are asking the agent to do work for you.  The advent of LLMs means that there are entire classes of work that computers can do now that we couldn’t dream of in 1996, but the core business problem of ensuring that the computer does the work for you remains.


If you think about any automation project, the first question is always the same.  Will the system be accurate?  That is to say, will it achieve the business result?  


The very first production system I developed and deployed was a system that automated email accounts.  The business result was that everyone who worked for the company had to have a working email address and that email address had to be mapped to the correct server where their mail was provisioned.  Simple to say, but difficult to do for 100,000 people.  Later, I built a system that provisioned Windows Servers at scale.  Automated provisioning wasn’t really a thing back then and we had to build a complete running Windows Server host from bare metal in just an hour.  This used to be manual work.


As a PM, I worked on systems like DRS, which automatically places VMs inside an ESXi cluster, and HashiCorp Cloud Platform, which automatically deploys customer environments.


Etc. Etc. Etc.


Over time, technologies change.  The techniques we use change.  But the business goals, the process and the underlying issues remain evergreen.  The system must solve the problem, and it must solve the correct problem at the correct time.  An agent, by implementing a business process, is simply another, more modern, automation platform.  It’s no different in concept than software that deploys servers or places VMs correctly.  Thus, the underlying problems are the same even though the implementation is completely different. 


For a modern LLM-based agent, there are two primary concerns:


  • Context.  The agent must have the correct context.  When solving a business problem for the user, the context of that problem is critical.

  • Accuracy.  If the agent claims to have solved the problem, that problem must be solved for the user a significant percentage of the time (probably 95% or better).


Yes, but what about autonomy?  Does the agent solve problems on its own?


It turns out that autonomy is a byproduct of context and accuracy.  If the agent is very accurate and has the proper context, then you will allow the agent to solve the problem.  However, this only occurs AFTER you have confidence in the accuracy and context of the solution.


Let’s take a hypothetical example.  Let’s say you are running a business and you decide to buy an agent that approves home loans.  The purpose of this software is to evaluate each loan, apply the company’s loan standards and either approve or reject this loan.  There are two vendors who have loan approval agents; you have to decide which one to buy.


  • Company A has a “master agent” loan system that takes each loan and automatically approves or rejects the loan.  You give it a document describing your policies and it takes all further action.

  • Company B has a “loan automation” system that investigates your current process, documents it where necessary and then makes loan recommendations.  Those recommendations can either be manually approved by a loan officer or automatically approved.  The default is manual approval.


Which company do you hire?


Of course, you hire company B.  Company A has too much risk and there is no way to manually intervene.  Company A may have an amazing system, but you don’t know for sure how well it will work in your environment.  On the other hand, Company B allows you to start out manual and then automate later.  Company B also has a way to discover your process which may be different than what’s actually documented.
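
In code terms, Company B’s design is just an automation mode switch: the system always recommends, and applying the recommendation is a separate, operator-controlled decision.  Here is a minimal sketch of that shape; the loan fields, threshold and names are all hypothetical.

```python
# Sketch: recommendation-first automation.  MANUAL is the default;
# FULL is something you earn your way into.  All details are invented.
from enum import Enum

class Mode(Enum):
    MANUAL = "manual"  # recommend only; a human makes the final call
    FULL = "full"      # recommend and apply automatically

class LoanAgent:
    def __init__(self, mode=Mode.MANUAL):
        self.mode = mode

    def recommend(self, loan):
        # Stand-in for the real policy evaluation.
        return "approve" if loan["debt_to_income"] < 0.4 else "reject"

    def process(self, loan, approve_fn):
        decision = self.recommend(loan)
        if self.mode is Mode.FULL:
            return decision                # the system acts on its own
        return approve_fn(loan, decision)  # a loan officer decides

agent = LoanAgent()  # defaults to MANUAL: trust first, automate later
result = agent.process({"id": 17, "debt_to_income": 0.35},
                       approve_fn=lambda loan, rec: rec)
print(result)  # 'approve'
```

The point is that autonomy is a setting, not an architecture: once the recommendations prove accurate, flipping the mode is easy.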


And here’s the thing.  When I was a vSphere PM working on the DRS feature, we had the EXACT SAME PROBLEM.  When DRS was initially released, we were very confident that the VM placement decisions that the system made were correct.  We had done YEARS of testing and we knew that we were better at placing VMs using this system than when humans placed VMs.  We had papers about this, we had patents—all kinds of stuff.


And what happened?  Customers balked.  They didn’t know what was happening so they didn’t turn the system on.  So, we always lead with “Manual” mode where the system would make recommendations but not actually make changes.  Today, there are actually three modes: “Partial” for initial placement only, “Full” for complete automated placement and “Manual” for recommendations only.  The vast majority of customers start with Manual and most of them eventually move to Full (automated).  DRS today is one of the most widely adopted vSphere features.  vSphere also introduced the idea of VM overrides and host affinity.  This is context that allows the system to make better decisions by letting it know that VM1 and VM2 need to be on the same physical machine or that VM3 cannot be vMotion’d.


The details of how vSphere works aren’t terribly important here.  The point is that these types of accuracy and context issues have been around for a very long time.  We can look back at these systems and understand how they used context to improve accuracy and how those two factors led to customer adoption.  It’s easy to think “GenAI changes everything” and just ignore the last thirty years of enterprise automation, but that would probably be a mistake.  We know how to solve these problems, we just need to look at them in the abstract and pay less attention to the implementation details which change over time. 


This takes us to context.


The lesson of the last 30 years of automation is that context is king.  If the system knows what is happening and it knows what’s supposed to happen, the odds are higher that the system will take the correct action.  Yes, context leads to accuracy which leads to autonomy.  This is yet another software virtuous circle.


As you plan your AI agents, think about context.  Does the context that the agent needs exist already in an online system?  Is that context correct?  Are there secret rules that your business actually uses that aren’t written down?  Start there.  If I am a very junior employee and I know nothing, will I do the right thing if I just follow the documentation?  If not, your agents don’t have the correct context and won’t reach the correct result.