Saturday, May 31, 2025

Why You Need an AI Business Plan


As part of my work on AGNTCY, I’ve been working with literally dozens of organizations building Agentic AI.  At this point, it’s not really a question of if agentic will enter your organization, but of when and how.  Last week, both Google and Glean announced their enterprise-class agent platforms.


Regardless of your platform choice, you are going to have the ability to build agents relatively quickly.  This means that eventually you will have hundreds, if not thousands, of them running in your organization.  This is not hype; it is just a fact.


The real question then becomes: how do you ensure that these agents are doing what they are supposed to do, and how do you know whether they are actually providing value to your organization?  Yes, some questions are truly evergreen.  This is exactly the question that every single technology innovation has to answer.  The buzzier the technology, the more important this question becomes.


Years ago, I was heavily focused on cloud for the enterprise, before that was really a thing.  Whenever I got into a meeting with a CIO or other IT executive, it was super common for them to tell me all about their “cloud strategy.”  I would patiently listen and then ask, “What are your business goals for this initiative?”  Many times, they could not tell me, which was concerning.  The most common answer was “to save IT costs,” which was even more concerning, because I knew for a fact that cloud wasn’t cheaper for most customers.  I wound up writing an entire book on this topic (Why We Fail) to help customers figure out what they wanted to do and how to manage a cloud implementation from an IT business perspective.


I meet with customers frequently about GenAI, the Internet of Agents, and related topics.  Almost all of them have an “AI strategy.”  So, I ask them, “What are your business goals for this initiative?”


Aaaaaand, they don’t know.


Sigh.  Here we go again.


I’ll probably wind up writing another book.  In the meantime, here’s the compressed version of this discussion.


Just like anything you do in your organization, any investment in GenAI, agents, or any other related technology must support a business goal.  Technology for technology’s sake is completely pointless.  You’re just lighting your money on fire.  Start with a specific problem.  Focus on the outcome.  Measure the result.  Repeat.  This is the way.


Most of my customers have started out with GenAI-based chatbots.  This makes sense, because LLMs are uniquely suited to building chatbots.  We have had chatbots for some time, and it’s become pretty common for them to be deployed as customer-facing agents.  Of course, because this is so common, it also adds very little competitive advantage.  If everyone is doing it, you’re not distancing yourself from the competition; you’re just treading water.  Stop treading water.


To truly gain business advantage from AI, I strongly suggest that you pick out classes of problems that are currently difficult to solve with traditional software.  Pick some sample projects and try the technology out.  Use that experience to guide your organization.  At the moment, AI has some specific characteristics that make it good at some problems and not others.


  1. AI does not really reason.  It attempts to find an answer based on its training data.  LLMs literally predict, token by token, the most plausible answer to the prompt they are given.  This makes them good at answering abstract questions and allows them to interact with users in natural language.

  2. AI is good at summaries.  You can take a large unstructured document like a white paper or an instruction manual and ask the LLM to summarize it.  The results are often quite good.

  3. AI is not deterministic.  What AI is NOT good at is producing the same result over and over.  Given the same prompt, the output will vary from run to run (see the sketch below).  This is fine for things like chatbots but terrible for things like banking transactions.  Focus on problems that don’t have a single “correct” answer.
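
To make point 3 concrete, here is a minimal, self-contained sketch of why the same prompt can yield different output: an LLM samples each next token from a probability distribution rather than computing one fixed answer.  The token scores below are made up for illustration.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample one next token from softmax(logits / temperature)."""
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # numerically stable softmax weights
    return random.choices(list(logits.keys()), weights=weights, k=1)[0]

# Hypothetical next-token scores after some prompt.
logits = {"approved": 2.0, "declined": 1.5, "flagged": 0.5}

for _ in range(3):
    print(sample_next_token(logits, temperature=0.8))
# Separate runs will disagree: fine for a chatbot, unacceptable for a ledger.
```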


All of the lessons we learned from SaaS and cloud apply here.  The AI state of the art is moving VERY, VERY fast.  This means that formally evaluating tools and settling on a single company standard just doesn’t work; anything you choose today will be completely obsolete in six months.  So don’t try.  Get small teams focused on solving tractable problems and iterate.  You will need to move quickly if you want the result to mean anything.


In some ways, an agent is similar to a microservice.  It needs to deliver a discrete business outcome, and it needs to be small enough to be developed by a single scrum team.  An agent should have a stated goal, a set of inputs, and a list of dependencies.  It also MUST MUST MUST have evaluation criteria.


The evaluation criteria are the actual tricky bit.  It’s easy to say, “The custom keep-warm agent should send customized follow-up emails to customers who haven’t logged in for two weeks,” or something similar.  That’s a business requirement.  You can also pretty easily say that this agent needs access to, say, SFDC, Gmail, and the ops database to determine the last login.  All good so far.  But how do you know if the agent is doing a good job?


Well, you have to read the emails.  Do they make any sense?  Are they more likely to provoke a positive customer response than the boilerplate template you are using now?
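
Pulling that together, here is one way such a spec might be written down; a minimal sketch in which every name (the AgentSpec class, the keep-warm agent, its data sources) is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """An agent definition: stated goal, inputs, dependencies,
    and the evaluation criteria that tell you whether it is working."""
    name: str
    goal: str
    inputs: list[str]
    dependencies: list[str]
    evaluation_criteria: list[str] = field(default_factory=list)

keep_warm = AgentSpec(
    name="keep-warm-agent",
    goal="Send customized follow-up emails to customers who "
         "haven't logged in for two weeks",
    inputs=["customer record", "last-login timestamp"],
    dependencies=["SFDC", "Gmail", "ops database"],
    evaluation_criteria=[
        "Emails read as sensible, on-brand prose (human spot-check)",
        "Response rate beats the current boilerplate template",
    ],
)
```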


And here is the really interesting point.  Any AI agent you build is essentially a very junior employee you hire.  They need to be managed.  They need to be coached.  You have to watch them.  Within the Outshift team at Cisco, we say “agents are the most enthusiastic interns you have ever hired.”  If you interact with LLMs regularly, you know what I mean.  Always eager to help, not super experienced in how your business works.


So, how are you going to manage this junior employee?  How many juniors can you manage successfully?  How do you compile and provide feedback to them?  Under what circumstances will you terminate their employment?


For people, we have process, policy, and training to handle all these things.


Do you have all that for AI?


You need it.


This is what goes into your AI business plan.


What is an AI Agent?



As I discussed in a previous blog post, one of the major focus areas of my team at Outshift by Cisco is AGNTCY, an open-source collective focused on Agentic AI and the future Internet of Agents.  As part of this work, we often get into debates about the term “agentic,” or even about how to define the term “agent.”  Now that our work is hitting the mainstream, we are seeing this debate all around us.

For example, we have people telling us that agents absolutely cannot work:


Notice that the author complains about “agentic” without defining it and then talks about “AI pilots,” which aren’t necessarily agentic at all.  Not too surprising for something moving as quickly as AI, and specifically agentic, which is probably the buzziest of AI terms at the moment.  You hear these terms on a daily basis, yet everyone has their own idea as to what they mean.


For example, NVIDIA describes agentic as follows:


The next frontier of artificial intelligence is agentic AI, which uses sophisticated reasoning and iterative planning to autonomously solve complex, multi-step problems. And it’s set to enhance productivity and operations across industries.


https://blogs.nvidia.com/blog/what-is-agentic-ai/ 


Which is interesting, but also very restrictive.  What if the problem does not require iterative planning?  Is it not an agent?  What if we want the agent to check in with us and propose a solution?  Agent?


In a similar vein, IBM says:


Furthermore, agentic AI takes autonomous capabilities to the next level by using a digital ecosystem of large language models (LLMs), machine learning (ML), and natural language processing (NLP) to perform autonomous tasks on behalf of the user or another system. A gen AI model that has garnered much attention is ChatGPT. While this product offers similar creative abilities to agentic AI, it isn’t the same.


https://www.ibm.com/think/topics/agentic-ai-vs-generative-ai


However, Microsoft has a much broader definition:


Notice that they include retrieval and action. This is much more generic and aligned to what I actually see happening with my customers. Everyone starts with retrieval. Then they move to recommendations. Only after they know the recommendation has high accuracy do they move on to automation.


So, as we think about “agentic” we must first think about “agents.”


On the one hand, an agent is simply software that takes an action while abstracting away the implementation of that action.  I ask an agent to do X, but I don’t have to know how X is done; I just get the result back.  Of course, I’ve just defined a microservice as well.  Fundamentally, microservices are abstracted functions I can consume that do things without my having to worry about how they do them.


If you read the definitions above, you see several conflicting ideas, but there is a strong theme around agents taking actions without human supervision.  If we look at the history of automation, we know that automation systems always take time to mature before they can be trusted to act on their own.  For this reason, I understand the skepticism people are expressing online.  If you define an agent as something that takes fully autonomous action, then we are clearly not ready for agents yet.


However, this definition is severely lacking and way too restrictive.  If you only count software that takes action on your behalf, you’re missing out.  Let’s say you had software that, given your stock portfolio, made recommendations about which stocks to buy.  Or say it asked you a series of questions and then came up with a custom stock recommendation.  I think most people would agree that it is an agent.  But it doesn’t buy the stock, so it doesn’t take the action.  Still an agent, though.  What matters is that it does discrete work for you, not the degree to which it takes action.


So, what is different between agents and microservices?


Well, one could say “not much,” since they are mostly the same thing.  Which is fair, but I think it also misses the point.


I would argue that the difference is subtle but profound.  The difference is that microservices (and most APIs) accept highly structured data in and produce highly structured data out.  An agent, however, accepts semi-structured (or even unstructured) data in and produces semi-structured data out.


For example, if we wanted to build a stock-picker app as a microservice, we could certainly do that.  Many apps do exactly that.  You would build a microservice (or, more likely, a series of them) that took inputs like “Customer Risk Tolerance Score” or “Customer Total Portfolio Asset Classes,” and you would construct a recommendation app from there.  An agent-based solution, on the other hand, would focus on a prompt:  “Based on me being 55 years old and having a current portfolio heavily weighted to small cap stocks, what four index fund buying recommendations would you make?”  This would be turned into a semi-structured call to the agent.  The actual details vary, but these days this is usually also an API call with some sort of semi-structured format like JSON.
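
To make the contrast concrete, here is a sketch of the two request payloads side by side.  The routes and field names are invented for illustration; only the shape of the data matters.

```python
# Microservice style: a fixed schema of highly structured fields.
microservice_request = {
    "endpoint": "POST /v1/recommendations",  # hypothetical route
    "body": {
        "customer_risk_tolerance_score": 7,
        "portfolio_asset_classes": ["small_cap", "cash"],
        "max_recommendations": 4,
    },
}

# Agent style: same transport (REST + JSON), but the payload is
# mostly a natural-language prompt, i.e., unstructured text.
agent_request = {
    "endpoint": "POST /v1/agent/messages",  # hypothetical route
    "body": {
        "prompt": "Based on me being 55 years old and having a current "
                  "portfolio heavily weighted to small cap stocks, what four "
                  "index fund buying recommendations would you make?",
    },
}
```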


And this is what’s interesting about things like A2A from Google or ACP from AGNTCY (which are both JSON).  They recognize that these systems are focused on semi-structured or unstructured data calls, and thus they focus on the semantic layer of the application.  Under the covers, they’re just REST sitting on HTTPS with encapsulated JSON, just like we have been doing for ages.  The payload of the JSON can be unstructured (e.g., the prompt in the example above) or it can be semi-structured (e.g., a list of stocks I should buy in priority order).  But you normally don’t give the agent a highly structured set of inputs (i.e., a long list of key-value pairs), which is what is usually in the JSON or XML passed in a REST API call.  It’s fascinating to me that the dominance of REST over older call types like SOAP has given us a very strong infrastructure for passing text all over the internet.  All of the “important” features we used to have in the 1990s, like strong types and bounds checking, are no longer needed in the Internet of Agents era.  It’s as if we have been working towards unstructured for twenty years and we all suddenly understand WHY we’ve been doing that.


So, if we agree that agents are focused on semi-structured or unstructured data, then what is agentic software?


I would argue that agentic software is any system composed of agents.  That is to say, agentic architecture is fundamentally about composing subsystems without the traditional, highly structured internal architecture that things like microservices use (see the sketch below).  This is VERY different from what we did in the past.  It creates all kinds of interesting new possibilities, but it also causes all kinds of problems, like the lack of a clear test plan and inconsistent result sets.  The security implications of all this are completely off the charts, and there will likely be an entire industry securing these agents over time.
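
As a toy illustration of that kind of composition (the agent names and the call_llm stand-in are hypothetical), note that the free-form text produced by one agent is consumed directly by the next, with no fixed schema between them:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns free-form text."""
    return f"<model response to: {prompt!r}>"

def research_agent(question: str) -> str:
    # Unstructured in, unstructured out.
    return call_llm(f"Summarize what we know about: {question}")

def recommendation_agent(summary: str) -> str:
    # Consumes the previous agent's free-form text directly.
    return call_llm(f"Given this summary, propose next steps:\n{summary}")

print(recommendation_agent(research_agent("index funds for a 55-year-old")))
```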


That leaves us with the following definition:


An agent is a subsystem that can take unstructured or semi-structured inputs and perform work on behalf of the caller without exposing details of how that work is performed.  Agents can be based on LLMs or other GenAI technologies, but their actual implementation detail is irrelevant to this definition.  Agents can retrieve information, propose actions, or take actions on the caller’s behalf.  Agents may or may not be autonomous in executing those actions.


This definition allows us to capture the essence of what is different about agents without committing ourselves to allowing agents to take completely autonomous actions.  Autonomy may or may not happen, but agents are a very useful construct, similar to microservices, but more flexible. 


Thursday, May 29, 2025

Why AI Toolchain Matters (and a first look at Replit)



As part of my work on AGNTCY with Outshift at Cisco, I was responsible for developer experience.  As a result, I spent a huge amount of time using AI-based dev tools from a variety of companies.  Normally, those reviews stay internal to Cisco, but I’ve decided to start publishing them.  Please let me know if you like this type of content and I’ll produce more of it.  Keep in mind that these are just my thoughts; they do not represent Cisco or any other organization.

AI: Hero or Villain?


At the moment, there are a couple of major themes playing out in the industry around AI.  They are, not surprisingly, opposing theories of how this will all play out.  One theme is “code is dead; abandon all hope, ye who enter here.”  For example, see the following article, in which Nvidia’s CEO predicts the death of coding:


https://www.techradar.com/pro/nvidia-ceo-predicts-the-death-of-coding-jensen-huang-says-ai-will-do-the-work-so-kids-dont-need-to-learn


The theory is that the better AI gets, the less relevant code becomes.  Which is a compelling argument.  In some ways, code is just a bug.  We write code because computers are really dumb.  We have an entire industry focused on helping computers do really simple things, like make a phone call or add 2+2, that humans learn as children.  Given the choice, most people would prefer to just talk to their computer and tell it what to do in normal human language.  So, this theory makes sense.


However.


Human language is notoriously imprecise.  I could tell you, “Get the thing and finish up.”  For a human, this sentence makes sense and the odds are a human would get what I wanted based on context.  Computers aren’t like that.  LLMs change this equation, but at the moment they’re not as good as humans when it comes to context.  This means we need to be much more precise when talking to computers.  We need to tell them EXACTLY what to do.  Or in other words, we write code.


This leads us to the second theme in the industry: AI toolchain will dominate all other toolchains.


Unlike the “code is going to cease to exist” theme, this one is actually happening RIGHT NOW.  We already see outlandish claims from folks like AWS and MSFT:


https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-30-of-the-companys-code-was-written-by-ai/


Or


https://www.cnbctv18.com/technology/amazon-ceo-andy-jassy-says-gen-ai-saved-260-million-and-4500-developer-years-19465522.htm


Of course, AWS and MSFT are both trying to sell you things, so these claims may not be 100% accurate, but if you spend any time talking to developers, you will find that AI toolchain is VERY real.  Copilot, for example, has massive adoption.  Almost every developer I talk to is using some form of AI toolchain.


Hence my project to look at these tools.  They matter.

Replit one-shot prompt test


I’m way beyond the part of my career where I attempt to pick winners.  I was the guy who said that Facebook was CB radio.  So, this review isn’t a recommendation or a prediction of success.  It’s just a report on the tool, what I saw, and how it works.


The first category of things I’m going to review are what I’ll call “Rapid Prototypers.”  Note that this is NOT how these folks refer to themselves, and they may have additional functionality beyond prototyping.  However, in my investigation, this is what they’re really good at.  As a PM, it is very interesting and useful to me that I can interact with an AI and create a working, clickable prototype.  The resulting code may or may not be production-ready.  I’m not an engineer anymore, so I can’t and won’t evaluate that.


So, the first thing I do for each one is give them a “one-shot” prompt.  Can they produce a working application from a simple prompt?


To test this, I gave Replit a very simple prompt:  “Build an app that stack ranks issues in a GitHub repo based on user-defined criteria.”  After a few clarifying prompts, it created this application:




Replit passes this test with relative ease.  I had a weird permissions issue where the app refused to run the first time (not unusual for AIs, BTW), but after just a few debug prompts, I got a working website.  Note that this is an insanely brief prompt.  Normally I would write at least one page describing the app, but the test is designed to see how well the tool does with minimal prompting.  This is intentionally the “worst case” for the tool.  Naturally, the result would be better with a more detailed prompt.


Of course, my test produced just a UX shell.  None of the functionality is there; it doesn’t actually do anything.  But I was able to get a running UX.  So, nice job.


Unlike some others, Replit automatically prompted me to set up a database backend.  That was a nice touch and allowed me to make an app that could actually work.  However, it turns out that the tool is still a bit delusional: it expects to make an API call without an API token, and it doesn’t actually check to see if the API call works, as you can see here:

Notice that it declares victory without checking.  The very first user action fails, but the system doesn’t know it because there is no unit testing at all.
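
For reference, this is roughly the kind of minimal check the generated code skipped.  A hedged sketch against the public GitHub REST issues endpoint; the repo name is a placeholder, and the token comes from an environment variable:

```python
import os

import requests

# Placeholder repo; the point is the token and the status check,
# both of which the generated app omitted.
url = "https://api.github.com/repos/someorg/somerepo/issues"
token = os.environ.get("GITHUB_TOKEN")  # unauthenticated calls rate-limit fast

headers = {"Accept": "application/vnd.github+json"}
if token:
    headers["Authorization"] = f"Bearer {token}"

resp = requests.get(url, headers=headers, timeout=10)
resp.raise_for_status()  # fail loudly instead of declaring victory
print(f"Fetched {len(resp.json())} open issues")
```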


Again, this isn’t unusual for AI toolchain.  We call this “intern syndrome.”  LLMs are basically the most enthusiastic interns you have ever hired.  Like puppies, eager to please but not really experienced enough to understand the mountain of shit they are about to dive into.


Feature request: I would VERY, VERY much like to see these tools use existing standards like the OpenAPI Description (OAD).  Here’s a description:


https://en.wikipedia.org/wiki/OpenAPI_Specification


I worked with some of the folks behind Swagger back in the day, and this is EXACTLY why Swagger was created in the first place.  I would expect tools like Replit to benefit from this as well.  Today, it’s really common for AI tools to choke on external APIs.
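
As a sketch of the idea: rather than guessing at endpoints, a tool could load an API’s published OpenAPI description and generate calls from the real definitions.  GitHub publishes its description in the github/rest-api-description repo; the raw-file URL below reflects that repo’s current layout and may change.

```python
import json
import urllib.request

# GitHub's published OpenAPI description (large file; this may take a moment).
URL = ("https://raw.githubusercontent.com/github/rest-api-description/"
       "main/descriptions/api.github.com/api.github.com.json")

spec = json.load(urllib.request.urlopen(URL))

# List a few operations so a codegen step can target real endpoints.
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}
for path, operations in list(spec["paths"].items())[:5]:
    for method, op in operations.items():
        if method in HTTP_METHODS:
            print(method.upper(), path, "->", op.get("summary", ""))
```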


So, when you actually try to run the report, it fails as shown in this screenshot:


Keep in mind that this is a very well-known API.  I’m assuming that a custom API for my personal doohickey would be worse?


At any rate, it’s not really ready to build actual applications if you need to call APIs as part of your application.  I’m sure I could make it work given enough time, but the amount of effort involved caused me to stop and shift to another tool, like Cursor, which has a larger context window.


TL;DR:  Replit was able to build a nice prototype with minimal prompting, but turning that into a running app requires significant debugging.


Thanks for reading, let me know if you think these kinds of mini reviews are interesting and I’ll do more of them.