Perspectives from the Field: What is an AI Agent?

As I discussed in a previous blog post, one of the major focus areas of my team at Outshift by Cisco is AGNTCY, an open-source collective focused on agentic AI and the future Internet of Agents. As part of this work, we often get into debates about the term “agentic,” or even about how to define the term “agent.” Now that our work is hitting the mainstream, we are seeing this debate all around us.

For example, we have people telling us that agents absolutely cannot work:

Notice that the author complains about “agentic” without defining it and then talks about “AI pilots,” which aren’t necessarily agentic at all. That is not too surprising for a field moving as quickly as AI, and “agentic” is probably the buzziest AI term at the moment. You hear these terms on a daily basis, yet everyone has their own idea of what they mean.

For example, NVIDIA describes agentic as follows:

The next frontier of artificial intelligence is agentic AI, which uses sophisticated reasoning and iterative planning to autonomously solve complex, multi-step problems. And it’s set to enhance productivity and operations across industries.

https://blogs.nvidia.com/blog/what-is-agentic-ai/

That is interesting, but also very restrictive. What if the problem does not require iterative planning? Is it not an agent? What if we want the agent to check in with us and propose a solution before doing anything? Is that still an agent?

In a similar vein, IBM says:

Furthermore, agentic AI takes autonomous capabilities to the next level by using a digital ecosystem of large language models (LLMs), machine learning (ML), and natural language processing (NLP) to perform autonomous tasks on behalf of the user or another system. A gen AI model that has garnered much attention is ChatGPT. While this product offers similar creative abilities to agentic AI, it isn’t the same.

https://www.ibm.com/think/topics/agentic-ai-vs-generative-ai

However, Microsoft has a much broader definition:

Notice that they include retrieval and action. This is much more generic and aligned to what I actually see happening with my customers. Everyone starts with retrieval. Then they move to recommendations. Only after they know the recommendation has high accuracy do they move on to automation.

So, as we think about “agentic” we must first think about “agents.”

On the one hand, an agent is simply software that takes an action while abstracting away how that action is implemented. I ask an agent to do X, but I don’t have to know how X is done; I just get the result back. Of course, I’ve just defined a microservice as well. Fundamentally, microservices are abstracted functions I can consume that do things without me having to worry about how they do them.

If you read the definitions above, you see several conflicting ideas, but there is a strong common theme: agents take actions without human supervision. If we look at the history of automation, we know that automation systems always take time before they’re mature enough to take actions on their own. For this reason, I understand the skepticism that people are expressing online. If you define an agent as something that takes autonomous action, then we clearly have work to do and are not ready for agents yet.

However, this definition is severely lacking and way too restrictive. If you only define an agent as something that takes action on your behalf, you’re missing out. Let’s say you had software that, given your stock portfolio, made recommendations about which stocks to buy. Or say it asked you a series of questions and then came up with a custom stock recommendation. I think that most would agree that it is an agent. But it doesn’t buy stock, so it doesn’t take the action. Still an agent though. It’s the fact that it does discrete work for you that matters, not the degree to which it takes action or not.

So, what is different between agents and microservices?

Well, one could say “not much” since they are mostly the same thing. That is fair, but I think it also misses the point.

I would argue that the difference is subtle but profound. The difference is that microservices (and most APIs) accept highly structured data in and produce highly structured data out. An agent, however, accepts semi-structured (or even unstructured) data in and produces semi-structured data out.

For example, if we wanted to build a stock picker app as a microservice, we could certainly do that. Many apps do exactly that. You would build a microservice (or more likely a series of them) that took inputs like “Customer Risk Tolerance Score” or “Customer Total Portfolio Asset Classes,” and then you would construct a recommendation app from there. On the other hand, an agent-based solution would focus on a prompt: “Based on me being 55 years old and having a current portfolio heavily weighted to small cap stocks, what four index fund buying recommendations would you make?” This would be turned into a semi-structured call to the agent. The actual details vary, but these days this is usually also an API call with some sort of semi-structured format like JSON.
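To make the contrast above concrete, here is a minimal sketch of the two request styles. Everything in it is an assumption for illustration: the class name `StructuredRecommendationRequest`, its fields and value ranges, the helper `build_agent_request`, and the `input`/`type`/`content` envelope are hypothetical and not taken from any real service or protocol.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class StructuredRecommendationRequest:
    """Microservice-style input: every field is named, typed, and expected."""
    risk_tolerance_score: int        # hypothetical 1-10 scale
    portfolio_asset_classes: dict    # asset class -> fraction of the portfolio
    max_recommendations: int


def build_agent_request(prompt: str) -> dict:
    """Agent-style input: a thin JSON envelope around free-form text."""
    return {"input": {"type": "text", "content": prompt}}


# The microservice caller has to translate the customer into fixed fields.
structured = StructuredRecommendationRequest(
    risk_tolerance_score=4,
    portfolio_asset_classes={"small_cap": 0.7, "bonds": 0.3},
    max_recommendations=4,
)

# The agent caller passes the question through more or less as asked.
agent_request = build_agent_request(
    "Based on me being 55 years old and having a current portfolio heavily "
    "weighted to small cap stocks, what four index fund buying "
    "recommendations would you make?"
)

# Both end up as JSON on the wire; only the shape of the payload differs.
structured_json = json.dumps(asdict(structured))
agent_json = json.dumps(agent_request)
```

Note that both calls serialize to JSON. The difference is where the meaning lives: in the field names of the first request, and in the text of the second.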

And this is what’s interesting about things like A2A from Google or ACP from AGNTCY (both of which use JSON). They recognize that these systems are focused on semi-structured or unstructured data calls, and thus they focus on the semantic layer of the application. Under the covers, they’re essentially JSON encapsulated in HTTPS requests, just like we have been doing with REST for ages. The payload of the JSON can be unstructured (e.g., the prompt in the example above) or it can be semi-structured (e.g., a list of stocks I should buy in priority order). But you normally don’t give the agent a highly structured set of inputs (e.g., a long list of key-value pairs), which is what is usually in the JSON or XML passed in a REST API call. It’s fascinating to me that the dominance of REST over older call types like SOAP has allowed us to have a very strong infrastructure for passing text all over the internet. All of the “important” features we used to have in the 1990s, like strong types and bounds checking, are no longer needed in the Internet of Agents era. It’s as if we have been working towards unstructured for twenty years and we all suddenly understand WHY we’ve been doing that.
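As a rough illustration of what a semi-structured payload means for the caller, the sketch below reads a prioritized list out of an agent response while tolerating missing fields. The response body, its field names (`output`, `summary`, `recommendations`, `priority`, `fund`), and the fund labels are all made up for this example; this is not the A2A or ACP schema.

```python
import json

# Hypothetical agent response: part free text, part loosely structured list.
raw_response = json.dumps({
    "output": {
        "summary": "Shift some small-cap exposure toward broader funds.",
        "recommendations": [
            {"priority": 2, "fund": "Total bond market index"},
            {"priority": 1, "fund": "Large-cap index"},
            {"fund": "International index"},  # no priority supplied
        ],
    }
})


def ordered_funds(body: str) -> list[str]:
    """Return fund names in priority order, tolerating missing fields."""
    output = json.loads(body).get("output", {})
    items = output.get("recommendations", [])
    # Items without a priority sort last; sorted() is stable for ties.
    ranked = sorted(items, key=lambda item: item.get("priority", float("inf")))
    return [item["fund"] for item in ranked if "fund" in item]


funds = ordered_funds(raw_response)
```

The caller here cannot lean on a strict schema, so it reads defensively with `.get()` and defaults. That is the trade being described: more flexibility in what is passed, less certainty about its exact shape.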

So, if we agree that agents are focused on semi-structured or unstructured data, then what is agentic software?

I would argue that agentic software is any system composed of agents. That is to say, agentic architecture is fundamentally about composing subsystems without the traditional highly structured internal architecture that things like microservices use. This is VERY different from what we did in the past. It creates all kinds of interesting new possibilities, but it also causes all kinds of problems, like the lack of a clear test plan and inconsistent result sets. The security implications of all this are completely off the charts, and there will likely be an entire industry devoted to securing these agents over time.

That leaves us with the following definition:

An agent is a subsystem that can take unstructured or semi-structured inputs and perform work on behalf of the caller without exposing details of how that work is performed. Agents can be based on LLMs or other GenAI technologies, but their actual implementation details are irrelevant to this definition. Agents can retrieve information, propose actions, or take actions on the caller’s behalf. Agents may or may not be autonomous in executing those actions.
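One way to read the definition above is as an interface. The sketch below is only an illustration under stated assumptions: the names `Mode`, `AgentResult`, `Agent`, `handle`, and `echo_worker` are hypothetical, the worker is a stub rather than an LLM, and the `executed` flag stands in for actually carrying out an action.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    """The three kinds of work named in the definition."""
    RETRIEVE = "retrieve"
    PROPOSE = "propose"
    ACT = "act"


@dataclass
class AgentResult:
    mode: Mode
    text: str        # free-form or semi-structured output
    executed: bool   # True only if an action was allowed to proceed


class Agent:
    """Hides how the work is done; the caller only sees text in, result out."""

    def __init__(
        self,
        worker: Callable[[str], str],
        autonomous: bool = False,
        approve: Optional[Callable[[str], bool]] = None,
    ):
        self._worker = worker          # an LLM call, rules engine, anything
        self._autonomous = autonomous  # may it act without a human check?
        self._approve = approve        # optional human-in-the-loop callback

    def handle(self, request: str, mode: Mode) -> AgentResult:
        text = self._worker(request)
        if mode is not Mode.ACT:
            # Retrieval and proposals never execute anything.
            return AgentResult(mode, text, executed=False)
        approved = self._autonomous or (
            self._approve is not None and self._approve(text)
        )
        return AgentResult(mode, text, executed=approved)


def echo_worker(request: str) -> str:
    """Stub implementation; a real agent would do actual work here."""
    return f"Proposed response to: {request}"


supervised = Agent(echo_worker, approve=lambda proposal: True)
result = supervised.handle("Rebalance my portfolio", Mode.ACT)
```

Autonomy is a constructor flag here rather than part of what makes something an agent, which mirrors the last sentence of the definition.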

This definition allows us to capture the essence of what is different about agents without committing ourselves to allowing agents to take completely autonomous actions. Autonomy may or may not happen, but agents are a very useful construct, similar to microservices, but more flexible.

Saturday, May 31, 2025
