Thursday, May 29, 2025

Why AI Toolchain Matters (and a first look at Replit)



As part of my work on AGNTCY with Outshift at Cisco, I was responsible for developer experience, which meant spending a huge amount of time using AI-based dev tools from a variety of companies.  Normally, those reviews stay internal to Cisco, but I’ve decided to start publishing them.  Please let me know if you like this type of content and I’ll produce more of it.  Keep in mind that these are just my thoughts; they do not represent Cisco or any other organization.

AI: Hero or Villain?


At the moment, there are a couple of major themes playing out in the industry around AI.  Not surprisingly, they are opposing theories of how this will all shake out.  One theme is “code is dead, abandon all hope ye who enter here.”  For example, see the following article in which Nvidia CEO Jensen Huang predicts the death of coding:


https://www.techradar.com/pro/nvidia-ceo-predicts-the-death-of-coding-jensen-huang-says-ai-will-do-the-work-so-kids-dont-need-to-learn


The theory is that the better AI gets, the less relevant code becomes.  It’s a compelling argument.  In some ways, code is just a bug.  We write code because computers are really dumb.  We have an entire industry focused on helping computers do really simple things, like making a phone call or adding 2+2, that humans learn as children.  Given the choice, most people would prefer to just talk to their computer and tell it what to do in normal human language.  So, this theory makes sense.


However.


Human language is notoriously imprecise.  I could tell you, “Get the thing and finish up.”  For a human, this sentence makes sense and the odds are a human would get what I wanted based on context.  Computers aren’t like that.  LLMs change this equation, but at the moment they’re not as good as humans when it comes to context.  This means we need to be much more precise when talking to computers.  We need to tell them EXACTLY what to do.  Or in other words, we write code.


This leads us to the second theme in the industry: AI toolchain will dominate all other toolchains.


Unlike the “code is going to cease to exist” theme, this one is actually happening RIGHT NOW.  We already see outlandish claims from folks like AWS and MSFT:


https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-30-of-the-companys-code-was-written-by-ai/


Or


https://www.cnbctv18.com/technology/amazon-ceo-andy-jassy-says-gen-ai-saved-260-million-and-4500-developer-years-19465522.htm


Of course, AWS and MSFT are both trying to sell you things, so these claims may not be 100% accurate, but if you spend any time talking to developers, you will find that AI toolchain is VERY real.  Copilot, for example, has massive adoption.  Almost every developer I talk to is using some form of AI toolchain.


Hence my project to look at these tools.  They matter.

Replit one-shot prompt test


I’m way beyond the part of my career where I attempt to pick winners.  I was the guy who said that Facebook was CB radio.  So, this review isn’t a recommendation or a prediction of success.  It’s just a report on the tool, what I saw, and how it works.


The first category of tools I’m going to review is what I’ll call “Rapid Prototypers.”  Note that this is NOT how these folks refer to themselves, and they may have additional functionality beyond prototyping.  However, in my investigation, this is what they’re really good at.  As a PM, it is very interesting and useful to me that I can interact with an AI and create a working, clickable prototype.  The resulting code may or may not be production-ready.  I’m not an engineer anymore, so I can’t and won’t evaluate that.


So, the first thing I do for each one is give it a “one-shot” prompt.  Can it produce a working application from a single, simple prompt?


To test this, I gave Replit a very simple prompt:  “Build an app that stack ranks issues in a GitHub repo based on user-defined criteria.”  After a few clarifying prompts, it created this application:




Replit passes this test with relative ease.  I had a weird permissions issue where the app refused to run the first time (not unusual for AI tools, BTW), but after just a few debug prompts, I got a working website.  Note that this is an insanely brief prompt.  Normally I would write at least one page describing the app, but the test is designed to see how well the tool does with minimal prompting.  This is intentionally the “worst case” for the tool.  Naturally, the result would be better with a more detailed prompt.


Of course, my test produced just a UX shell.  None of the functionality is there; it doesn’t actually do anything.  But I was able to get a running UX.  So, nice job.
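

To make that concrete, here’s a minimal sketch (mine, not Replit’s output) of what the missing core logic might look like, assuming each “criterion” is just a user-supplied weight applied to a numeric attribute of an issue:

# Hypothetical sketch of the ranking logic the prompt describes; none of these
# names come from the generated app. "Criteria" are assumed to be user-supplied
# weights applied to numeric fields of each issue (comments, reactions, etc.).

def score_issue(issue: dict, weights: dict) -> float:
    """Weighted sum of whichever numeric fields the user cares about."""
    return sum(weight * issue.get(field, 0) for field, weight in weights.items())

def stack_rank(issues: list[dict], weights: dict) -> list[dict]:
    """Return issues sorted highest-scoring first."""
    return sorted(issues, key=lambda issue: score_issue(issue, weights), reverse=True)

# Example: weight reactions twice as heavily as comments.
issues = [
    {"title": "Crash on login", "comments": 42, "reactions": 10},
    {"title": "Typo in README", "comments": 1, "reactions": 0},
]
print(stack_rank(issues, weights={"comments": 1.0, "reactions": 2.0}))

The hard part, of course, is wiring that logic to real GitHub data, which is where the next issues showed up.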


Unlike some other tools, Replit automatically prompted me to set up a database backend.  That was a nice touch and allowed me to make an app that could actually work.  However, it turns out that the tool is still a bit delusional: it expects to make an API call without an API token and doesn’t actually check whether the API call works, as you can see here:

Notice that it declares victory without checking.  The very first user action fails, but the system doesn’t know this because there is no unit testing at all.
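

I don’t know what language or code Replit actually generated under the hood, but for comparison, here’s a minimal Python sketch of the kind of guard it skipped: require a token up front and verify that the GitHub API call actually succeeded (the names GITHUB_TOKEN and fetch_issues are mine, not the app’s):

import os
import requests

def fetch_issues(owner: str, repo: str) -> list[dict]:
    # Fail early if the credential the call depends on isn't there.
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set; the GitHub API call cannot succeed.")

    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        timeout=10,
    )
    # Raise instead of silently declaring victory on a 401/403/404.
    resp.raise_for_status()
    return resp.json()

Even a single generated test that called something like fetch_issues() and asserted on the response would have caught the failure before I clicked anything.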


Again, this isn’t unusual for AI toolchain.  We call this “intern syndrome.”  LLMs are basically the most enthusiastic interns you have ever hired.  Like puppies, eager to please but not really experienced enough to understand the mountain of shit they are about to dive into.


Feature request: I would VERY, VERY much like to see these tools use existing standards like OpenAPI Descriptions (OAD).  Here’s a description:


https://en.wikipedia.org/wiki/OpenAPI_Specification


I worked with some of the folks behind Swagger back in the day, and this problem is EXACTLY why Swagger was created in the first place.  I would expect tools like Replit to benefit from this as well.  Today, it’s really common for AI tools to choke on external APIs.
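

For anyone who hasn’t worked with it, here’s a rough, hand-written fragment (expressed as a Python dict, not GitHub’s actual spec file) of the kind of information an OpenAPI description carries.  A code generator that read something like this would know the required parameters and the auth scheme before emitting a single API call:

# Illustrative OpenAPI-style fragment for one GitHub endpoint; the structure is
# standard OpenAPI 3.x, but the content here is hand-written for illustration.
spec_fragment = {
    "openapi": "3.0.0",
    "paths": {
        "/repos/{owner}/{repo}/issues": {
            "get": {
                "summary": "List repository issues",
                "parameters": [
                    {"name": "owner", "in": "path", "required": True},
                    {"name": "repo", "in": "path", "required": True},
                    {"name": "state", "in": "query", "required": False},
                ],
                "security": [{"bearerAuth": []}],
                "responses": {
                    "200": {"description": "OK"},
                    "401": {"description": "Requires authentication"},
                },
            }
        }
    },
}

# A tool could walk the spec and surface what each call needs up front.
for path, operations in spec_fragment["paths"].items():
    for method, op in operations.items():
        print(method.upper(), path, "requires:", op.get("security"))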


So, when you actually try to run the report, it fails as shown in this screenshot:


Keep in mind that this is a very well-known API.  I’m assuming that a custom API for my personal doohickey would fare even worse?


At any rate, it’s not really ready to build actual applications, assuming you need to call external APIs as part of your application.  I’m sure I could make it work given enough time, but the amount of effort involved caused me to stop and shift to another tool, like Cursor, which has a larger context window.


TL;DR:  Replit was able to build a nice prototype with minimal prompting, but turning that into a running app requires significant debugging.


Thanks for reading.  Let me know if you think these kinds of mini reviews are interesting and I’ll do more of them.

