Vibecoding an Agentic Coder - Part 2
In this segment, I’ll generate many candidate applications using my experimental framework, CodeAgents, choosing from a set of models: GPT-4.1, Claude 3.7, and GPT-4o. Then, I’ll compare and contrast the solutions. Along the way, I’ll present some ideas and tips for improving AI-generated code that generally translate to other tools and frameworks. It isn’t easy to score how good an AI-coded solution is. Of the possible metrics, code complexity may matter less as long as the AI understands the code, and the same goes for “maintainability,” which is rooted in human limitations; the AI can refactor on the fly. Test coverage is a good metric because it measures how much of the code the AI-generated test suite exercises. ...
Vibecoding an Agentic Coder - Part 1
I’ve tried Cursor, Replit, Lovable, and Bolt with varying degrees of success and found recurring themes in the use of these tools that require “vibing” until you arrive at a finished, hopefully working, result. Whether the result is good can sometimes be in the eye of the beholder. I’ve also become fascinated by how these tools will change the way programmers think about code and its organization — how many rules will be thrown completely out the window and how, oddly, the new rules will harken back to the early days of programming before Google and the Internet. ...
LLMs At The Command Line - Part 1
If you are a command-line fan and want to experiment with large language models (LLMs), you will love AiChat. There are many popular graphical front ends for working with LLMs, such as OpenAI’s ChatGPT and Anthropic’s Claude, but CLI lovers should get ready for this little powerhouse, which has many advanced and useful features. One of them is easy-to-use, out-of-the-box RAG (Retrieval-Augmented Generation), which is useful for searching existing content. I’ve put together a small demo here that shows how easy it can be to use in a pinch. There are many use cases where such an approach is just the right size. ...
Experimenting with Agentic AI Tooling: My Journey Through the Cutting Edge
The first time I fired up an MCP (Model Context Protocol) server plugin, “Agent,” I was excited to see it registered in Claude Desktop but immediately annoyed by the errors that popped up. I didn’t expect a smooth first encounter with the future of Agentic AI, and I did run into plenty of configuration tweaks, clunky debugging tools, and broken dependencies along the way. It was a stark reminder that we’re in the early days, and there’s a lot of ground to cover before Agents become seamless collaborators. ...
Navigating the Fragmented Landscape of Agentic AI Tools
Agentic AI, with its promise of creating systems capable of autonomous reasoning and action, has been a hotbed of innovation in the AI community. Tools from OpenAI, LangChain, and Microsoft are spearheading this new wave, each offering unique features and capabilities. However, the lack of standardization in this ecosystem presents significant challenges to developers, researchers, and organizations eager to adopt these technologies.
The Current State of Agentic AI Tools
The diversity of agentic AI tools is both a strength and a weakness. On one hand, it fosters creativity and innovation as developers explore various approaches to building autonomous systems. On the other hand, the fragmented landscape leads to: ...
Classify With Confidence
Large foundation models like GPT can classify text according to a well-crafted prompt instruction, and it’s remarkable how well they do this, considering they have not been explicitly trained on labeled datasets for the task. Traditionally, this has been done with machine learning models such as logistic regression. Those traditional models provide a probability score for each class, indicating the model’s confidence in its prediction. With generative model classification, we lose that “confidence level.” This confidence score is not just valuable; it’s essential for decision-making, as it helps users gauge how sure the model is about its classifications. While generative model responses may align well with the intended classification, we don’t directly get an explicit probability for each class. This can be a limitation, particularly in high-stakes applications where knowing the model’s confidence level is crucial. ...
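To make the contrast concrete, here is a minimal sketch of how a logistic regression classifier returns a probability alongside its label. It is not taken from the post itself. The spam/not-spam task, the vocabulary, the weights, and the bias are all hypothetical stand-ins for values that would normally be learned from a labeled dataset.

```python
import math

# Hypothetical learned weights and bias for a binary "spam" vs. "not spam" model.
# A real model would learn these from labeled training data.
WEIGHTS = {"free": 1.8, "winner": 2.1, "meeting": -1.5, "invoice": -0.7}
BIAS = -0.5


def sigmoid(z: float) -> float:
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(text: str) -> tuple[str, float]:
    """Return the predicted label and the model's probability for that label."""
    words = text.lower().split()
    # Linear score: the bias plus the weight of every known word in the text.
    z = BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)
    p_spam = sigmoid(z)
    label = "spam" if p_spam >= 0.5 else "not spam"
    # Confidence is the probability assigned to whichever class was chosen.
    confidence = p_spam if label == "spam" else 1.0 - p_spam
    return label, confidence


for message in ["You are a winner claim your free prize", "Agenda for the meeting and invoice"]:
    label, confidence = classify(message)
    print(f"{label:>8}  p={confidence:.2f}  {message!r}")
```

The point is that the probability comes for free with the prediction. A prompted generative model, by contrast, typically returns only the label text.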