Llm | MikesBlog

AI Cloud Interactive Hype Cycle 2025

Based on Gartner Hype Cycle for Cloud Platform Services, 2025 ...

Model Context Protocol (MCP) Best Practices

As we integrate services and data APIs into agentic AI solutions, interest is growing in how the Model Context Protocol (MCP) can standardize the way tools expose their capabilities to agents. With that in mind, I’ve assembled—yes, with the help of AI—a survey of key topics and resources related to MCP. MCP is an open standard (launched by Anthropic in Nov 2024) for exposing data sources, tools, and “resources” to AI agents via a uniform interface. It is designed to replace the ad-hoc “one-off connector per tool/agent” pattern, simplifying how LLM-based agents integrate with live systems. [1] ...

A2A Doesn't need AI Agents

Is this just distributed computing re-packaged for AI? What is A2A Without AI Agents? I got into a debate with Gemini recently about Agent-To-Agent protocol (A2A). I said I thought it was a retread of existing distributed computing technologies like Service Discovery, Mesh, CORBA, etc. Perhaps Gemini took it personally, as Google (Gemini’s Creator) had announced A2A in April, and Gemini got a little “gushy” on how it was “a revolutionary new idea.” Also, perhaps “debate” is too strong a word. And I might want to consider getting out more often. ...

Building a Hybrid Summary Evaluation Framework

Combining deterministic NLP with LLM-as-Judge for robust evaluation Summary Evaluation Challenges Summary evaluation metrics sometimes fall short in capturing the qualities most relevant to assessing summary quality. Traditional machine learning for natural language processing (NLP) has covered a lot of ground in this area. Widely used measures such as ROUGE focus on surface-level token overlap and n-gram matches. While effective for evaluating lexical similarity, these approaches offer limited insight into aspects such as factual accuracy or semantic completeness [1]. ...

LLMs At The Command Line - Part 1

If you are a command-line fan and want to experiment with large language models (LLM), you will love AiChat. There are many popular graphical front ends for working with LLMs, such as OpenAI’s ChatGPT, and Anthropic’s Claude, but get ready for this little powerhouse for CLI lovers as it has many advanced and useful features. One such feature is an easy-to-use, out-of-the-box RAG feature (Retrieval Augmented Generation) useful for searching existing content. I’ve put together a small demo here that shows how easy it can be to use in a pinch. There are many use cases where such an approach is just the right size. ...

Classify With Confidence

Large foundation models like GPT can classify text according to a well-crafted prompt instruction, and it’s remarkable how well they can do this, considering there has been no explicit training with labeled datasets. This has traditionally been done using machine learning models and logistic regression techniques. However, with generative model classification, we lose the ‘confidence level’ or the probability score of the prediction available in logistic regression. Traditional models like logistic regression provide a probability score for each class, indicating the model’s confidence level in its predictions. This confidence score is not just valuable; it’s essential for decision-making, as it helps users gauge how confident the model is about its classifications. While generative model responses may align well with the intended classification, we don’t directly get an explicit probability for each class. This can be a limitation, particularly in high-stakes applications where knowing the model’s confidence level is crucial. ...