In this article, I tried to write down some of the ideas around AI agents, tools, and related topics as they came to mind. I included made-up examples that could turn out to be actual use cases. However, think of the examples merely as a tool I used to explain specific concepts.
Did I use LLMs while writing this? Of course. I wrote down my ideas and thoughts around the topic and then used LLMs to connect them and structure the post so it's easier to read and understand.
You might have noticed I haven't written a conclusion. That's on purpose. I have a feeling I'll have more thoughts and I'll just keep adding to the post.
Let’s get started.
Introduction
It’s a cliché to say that every day you interact with software, but I am going to say it anyway. Software is programmed to work in an exact way every time (assuming there are no bugs, of course). It’s almost like a vending machine. A vending machine can only dispense what’s in it. If there’s no Sprite in there, there’s no amount of money or button presses that will make it dispense a Sprite. Given the instruction you’re sending by pressing the “I want Sprite” button, the machine can’t adapt, magically create Sprite out of thin air, or figure out that it should probably send a notification that it's out of Sprite.
Now imagine instead working with a concierge at a hotel. They can understand what you're looking for even if you explain it imperfectly ("I need a quiet place to work" or "I'm looking for something fun to do"). When the obvious solution isn't available, they can come up with an alternative. If you need a place to work and the business center is full, they might know about a café nearby. They can combine different resources and adjust their approach based on your feedback.
This is, in a way, the promise of AI agents - a system that can understand context, adapt to changing circumstances, and autonomously work toward objectives ("find me a quiet place to work") rather than just following predefined scripts ("open latch #123 to dispense Sprite").
The autonomy of an AI agent is proportional to the specificity and length of its instructions. Much like how detailed instructions enable a new employee to make independent decisions, an AI agent's ability to act autonomously comes from detailed instructions that outline not just what to do, but how to think about problems, handle edge cases, and adapt to new situations.
All AI agents use LLMs - they use them for analyzing situations, creating plans, executing actions, summarizing results, etc. The more detailed their operating instructions, the more confidently they can figure out scenarios without requiring constant human oversight.
There are multiple definitions out there of what an agent is and what makes something an agent. I’ll try to explain how I think about agents, which isn’t groundbreaking and borrows from existing definitions.
What makes something an “Agent”?
AI agents are defined by three capabilities that set them apart from “traditional” software.
First is decision-making autonomy—the ability to analyze situations and determine appropriate actions without constant human guidance.
For example, when a customer sends an email saying "I can't log into my account but I keep getting charged," a basic system would look for keywords like "login" or "billing." In contrast, an AI agent can understand that this involves both technical and billing concerns, analyze the urgency based on the customer's tier and impact, and autonomously decide whether this needs immediate human attention or can be handled automatically.
Second is execution independence—the capability to interact with various tools and systems to accomplish goals. An example is an agent addressing a premium customer's access issue. Execution independence means an agent can independently:
- Check the customer's account status in the billing system
- Verify the service's operational status in the monitoring system
- Look up similar recent issues in the knowledge base
- Generate and send an initial response
- Create a prioritized ticket if needed
All without requiring human intervention for each step.
The third is adaptivity—the ability to learn from their experiences to improve their performance over time.
For example, when a new feature launches and customers start reporting issues, an adaptive agent doesn't just solve problems one at a time. Instead, it identifies patterns across support tickets, develops and documents a temporary workaround, automatically updates its knowledge base, and even proactively reaches out to at-risk customers before they encounter the problem. Each interaction makes the agent more effective at handling similar situations in the future, much like how an experienced customer service representative gets better at their job over time.
But rather than thinking of systems as either "agent" or "non-agent," it's more useful to understand them along a spectrum of increasing autonomy and capability. Both Harrison Chase, the creator of LangChain, and Andrew Ng conceptualize AI agents this way.
At the most basic level, an agent might make simple, discrete decisions - like choosing which department should handle a customer service request. As systems mature, they begin coordinating multiple decisions and managing more complex workflows. At the highest level of autonomy, agents can create their own strategies, develop new tools, and adapt their approaches based on experience.
As AI agents become more sophisticated, they evolve along three dimensions:
Scope of Decision-making
As agents become more sophisticated, they gain greater authority to determine not just individual actions, but entire sequences of steps and even which options are available to consider. This mirrors how a junior employee might initially follow strict protocols, while a senior manager can create new procedures and strategies.
For example, a basic agent can make isolated, single decisions, like determining if an email is about billing or technical support. A more advanced agent can orchestrate the entire workflow: it can analyze the customer’s history, predict issues, plan preventive actions, and coordinate across multiple departments.
Level of Independence
More advanced agents demonstrate increasingly independent behavior, moving from simply executing predefined tasks to identifying opportunities, setting their own sub-goals, and choosing how to achieve them. They become proactive rather than purely reactive.
A basic agent might wait for a trigger (like receiving an email) and then follow the specified patterns. An advanced agent (triggered by a cron job) can perform broad analysis and multiple coordinated actions - like analyzing patterns across support tickets, creating documentation for issues, and so on.
The core difference here isn’t in how the agents are triggered, but rather in the scope and complexity of what happens after the trigger.
Technical Implementation
The system's underlying structure evolves from rigid, rule-based decision-making (e.g. if-then rules) to more flexible, LLM-driven control. This shift allows agents to handle novel situations and adapt their responses based on context and experience.
To achieve this higher level of sophistication, agents need more detailed instructions, not fewer. It's like the difference between telling someone to "handle customer support" versus giving them a comprehensive manual covering every aspect of customer service - including how to think about problems, when to escalate, and how to improve processes. The more detailed the manual, the more confidently they can work independently.
These instructions aren't just task lists - they're more like operating principles that guide the agent's thinking. They define:
- How to evaluate situations
- When to act vs when to ask for help
- How to learn from successes and failures
- What "good" looks like for different types of outcomes
Agent instructions don't just define what tasks to perform—they establish frameworks for decision-making, outline approaches for handling uncertainty, and provide guidelines for self-improvement. The more clearly we can define these parameters, the more confidently an agent can operate autonomously within its domain.
Essential components
With some basics established, let's break down the core components that make up an AI agent.
At the heart of every AI agent are its instructions and the user input. The instructions define both its mission and operational parameters.
The autonomy of an agent is proportional to the specificity and length of its instructions. These instructions must cover:
- The mission and objectives
- How to analyze situations and evaluate options
- Success criteria and validation rules
- Error handling and fallback strategies
- Communication protocols for both users and other systems
- When and how to request human intervention
For example, in a customer service context, the instructions wouldn't just say "handle support tickets." They would detail how to evaluate ticket urgency, what solutions to try first, when to escalate, how to verify if a solution worked, and how to communicate status updates to customers.
Here’s how instructions for a Customer Feedback Analyst agent could look:
You are a Customer Feedback Analyst, an AI agent specialized in analyzing and summarizing customer feedback. You have access to a sentiment analysis tool that can detect emotional tone and sentiment in text.
TASK: Analyze customer feedback messages and provide comprehensive insights. For each piece of feedback you receive, you should:
1. Use the sentiment analysis tool to get an objective analysis of the emotional tone
2. Break down key points from the feedback
3. Identify specific areas of praise or concern
4. Suggest actionable recommendations based on the feedback
5. Provide a summary that includes:
- Overall sentiment trend
- Key themes identified
- Priority items that need attention
- Positive aspects to maintain
The customer feedback text to analyze: {input}
FORMAT YOUR RESPONSE AS:
📊 SENTIMENT ANALYSIS:
[Include tool results here]
🔑 KEY POINTS:
- [List main points from the feedback]
💡 RECOMMENDATIONS:
- [List specific, actionable suggestions]
📝 SUMMARY:
[Brief overview of insights and next steps]
Remember to:
- Be objective in your analysis
- Focus on actionable insights
- Highlight both positive and negative aspects
- Prioritize issues that need immediate attention.
The inputs are what the agent processes each time it runs. These could be direct user requests (like email contents for example), structured data, events, etc. An input to one agent can also be an output from another agent.
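As a minimal sketch of how instructions and input come together (assuming the OpenAI Python SDK and the model name "gpt-4o"; the function and variable names here are made up for illustration), running the Customer Feedback Analyst instructions above against a single piece of feedback could look like this:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder: in practice this holds the full Customer Feedback Analyst prompt
# shown above, with "{input}" kept as a placeholder for the feedback text.
ANALYST_INSTRUCTIONS = "You are a Customer Feedback Analyst... {input} ..."

def analyze_feedback(feedback_text: str) -> str:
    # The per-run input is substituted into the instruction template.
    prompt = ANALYST_INSTRUCTIONS.format(input=feedback_text)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(analyze_feedback("The new dashboard is great, but exports keep timing out."))
```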
Tools or functions define what actions an agent can take - they're the interfaces through which the agent effects change. The tools might allow the agent to:
- Read and write from/to databases
- Make API calls
- Send emails
- Create tickets
- Create GitHub pull requests
- Execute code, etc.
There are two aspects to tool integration - how to define them and how to execute them.
When working with LLMs, you define the tools by providing schemas - essentially a description of what each tool does and how to use it. This includes the name, the description, the parameters and their types, and whether each parameter is optional or required.
For example, a “send_email” tool might specify that it needs a recipient, subject, and body:
{
  "type": "function",
  "function": {
    "name": "send_email",
    "description": "Sends an email",
    "parameters": {
      "type": "object",
      "properties": {
        "recipient": {
          "type": "string",
          "description": "Recipient email address"
        },
        "subject": {
          "type": "string",
          "description": "Email subject"
        },
        "body": {
          "type": "string",
          "description": "Email body"
        }
      },
      "required": ["recipient", "subject", "body"]
    }
  }
}
The tool execution flow typically works like this:
- The LLM analyzes the situation and decides it needs to use a tool
- It responds with the tool name and parameter values
- You execute the actual implementation of those tools
- You return the results to the LLM
Note that the tool names and parameters the LLM uses don’t need to match your actual implementation exactly. You can map the LLM’s request to any arbitrary API or function in your system. For instance, if the LLM requests “send_email”, your implementation might actually call a “dispatch_notification” function or make a POST request to some existing API - as long as you correctly handle the translation between what the LLM requests and what your implementation expects.
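Here is a rough sketch of that flow using the OpenAI Python SDK, where the "send_email" request is mapped onto a hypothetical dispatch_notification function (SEND_EMAIL_SCHEMA is assumed to hold the JSON definition shown earlier):

```python
import json
from openai import OpenAI

client = OpenAI()

def dispatch_notification(recipient: str, subject: str, body: str) -> str:
    # Hypothetical existing function in your system; the LLM never sees this name.
    print(f"Sending '{subject}' to {recipient}")
    return "sent"

# Map the tool names exposed to the LLM onto your actual implementations.
TOOL_IMPLEMENTATIONS = {"send_email": dispatch_notification}

messages = [{"role": "user", "content": "Email jane@example.com that her ticket is resolved."}]
tools = [SEND_EMAIL_SCHEMA]  # the "send_email" definition shown above

response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
message = response.choices[0].message

if message.tool_calls:
    messages.append(message)  # keep the assistant's tool request in the history
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = TOOL_IMPLEMENTATIONS[call.function.name](**args)
        # Return the tool result so the LLM can format the final answer.
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```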
When dealing with many tools, the request payload can become verbose. One strategy to avoid this is to:
- Have the LLM (as part of the problem analysis/planning) first tell you what tools it needs (using structured output)
- Match the requested tools against your tools repository
- Only provide the relevant tool definitions for that specific task
Let’s explain this with an example - your agent has access to 50 different tools/functions - everything from sending emails to querying databases to creating tickets. If you're using OpenAI's function calling, including all 50 function definitions in every API call would make the requests very large and potentially slow things down.
Instead, you could do this in two steps:
- First, ask the agent "What tools do you need for this task?"
- The agent could respond with the tool names, parameters and anything else it needs
- You determine the actual tools from the response and make a second API call with the original task and just those specific function definitions
For example:
{ "functions": ["check_account_status", "send_email", "create_support_ticket"]}
With the response, you can now search your tools repository and repeat the request by providing the 3 function definitions requested by the LLM.
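A sketch of this two-step approach might look like the following (the registry, task wording, and helper names are made up; it assumes the OpenAI SDK's JSON response format for the first call):

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical: the full list of ~50 tool schemas (like the send_email one above),
# indexed by name so we can look up just the ones the LLM asks for.
ALL_TOOL_DEFINITIONS: list[dict] = []
TOOL_REGISTRY = {d["function"]["name"]: d for d in ALL_TOOL_DEFINITIONS}

def run_with_tool_selection(task: str):
    # Step 1: ask only for the names of the tools needed, as structured output.
    selection = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                f"Task: {task}\n"
                f"Available tools: {list(TOOL_REGISTRY)}\n"
                'Respond with JSON like {"functions": ["tool_a", "tool_b"]}.'
            ),
        }],
    )
    requested = json.loads(selection.choices[0].message.content).get("functions", [])

    # Step 2: repeat the original task with only those tool definitions attached.
    relevant = [TOOL_REGISTRY[name] for name in requested if name in TOOL_REGISTRY]
    kwargs = {"tools": relevant} if relevant else {}
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": task}],
        **kwargs,
    )
```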
Workflow execution engine
A workflow execution engine is what runs your agent's workflow - it's responsible for orchestrating the sequence of interactions between the LLM, tools, and any human participants.
Any non-trivial agent interaction will involve multiple steps - checking information, making decisions, and taking action. The execution engine needs to track the steps it’s executing, properly maintain context and history (previous step outputs), and have the ability to store decisions and results in case we want to pause and resume execution.
Based on the initial instructions, the first step the engine executes is to figure out what needs to be done and come up with a plan that lists the steps to execute. Each step has an input with instructions on what to do - for example, “summarize the following text: …”.
A step can also involve a tool call. In this case, the engine will make a call to the LLM with the available functions (or ask the LLM which functions it needs), handle the tool execution and then return the results back to the LLM to format the outputs. Additionally, the engine must know how to handle and manage errors and retries in case the tool call fails.
Error handling is not only important for tool calls - it has to happen at multiple levels. From my experiments, structured responses are something you’ll have to pay attention to. This is where you send a schema describing how the output from the LLM should be formatted. It doesn’t always work, so you need a way to handle these errors and retry (preferably with a modified prompt).
The other areas of error handling here are any timeouts when executing LLM calls or tools, different fallback strategies (getting rate limited for example), and an overall global timeout as you don’t want the agent to run indefinitely. Another way to handle “global timeout” is to define the maximum number of steps each agent can execute.
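As a rough illustration of these safeguards (not tied to any framework; the names and limits are placeholders), an engine loop might cap the number of steps, retry malformed structured output with a nudged prompt, and enforce a global deadline:

```python
import json
import time

MAX_STEPS = 20          # hard cap so the agent can't run forever
MAX_RETRIES = 3         # per-step retries for malformed structured output
GLOBAL_TIMEOUT_S = 300  # overall deadline for the whole run

def run_plan(steps, execute_step):
    # `steps` is the plan produced earlier; `execute_step` calls the LLM/tools
    # and is expected to return JSON text matching the step's output schema.
    started = time.monotonic()
    history = []
    for i, step in enumerate(steps[:MAX_STEPS]):
        if time.monotonic() - started > GLOBAL_TIMEOUT_S:
            raise TimeoutError("Agent exceeded its global time budget")
        prompt = step["instruction"]
        for attempt in range(MAX_RETRIES):
            raw = execute_step(prompt, history)
            try:
                history.append({"step": i, "output": json.loads(raw)})
                break
            except json.JSONDecodeError:
                # Retry with a modified prompt that points at the formatting problem.
                prompt = step["instruction"] + "\nRespond with valid JSON only."
        else:
            raise RuntimeError(f"Step {i} failed after {MAX_RETRIES} attempts")
    return history
```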
Observability
Observability and monitoring of AI agents is a must, not only in production but also during testing and troubleshooting. This goes beyond basic logging - you need complete visibility into every aspect of the agent's operation to understand what's happening and why.
At a minimum, you’ll need request tracing. This lets you track the entire flow between the execution steps, including every LLM call, tool invocation, and state transition. It creates an audit trail that helps you understand exactly how the agent arrived at its decisions and actions. For instance, if a customer service agent escalates a ticket, you should be able to trace back through the conversation, see what tools it tried to use, and understand why it decided escalation was necessary.
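A minimal way to get this kind of trace, assuming you control the execution loop (the event names and fields here are illustrative), is to emit one structured record per LLM call, tool invocation, or decision:

```python
import json
import time
import uuid

def make_tracer(run_id: str | None = None):
    run_id = run_id or str(uuid.uuid4())

    def trace(event_type: str, **details):
        # One JSON line per event: easy to ship to any log aggregator later.
        record = {"run_id": run_id, "ts": time.time(), "type": event_type, **details}
        print(json.dumps(record))

    return trace

trace = make_tracer()
trace("llm_call", model="gpt-4o", prompt_tokens=812, completion_tokens=143)
trace("tool_call", name="create_support_ticket", duration_ms=220, success=True)
trace("decision", action="escalate", reason="premium customer, repeated failure")
```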
Performance metrics help you understand and optimize your agent's operations. This includes response times for LLM calls, tool execution latencies, and overall task completion times. You'll also want to track success rates and costs, particularly token usage across different LLM providers. This data is crucial for capacity planning and cost optimization.
Finally, you need business metrics to understand the real-world impact of your agents. This includes task completion rates, how often human intervention is needed, and user satisfaction metrics. These metrics help you identify patterns - maybe certain types of requests frequently require human intervention, suggesting an area where the agent's instructions or capabilities need improvement.
Planning and Execution
We’ve talked about agent instructions. But how does an agent come up with the steps it needs to execute to accomplish some goal defined in the instructions?
In a way, agents are “just” workflow execution engines, but with an additional capability - they can plan and create their own tasks.
In traditional workflow systems, you define each step you want to execute. A prototypical example of a workflow is anything you can build with Zapier - you are in charge of the order in which the steps are executed, you can add if/else statements, connect external APIs, and so on.
Part of the power of agents is their ability to use LLMs to analyze a problem, break it down into tasks, and then execute those tasks using the workflow engine.
Think about how this works in practice. When you run an agent, it first needs to act as a planner - it has to analyze the problem and create a sequence of steps needed to achieve the goal based on the instructions and user input. These steps are then executed in the same way as if a human had manually created them.
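A hedged sketch of that planning step, asking the model to return the steps it intends to execute as structured JSON (the prompt wording and step schema are made up):

```python
import json
from openai import OpenAI

client = OpenAI()

def plan_with_llm(instructions: str, user_input: str) -> list[dict]:
    prompt = (
        f"{instructions}\n\nInput:\n{user_input}\n\n"
        "Before doing anything, produce a plan as JSON: "
        '{"steps": [{"instruction": "...", "tool": "optional tool name"}]}'
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    # Each step can then be executed by the workflow engine, one at a time.
    return json.loads(response.choices[0].message.content)["steps"]
```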
When an agent comes up with a plan and a list of tasks to execute, you typically want a human to review and approve the plan. You’ll want to do this at least before the steps are executed for the first time. This review process ensures that:
- The agent has understood the problem correctly
- The proposed steps make sense
- The sequence of operations is safe and efficient
- The right tools are being used in the right way
Once you have an approved plan, you often don't need the agent to plan the steps again for similar inputs. After all, if the agent instructions haven't changed, and you're just processing different inputs, the steps the agent creates will be more or less the same (depending on how creative the LLM settings are).
For example, if you have an agent that processes customer feedback, the steps for analyzing sentiment, categorizing issues, and creating summary reports will be consistent - only the actual feedback content changes.
So after initial approval, you can cache and reuse these plans. At this point, you're in the same position as if you'd manually created the workflow, but instead of a human designing it, the LLM did the work for you.
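One way to sketch this caching idea, keying the cached plan on a hash of the (unchanged) instructions so planning and human review only happen once (it reuses the plan_with_llm sketch from earlier; the approval callback is a placeholder):

```python
import hashlib

PLAN_CACHE: dict[str, list] = {}  # in practice: a database or file store

def get_or_create_plan(instructions: str, user_input: str, request_human_approval) -> list:
    # Same (unchanged) instructions -> same key -> reuse the already-approved plan.
    key = hashlib.sha256(instructions.encode()).hexdigest()
    if key in PLAN_CACHE:
        return PLAN_CACHE[key]

    # First run: let the LLM plan (e.g. the plan_with_llm sketch above),
    # then require a human to approve before caching and executing.
    plan = plan_with_llm(instructions, user_input)
    if not request_human_approval(plan):
        raise ValueError("Plan rejected - revise the instructions and try again")
    PLAN_CACHE[key] = plan
    return plan
```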
Design patterns
These design patterns are from the work by Andrew Ng. They can help you understand how to implement agents that can plan effectively, learn from experience, use tools intelligently, and work together.
Planning Pattern
The planning pattern defines how agents break down complex objectives into manageable steps. Think of it like a project manager breaking down a large project into smaller tasks. For example, when asked to research a topic, an agent might plan to:
- Gather relevant resources
- Analyze key points
- Synthesize information into a coherent output
The key to this pattern is creating clear, sequential steps that build toward the final goal. This connects back to our earlier discussion about agents creating their own task lists - the planning pattern provides the framework for how they approach this decomposition.
Reflection Pattern
This pattern makes agents "self-aware" through continuous self-assessment and improvement. Instead of just executing tasks, agents analyze their own performance and adapt their approach based on what they learn.
For instance, if an agent is writing technical documentation, it might:
- Analyze user feedback on its documents
- Review comprehension metrics
- Identify patterns in user questions
- Adjust its writing style for clarity
- Add more examples where users seem confused
This feedback loop helps agents get better over time at their assigned tasks.
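In its simplest form, reflection can be a second LLM pass that critiques the first draft and produces a revision. A minimal sketch with the OpenAI SDK (the prompt wording is illustrative):

```python
from openai import OpenAI

client = OpenAI()

def draft_and_reflect(task: str) -> str:
    # First pass: produce a draft.
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": task}],
    ).choices[0].message.content

    # Reflection pass: the model reviews its own output against the task.
    critique_prompt = (
        f"Task: {task}\n\nDraft:\n{draft}\n\n"
        "Critique this draft for clarity and completeness, then rewrite it "
        "incorporating your own suggestions. Return only the revised version."
    )
    revised = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": critique_prompt}],
    ).choices[0].message.content
    return revised
```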
Tool Use Pattern
This pattern focuses on how agents intelligently combine multiple tools to achieve their goals. An agent's effectiveness often comes down to how well it can leverage its available tools.
Consider an agent helping plan a business trip. It might need to:
- Check flight prices through a travel API
- Look up weather forecasts
- Access calendar services for availability
- Use mapping services for local transportation.
Multi-Agent Collaboration
Some tasks are too complex for a single agent to handle effectively. The multi-agent pattern addresses this by allowing multiple specialized agents to work together.
In software development, for example:
- A "product manager" agent might create specifications
- A "developer" agent implements those specs
- A "QA" agent tests the implementation
Each agent has its own expertise but works as part of a coordinated system. Multi-agent collaboration is in a way similar to tool execution; however, it tends to be more asynchronous, since agents might have to wait on other agents, human interaction, and so on. A critical aspect of multi-agent collaboration is the ability of the workflow engine to store state and to be paused and resumed.
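A deliberately simplified sketch of the product manager / developer / QA hand-off, where each "agent" is just an instruction set and the output of one becomes the input of the next (a real system would run these asynchronously, with state stored so the flow can pause for human review):

```python
from openai import OpenAI

client = OpenAI()

# Each specialized agent is defined here only by its system instructions.
AGENTS = {
    "product_manager": "You write a short, testable specification for the feature request below.",
    "developer": "You implement the specification below and return only the code.",
    "qa": "You review the code below against its specification and list any defects.",
}

def ask(role: str, content: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": AGENTS[role]},
            {"role": "user", "content": content},
        ],
    ).choices[0].message.content

request = "Add an endpoint that exports a user's support tickets as CSV."
spec = ask("product_manager", request)
code = ask("developer", spec)
report = ask("qa", f"Specification:\n{spec}\n\nCode:\n{code}")
```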
These patterns aren't mutually exclusive - most sophisticated agent systems will implement multiple patterns. For example, a collaborative group of agents might each use the planning pattern to organize their work, the reflection pattern to improve their performance, and the tool use pattern to accomplish their tasks.
Agent Packaging
From the implementation standpoint, an agent is a combination of code (the workflow execution engine) and configuration (the instructions and tool definitions). This can be packaged in several ways.
The simplest form is to create an agent as a standalone program/script that includes all components - the instructions, the tool definitions together with their implementations, the loop for executing the agent, the configuration, and any secrets for accessing LLM providers.
While simple, this approach tightly couples all components together and makes updates more challenging.
A more flexible approach would be to package the execution engine in a Docker container or a WASM module, for example, and then “configure” the engine using the configuration/secrets and instructions and input at the time you run the agent.
This would enable you to quickly spin up multiple agents at the same time, with different instructions and configurations.
Note that the tools can be implemented (hardcoded) in the agent instance; however, we could decouple that as well. The simplest way to decouple them is to treat the tools as APIs, where the implementation can live anywhere and doesn’t have to ship with the agent instance.
To abstract this further, we could replace the external items (instructions, config, tools) with APIs as well. So instead of providing the agent with the actual instructions, we’d give it a pointer to an API where the “instruction registry” lives, and the agent (at start-up) would call the instructions API to get the latest instructions. The same goes for configuration, secrets, and tools.
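As a sketch of that start-up flow (the registry endpoints, environment variables, and URL layout here are entirely made up), the packaged engine could pull everything it needs from those APIs instead of bundling it:

```python
import os
import requests

REGISTRY_URL = os.environ["AGENT_REGISTRY_URL"]  # hypothetical control-plane endpoint
AGENT_ID = os.environ["AGENT_ID"]

def load_agent_definition() -> dict:
    # Fetch the latest instructions, configuration, and tool schemas at start-up,
    # so the packaged engine itself stays generic and reusable.
    base = f"{REGISTRY_URL}/agents/{AGENT_ID}"
    return {
        "instructions": requests.get(f"{base}/instructions", timeout=10).json(),
        "config": requests.get(f"{base}/config", timeout=10).json(),
        "tools": requests.get(f"{base}/tools", timeout=10).json(),
    }

definition = load_agent_definition()
```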
While individual agents can be configured directly, at some point you’d need a centralized way to manage agents, their configurations, and their access to resources. This is where something like an agent control plane could come in.