How AI Agents Actually Work: The ReAct Pattern
Agents are older than you think. What's new is how they think.
Agents have been around for decades. Your robot vacuum is an agent. So are the enemies in video games, the autopilot in aircraft, and the systems that manage traffic lights. These machines perceive their environment, decide what to do, and take action. Over and over, in a loop.
This loop of perceive, reason, and act is one of the oldest ideas in artificial intelligence. It's simple, and it works. But for most of AI's history, the "reason" part had to be programmed by hand. Language models changed that.
The Classical Pattern
A robot vacuum runs a tight loop. Its sensors detect obstacles, dirt, battery level. Its logic decides whether to keep cleaning, reroute around furniture, or return to the dock. Then it moves, turns, or starts charging. Perceive, reason, act.
Classical Agent Loop
Each cycle begins with perception. One cycle's sensor readings might look like this:
{
"obstacleAhead": false,
"obstacleLeft": false,
"obstacleRight": false,
"battery": "85%",
"dirtDetected": true,
"dockInRange": false
}
Watch the same vacuum a few cycles later: obstacle ahead, battery at 45%. It reroutes left to continue cleaning. Turn left 45°, move forward. Perceive, reason, act.
This pattern has powered autonomous systems for years. The elegance comes from separating concerns. Sensing is one thing, decision-making is another, action is another. Each piece can be improved independently.
But look at that middle step, the reasoning. The vacuum doesn't actually think about whether to reroute left or right. It follows rules that engineers wrote in advance. If obstacle ahead and left is clear, turn left. If battery below 20% and dock in range, go home. Every decision the vacuum can make was anticipated and coded.
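In code, that hand-written reasoning step is just a rule chain. A minimal sketch, using the sensor names from the reading above, with thresholds and action names invented for illustration:
# Hand-coded reasoning: every branch was written in advance by an engineer.
# Sensor names mirror the reading above; battery is treated as a plain number here.
def decide(sensors: dict) -> str:
    if sensors["battery"] < 20 and sensors["dockInRange"]:
        return "return_to_dock"        # low battery and the dock is reachable
    if sensors["obstacleAhead"]:
        if not sensors["obstacleLeft"]:
            return "turn_left"         # reroute around the obstacle
        if not sensors["obstacleRight"]:
            return "turn_right"
        return "reverse"               # boxed in: back up
    if sensors["dirtDetected"]:
        return "clean_spot"
    return "move_forward"              # nothing detected, keep going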
Classical agents were limited in this way. They could only handle situations their creators imagined.
The Reasoning Bottleneck
For simple environments, hand-coded rules work fine. A vacuum cleaner doesn't face many novel situations. But scale up the complexity and the approach breaks down. Consider a customer service agent handling complaints, a research assistant synthesizing documents, a coding assistant debugging unfamiliar errors. The situations multiply faster than any team can anticipate.
You can't write rules for every possible customer complaint. You can't anticipate every research question. The number of situations explodes, and no team of engineers can map them all to actions in advance. Reasoning was the bottleneck, not perceiving or acting.
A General-Purpose Reasoner
Language models changed this. An LLM can read a customer complaint it's never seen before and figure out what to do. It can understand novel research questions. It can reason about code it wasn't trained on. Suddenly, that reasoning step doesn't have to be pre-programmed. You can describe what you want in natural language, and the model figures out how to get there.
This is what makes modern AI agents different from their predecessors. They have the same basic loop (perceive, reason, act), but the reasoning is now done by a general-purpose language model instead of hand-coded rules. The agent has gained something classical agents never had. It can reason about situations its creators never anticipated.
The Black Box Problem
There's a trade-off, though. When a robot vacuum decides to turn left, you can trace exactly why. The left sensor returned "clear," the right sensor returned "obstacle," and line 247 of the firmware selected the turn-left action.
When an LLM decides to turn left, you can't see why. The reasoning happens inside the model's parameters, billions of numbers that no human can interpret. You get an input, you get an output, and in between is a black box.
This matters. When an LLM-powered agent makes a surprising decision, you need to understand why. Was the input misunderstood? Did it reason correctly but have wrong information? Is there a flaw in its thinking? Without visibility into the reasoning, debugging is guesswork.
Invisible reasoning also makes it hard to trust agents with important tasks. If you can't see how decisions are made, you can't verify they're being made well. You can't catch errors before they cause harm. You can't learn from mistakes to improve the system. We've gained powerful reasoning capabilities at the cost of transparency.
Thinking Out Loud
The solution is surprisingly simple. Make the model show its work.
Instead of just producing an output, ask the model to first articulate its reasoning. What does it understand about the situation? What options is it considering? Why does it prefer one action over another? Force the thinking to happen in text, where you can see it.
This is the core insight of the ReAct pattern (Reasoning + Acting). Before the model takes any action, it generates a thought, an explicit statement of its reasoning. Then it acts. Then it observes the result of its action. And the cycle repeats.
ReAct Loop
Reason. Act. Observe. The same basic loop as before, but with the reasoning made visible.
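In practice, the pattern is usually induced by the prompt itself: the model is instructed to write a Thought before every Action and to wait for an Observation after it. A minimal sketch of such a prompt, with wording and tool names that are purely illustrative rather than any fixed standard:
# A ReAct-style prompt template. The format and the tool list are assumptions
# made for illustration.
REACT_PROMPT = """You are an agent that completes tasks step by step.
At every step, answer in exactly this format:

Thought: what you understand about the situation and what to do next
Action: one of [search, lookup, finish], with its input in parentheses
Observation: (the system fills this in with the action's result)

Repeat Thought / Action / Observation until the task is done,
then use the finish action with your final answer.

Task: {task}
{history}"""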
This isn't just about visibility, though. It changes how reasoning happens. Consider our robot vacuum again, deciding whether to clean a detected dirt spot:
Two Ways to Reason
Same input, same output—different process entirely
On the classical side, a rule chain fires: dirtDetected is true? Yes. obstacleAhead is false? Yes. Clean the area. The classical agent can only handle situations its programmers anticipated.
On the LLM side, the model reasons in prose: "I've detected dirt in this area. Before I clean it, I should check if there's anything blocking my path. Looking ahead, the way is clear, no obstacles detected. I'll drive over and clean the area." The LLM reasons about what makes sense given the context.
Both approaches arrive at the same action: clean the area. But look at the difference in how they get there.
The classical agent evaluates a chain of boolean conditions. Each rule was written by an engineer who anticipated this exact situation. If the conditions change even slightly, say the vacuum detects an unusual obstacle type that wasn't in the original rule set, the system has no way to reason about it.
The LLM thinks in language. It can consider context, weigh trade-offs, and handle situations that were never explicitly programmed. The reasoning is flexible because natural language is flexible.
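Here is a hedged sketch of the same vacuum decision made by a model call rather than a rule chain. The llm argument is a placeholder for whatever client function you use to send a prompt and get the model's text back; the prompt wording is invented for illustration:
# LLM-based reasoning: the situation is described in language, and the model
# writes its thinking before naming an action.
import json

def decide_with_llm(sensors: dict, llm) -> str:
    prompt = (
        "You control a robot vacuum. Current sensor readings:\n"
        f"{json.dumps(sensors, indent=2)}\n\n"
        "Think step by step about what to do, then end with a line\n"
        "'Action: <move_forward|turn_left|turn_right|clean_spot|return_to_dock>'."
    )
    response = llm(prompt)                               # full text, thinking included
    return response.rsplit("Action:", 1)[-1].strip()     # keep only the chosen action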
And because this reasoning happens in text, you can see it. When an agent makes a surprising choice, you can read what it was thinking. You can see where the reasoning went wrong, or discover that it actually made sense given what it knew. You can trace chains of thought through multi-step problems. Making the model think out loud turns a black box into a glass box.
Seeing It Work
Let's watch a ReAct agent work through real tasks. Here are two scenarios: a support ticket router making a single decision, and a trip planner working through multiple steps to build an itinerary.
Support Ticket Router
A frustrated customer needs urgent help with account access. Watch how the ReAct agent reasons explicitly before taking action.
The agent analyzes the ticket and reasons about urgency, sentiment, and routing
Subject: Can't access my account!!!
I've been trying to log in for 2 hours now. I keep getting "invalid password"
but I KNOW my password is correct. I have an important meeting in 30 minutes
and I need to access my files. This is EXTREMELY frustrating. Please help ASAP!!!
Let me analyze this support ticket carefully:
1. **Sentiment Analysis**: The customer is clearly frustrated and angry. Multiple indicators:
- Three exclamation marks in the subject line
- ALL CAPS on "KNOW" and "EXTREMELY"
- "ASAP" with multiple exclamation marks
- Repeated emphasis on frustration
2. **Time Sensitivity**: CRITICAL - The customer has an important meeting in 30 minutes and needs access to files. This is a hard deadline.
3. **Business Impact**: HIGH - The customer cannot access their account and has a time-sensitive business need (meeting in 30 minutes).
4. **Core Issue**: This is an ACCOUNT problem - login/access issue. However, the urgency level elevates this significantly.
5. **Urgency Level**: CRITICAL - This meets multiple critical criteria:
- Time-sensitive deadline (30 minutes)
- Business impact (important meeting)
- Extremely frustrated customer
- 2 hours already spent trying to resolve
6. **Routing Decision**: While this is fundamentally an account issue, the critical urgency and business impact warrant routing to URGENT-ESCALATION to ensure immediate attention and priority handling. The customer needs help within minutes, not hours.
This should be routed to urgent-escalation with critical urgency level.
What to observe:
- See the explicit "thinking out loud" before each action
- Follow the Reason-Action-Observation pattern through the iteration
- Compare ReAct's visible reasoning with the classical perceive-reason-act loop, where the reasoning stays hidden inside hand-coded rules
Notice how each iteration has three distinct phases. The thought shows the agent's reasoning, including what it understands, what it's considering, and what it decided. The action is what it does as a result. The observation is what comes back from the world.
In the support ticket example, one pass through the loop is enough. The agent reads the ticket, thinks about what it means, and routes it accordingly. Simple task, single iteration.
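The action that follows a thought like the one above is just a structured call. Hypothetically, if the router exposes a ticket-routing tool, the step might reduce to something like this (route_ticket and its parameters are invented here; only the queue and urgency come from the example's decision):
# The reasoning above collapses into one concrete, structured call.
# route_ticket is a hypothetical tool used for illustration.
route_ticket(queue="urgent-escalation", urgency="critical")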
The trip planner is more interesting. Each action changes what happens next. After searching for activities, the agent sees results it didn't have before. That changes its next thought. After checking availability, it knows what's actually possible. The loop repeats, each iteration building on the last, until the itinerary is complete.
This adaptability, the ability to change behavior based on what you learn, is what makes agents useful for complex tasks. The visible thinking lets you follow along, verify the reasoning, and understand how decisions are made.
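For contrast, a trip-planner trace might look something like the following. Every value in it is made up to illustrate the shape of a multi-step run; the point is only that each observation changes the next thought:
# One (thought, action, observation) triple per iteration. All values are
# invented to illustrate how observations feed the next thought.
trace = [
    ("I need activity options before I can plan the day.",
     'search_activities(city="Barcelona")',
     "Found: cathedral tour, tapas walk, beach afternoon"),
    ("The tour looks promising; check whether it has morning slots.",
     'check_availability("cathedral tour", day=1)',
     "Morning sold out; 3pm available"),
    ("Morning is out, so reorder: beach first, tour at 3pm, tapas after.",
     'add_to_itinerary(day=1, items=["beach", "cathedral tour 3pm", "tapas walk"])',
     "Day 1 itinerary saved"),
]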
The Mechanics
What does this look like in code? The ReAct loop itself is straightforward.
# The ReAct loop: think, act, observe
observation = task                          # start from the user's goal
done = False
while not done:
    thought = agent.think(observation)      # "I should search for flights..."
    action = agent.act(thought)             # search_flights(destination="Barcelona")
    observation = agent.observe(action)     # [Flight 1: €89, Flight 2: €120, ...]
    done = agent.is_complete(observation)
The think step prompts the language model to reason about the current situation and decide what to do. The output is text with the model's thinking made visible and traceable.
The act step executes whatever the model decided. This could be calling a function, querying a database, sending a message. The model's decision becomes a real action in the world.
The observe step captures what happened as a result. This becomes input to the next iteration so the model can see outcomes and adjust accordingly.
The loop continues until some stopping condition is met. The goal is achieved, a limit is reached, or the agent decides it's done.
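Fleshing that skeleton out takes little more than a model call, a way to run whichever tool the model named, and a stopping check. A minimal sketch under assumptions: llm is any function that maps a prompt to the model's text, tools maps action names to plain Python functions, and the model follows the Thought/Action text format sketched earlier.
# A minimal ReAct agent loop. llm and tools are placeholders for your own
# model client and tool functions; the parsing assumes the Thought/Action
# format described above.
def run_agent(task: str, llm, tools: dict, max_steps: int = 10) -> str:
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        # Think: the model continues the transcript with a Thought and an Action.
        response = llm(history + "Thought:")
        history += "Thought:" + response + "\n"

        if "Action: finish" in response:
            # The model judged the task complete; what follows is its answer.
            return response.split("Action: finish", 1)[1].strip(" ()\n")
        if "Action:" not in response:
            continue  # no action proposed this turn; think again

        # Act: parse a line like  Action: search(Barcelona)  and run that tool.
        action_line = response.split("Action:", 1)[1].strip().split("\n")[0]
        name, _, arg = action_line.partition("(")
        if name.strip() not in tools:
            continue  # unknown or malformed action; let the model try again
        result = tools[name.strip()](arg.rstrip(")").strip().strip('"'))

        # Observe: the result becomes context for the next iteration.
        history += f"Observation: {result}\n"

    return "Stopped: step limit reached before the agent finished."
Calling it might look like run_agent("Plan a day in Barcelona", llm=my_model, tools={"search": search, "lookup": lookup}), where my_model, search, and lookup are your own functions.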
When to Stop
Knowing when to stop is trickier than it sounds. For simple tasks with clear success criteria, the agent can recognize when the goal is achieved. For open-ended tasks, it's less obvious.
Most implementations set a maximum number of iterations as a safety net. Without this, an agent that gets stuck will loop forever, burning through API costs. Timeout limits serve a similar purpose for real-time applications.
Some agents are designed to recognize when they're stuck. They notice when they've tried the same action multiple times without progress, or when they're uncertain about how to proceed. This kind of self-awareness can trigger a graceful exit or a request for help.
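These safeguards are cheap to write down. A sketch of the usual guards, with thresholds chosen arbitrarily for illustration:
# Guard rails for the loop: a step cap, a wall-clock budget, and a crude
# "stuck" check for repeated actions. Thresholds are illustrative.
import time

MAX_STEPS = 15       # hard cap on iterations
MAX_SECONDS = 60     # wall-clock budget for latency-sensitive use
STUCK_WINDOW = 3     # this many identical recent actions counts as stuck

def should_stop(steps, started, recent_actions):
    """Return a reason to stop, or None to keep looping."""
    if steps >= MAX_STEPS:
        return "step limit reached"
    if time.monotonic() - started > MAX_SECONDS:
        return "time budget exceeded"
    if len(recent_actions) >= STUCK_WINDOW and len(set(recent_actions[-STUCK_WINDOW:])) == 1:
        return "repeating the same action with no progress"
    return None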
Termination isn't a solved problem. Too aggressive and you cut off agents before they complete complex tasks. Too lenient and you waste resources on agents that will never succeed. Finding the right balance depends on the specific use case, and we'll revisit this topic in detail in a later post.
Trade-offs
ReAct isn't free. Every thought is tokens, and tokens cost money. An agent that thinks carefully about each step will be more expensive to run than one that doesn't. For simple tasks where the answer is obvious, explicit reasoning is overhead.
The reasoning also adds latency. Generating a thought takes time, and for applications where speed matters, that time adds up. A support ticket router that thinks for two seconds before each decision might be too slow for high-volume use.
And while visible reasoning helps with debugging, it doesn't guarantee correct reasoning. Models can think out loud and still be wrong. They can produce plausible-sounding thoughts that lead to bad actions. The visibility helps you catch these problems, but it doesn't prevent them.
For some tasks, simpler patterns work better. A direct prompt-response without explicit reasoning is faster and cheaper. Rule-based routing can be more predictable for well-understood situations. ReAct shines when tasks are complex enough to benefit from step-by-step thinking, and when visibility into that thinking has value.
What We Covered
Agents are systems that perceive, reason, and act in a loop. This pattern is decades old and powers everything from robot vacuums to autonomous vehicles.
Classical agents were limited by hand-coded reasoning, requiring every decision to be anticipated and programmed in advance. Language models removed this limitation by providing the ability to reason about novel situations, giving agents general-purpose brains. But LLM reasoning is invisible, which creates problems for debugging and trust.
ReAct solves this by making the model think out loud. Explicit reasoning, captured as text, lets you see how decisions are made. The resulting reason-act-observe loop is conceptually simple but surprisingly powerful, enabling agents that can tackle complex, multi-step tasks while remaining transparent enough to understand and improve.
Coming up next: We have agents that can reason. But how do we actually communicate with the language model that powers them? Post 2 explores the mechanics of talking to LLMs, covering tokens, parameters, and the economics of inference.
Building something with AI? We help teams design and ship production-ready AI systems.
Published by Nanitics. We help companies build AI systems that work.