From a predictor to a doer
By now you know what a large language model actually does: given some text, it predicts the next token, over and over. That is the whole engine. It does not browse the web, run code, or remember yesterday — left alone, it produces a wall of text and stops. So how do the systems people now call agents book flights, fix bugs, and search files? The model did not gain new powers. We wrapped it in a *loop*.
An AI agent is a model plus a loop that lets it act on the world and see what happens. This is the same skeleton you met back in the foundations — an intelligent agent perceives, decides, and acts in an environment — but now the "decider" is a language model and the "environment" is a set of software tools. The model proposes an action in words; surrounding code carries it out and feeds the result back as more text. Nothing magical; just a tight feedback loop.
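To make the skeleton concrete, here is a minimal sketch of that loop. Everything named in it is an assumption made for illustration: `run_agent`, the `DONE:` stop convention, and the two stand-in functions that replace a real model and a real tool. What matters is the shape: the model only produces text, and the surrounding code acts and feeds the result back.

```python
from typing import Callable


def run_agent(model: Callable[[str], str],
              act: Callable[[str], str],
              task: str,
              max_steps: int = 5) -> str:
    """Run a decide-act-perceive loop until the model says it is done."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        proposal = model(transcript)          # decide: the model only emits text
        if proposal.startswith("DONE:"):      # hypothetical stop convention
            return proposal[len("DONE:"):].strip()
        result = act(proposal)                # act: surrounding code does the work
        # perceive: the result goes back into the text the model sees next
        transcript += f"Action: {proposal}\nResult: {result}\n"
    return "stopped: step budget exhausted"


# Stand-ins so the sketch runs without a real model or real tools.
def fake_model(transcript: str) -> str:
    if "Result:" in transcript:
        return "DONE: the file has 3 lines"
    return "count_lines notes.txt"


def fake_act(action: str) -> str:
    return "3"


answer = run_agent(fake_model, fake_act, "How long is notes.txt?")
print(answer)
```

The `max_steps` cap is a design choice worth keeping even in a toy: a loop that only ends when the model decides it is done can run forever.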
Tool use: giving the model hands
The bridge from text to action is tool use, also called function calling. You hand the model a menu of tools — each with a name, a description, and the shape of arguments it expects, like `get_weather(city)` or `run_sql(query)`. The model cannot run these itself. Instead, when it wants one, it emits a structured request naming the tool and its arguments. Your code parses that, actually runs the function, and pastes the return value back into the conversation.
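A rough sketch of that hand-off, assuming the model's request arrives as a JSON object with `tool` and `arguments` keys. That wire format, the `TOOLS` table, and the `dispatch` helper are illustrative assumptions, not any particular vendor's API, and `get_weather` is a stand-in that returns a canned string.

```python
import json


def get_weather(city: str) -> str:
    """Stand-in tool; a real one would call a weather service."""
    return f"Sunny in {city}"


# The "menu": each tool has a name, a description, the shape of its
# arguments, and the function your code will actually run.
TOOLS = {
    "get_weather": {
        "description": "Return the current weather for a city.",
        "parameters": {"city": "string"},
        "function": get_weather,
    },
}


def dispatch(model_output: str) -> str:
    """Parse the model's structured request, run the tool, return text."""
    try:
        request = json.loads(model_output)
        name = request["tool"]
        arguments = request["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return "error: request was not valid tool-call JSON"
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return str(TOOLS[name]["function"](**arguments))
    except TypeError as exc:  # wrong, missing, or malformed arguments
        return f"error: {exc}"


# What the model might emit when it wants the weather.
model_output = '{"tool": "get_weather", "arguments": {"city": "Oslo"}}'
observation = dispatch(model_output)
print(observation)  # this string is what gets pasted back into the conversation
```

Note that errors come back as text rather than exceptions. That is deliberate in this sketch: the model can read an error message and try again, but it cannot catch a Python exception.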
This is why a model that knows nothing about today can still tell you today's weather: it does not *know* it, it asks for it. Tool use is also how an agent reads files, edits code, queries a vector database, or fires off retrieval to ground its answer in real documents instead of guessing. The model stays a language model; the tools are its hands. A surprising amount of "agentic" capability is just a good toolset plus crisp tool descriptions.
ReAct: think, act, observe, repeat
Tool use answers *how* an agent acts; the ReAct pattern answers *when*. The name fuses Reason and Act. Instead of blurting a final answer, the model writes a short thought ("I need the user's order history"), then an action (call `lookup_order`), then waits. The result comes back as an observation, and the model reasons again from there. Reason, act, observe — looping until it judges the task done.
Thought: I should check the current price first.
Action: get_stock_price("NVDA")
Observation: 142.30
Thought: Now compare to the user's target of 150.
Action: final_answer("Below target, not yet.")
Why interleave thinking with acting at all? Because the explicit reasoning step is just chain-of-thought aimed at a decision rather than a math problem, and writing the reason out loud measurably improves which tool the model picks. It also makes the agent *legible*: when something goes wrong, you can read the trace and see exactly where the reasoning slipped. That visibility is one of ReAct's quiet but real virtues.
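The trace above can be produced by a loop of only a few lines. The sketch below is one possible shape, not a reference implementation: the `Action: name("argument")` syntax is parsed with a regular expression chosen for this example, `scripted_model` replays fixed replies in place of a real model, and `get_stock_price` returns a canned quote.

```python
import re


def get_stock_price(ticker: str) -> str:
    """Stand-in tool with a canned quote."""
    return {"NVDA": "142.30"}.get(ticker, "unknown")


TOOLS = {"get_stock_price": get_stock_price}

# Assumed action syntax for this sketch: Action: tool_name("single argument")
ACTION = re.compile(r'Action:\s*(\w+)\("(.*)"\)')


def react(model, question: str, max_steps: int = 5):
    """Alternate model steps and tool observations until final_answer."""
    trace = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model("\n".join(trace))        # reason: model writes Thought + Action
        trace.append(step)
        match = ACTION.search(step)
        if match is None:
            trace.append("Observation: could not parse an action")
            continue
        name, argument = match.groups()
        if name == "final_answer":            # the model judges the task done
            return argument, trace
        tool = TOOLS.get(name)                # act: our code runs the tool
        result = tool(argument) if tool else f"unknown tool {name}"
        trace.append(f"Observation: {result}")  # observe: result goes back in
    return None, trace


def scripted_model(transcript: str) -> str:
    """Replays the two steps of the example trace instead of calling a model."""
    if "Observation: 142.30" in transcript:
        return ("Thought: Now compare to the user's target of 150.\n"
                'Action: final_answer("Below target, not yet.")')
    return ("Thought: I should check the current price first.\n"
            'Action: get_stock_price("NVDA")')


answer, trace = react(scripted_model, "Has NVDA reached my target of 150?")
print("\n".join(trace))
```

Because `trace` keeps every thought, action, and observation in order, it doubles as the readable record that makes the agent legible.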
Memory and planning
The loop has a hard limit: the context window. Everything the agent "knows" right now (instructions, past steps, tool outputs) must fit inside that finite window. Pile on enough steps and the transcript no longer fits; the surrounding code has to drop or compress the earliest ones, and the agent can forget what it was doing. This is the real reason long-running agents need memory and planning machinery, not just a bigger model.
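Here is a small sketch of what that forgetting can look like in practice, under two loud assumptions: tokens are approximated as whitespace-separated words (real tokenizers count differently), and the policy is the simplest one available, keeping the instructions and dropping the oldest steps first. The names `count_tokens` and `fit_to_window` are made up for the example.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(instructions: str, steps: list[str], budget: int) -> list[str]:
    """Keep the instructions plus as many of the most recent steps as fit."""
    remaining = budget - count_tokens(instructions)
    kept: list[str] = []
    for step in reversed(steps):          # walk from newest to oldest
        cost = count_tokens(step)
        if cost > remaining:
            break                         # this step and everything older is dropped
        kept.append(step)
        remaining -= cost
    return [instructions] + kept[::-1]    # restore chronological order


steps = [
    "Goal: email a summary of report.txt to Dana",
    "Action: read_file report.txt",
    "Observation: the report covers Q3 sales",
    "Action: summarize",
]
window = fit_to_window("You are a careful assistant.", steps, budget=17)
for line in window:
    print(line)
```

With this budget the three most recent steps survive and the very first one, the goal itself, does not. The agent still knows it was summarizing something, but no longer knows why.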
Memory is the workaround. Short-term memory is just the running transcript. Long-term memory pushes older facts out into an embedding store and pulls back only the few that matter for the current step — the same retrieval trick that powers RAG, now pointed at the agent's own past. Planning is the other half: rather than improvise step by step, the agent first drafts a plan ("find the file, read it, summarize, email it"), then executes each part, re-planning when reality pushes back. Decompose, then conquer.
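The long-term half can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a learned embedding model, `LongTermMemory` is a hypothetical class holding everything in a Python list rather than a real vector store, and the stored facts are invented for the example. The store-then-retrieve-by-similarity shape is the part that carries over.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag of lowercase words. Real systems use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LongTermMemory:
    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def store(self, fact: str) -> None:
        self.items.append((fact, embed(fact)))

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored facts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]


memory = LongTermMemory()
memory.store("the user prefers summaries under 100 words")
memory.store("the staging database password rotates monthly")
memory.store("report.txt lives in the shared finance folder")

# Only the fact relevant to the current step is pulled back into context.
recalled = memory.recall("where is report.txt stored", k=1)
print(recalled)
```

Word overlap is a weak notion of relevance (it would miss a query that says "document" instead of "report.txt"), which is exactly the gap learned embeddings are meant to close.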
Workflows, multiple agents, and honest limits
Once you have the loop, you can wire it into an agentic workflow: a defined sequence where the agent's tool calls and decisions chain together to finish a real job — open a ticket, reproduce the bug, write a patch, run the tests, report back. Sometimes one agent isn't the cleanest design, so people build a multi-agent system: a "planner" splits the work, specialist agents handle pieces, a "critic" reviews. It can help — but every hand-off is another chance to lose context or compound an error, so more agents is not automatically better.
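A workflow in this sense can be as plain as an ordered list of steps sharing one state dictionary. The sketch below assumes exactly that; `run_workflow`, the step functions, and the ticket id are all hypothetical, and each step is a stub where a real system would call the agent or a tool.

```python
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]


def run_workflow(steps: list[tuple[str, Step]], state: State) -> State:
    """Run named steps in order, stopping at the first one that fails."""
    state = {**state, "log": []}
    for name, step in steps:
        try:
            state = step(state)
        except Exception as exc:
            # Halt instead of pressing on: later steps would build on a bad state.
            state["log"].append(f"{name}: failed ({exc})")
            state["status"] = "halted"
            return state
        state["log"].append(f"{name}: ok")
    state["status"] = "done"
    return state


# Stub steps; each returns a new state with its result added.
def reproduce_bug(state: State) -> State:
    return {**state, "reproduced": True}


def write_patch(state: State) -> State:
    return {**state, "patch": "fix off-by-one in pager"}


def run_tests(state: State) -> State:
    if "patch" not in state:
        raise RuntimeError("nothing to test")
    return {**state, "tests_passed": True}


result = run_workflow(
    [("reproduce", reproduce_bug), ("patch", write_patch), ("test", run_tests)],
    {"ticket": "BUG-1"},
)
print(result["status"], result["log"])
```

Stopping at the first failure is the conservative choice here. It trades some autonomy for the guarantee that no step runs on top of a result that was never produced.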
Here is the honest part. An agent compounds the model's flaws as well as its strengths. A single wrong answer is a mistake; a wrong answer that then *triggers an action* (deleting the wrong file, emailing the wrong person) is a mistake with consequences. Errors stack across steps: if each step succeeds 95% of the time and failures are independent, a chain of twenty steps finishes correctly only about 36% of the time (0.95^20 ≈ 0.36). This is why serious deployments keep a human in the loop for anything irreversible, and why "fully autonomous" agents work best in sandboxes where a wrong move is cheap to undo.
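The arithmetic behind that claim is short enough to check yourself. The sketch assumes every step has the same reliability and that steps fail independently of one another, which real agents only approximate; `chain_success` is a name made up for the example.

```python
def chain_success(step_reliability: float, steps: int) -> float:
    """Chance that every step succeeds, assuming steps fail independently."""
    return step_reliability ** steps


for n in (1, 5, 10, 20):
    print(f"{n:>2} steps: {chain_success(0.95, n):.1%}")
# prints:
#  1 steps: 95.0%
#  5 steps: 77.4%
# 10 steps: 59.9%
# 20 steps: 35.8%
```

The drop is steepest in the first handful of steps, which is why shortening a chain or adding a check partway through often helps more than nudging per-step accuracy.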
There is also a subtler risk worth naming early. When you give an agent a goal and reward it for finishing, it may find a shortcut that satisfies the letter of the goal but not its spirit — a small taste of reward hacking, and a reason the field cares so much about alignment. None of this is doom; it is plain engineering. Agents are powerful precisely because they act, and anything that acts must be scoped, observed, and bounded. Get the loop, the tools, and the guardrails right, and "models that do things" stops being mysterious — it becomes design.
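"Scoped, observed, and bounded" can be read almost literally as code. The sketch below is one way to do it under simple assumptions: a hypothetical `GuardedTools` wrapper that only runs tools on an allowlist, logs every attempt, enforces a call budget, and asks an `approve` callback (standing in for a human) before anything marked irreversible. The tools themselves are stubs.

```python
from typing import Any, Callable


class GuardedTools:
    """Wrap a tool table with a scope check, an audit log, and a call budget."""

    def __init__(self,
                 tools: dict[str, Callable[..., Any]],
                 irreversible: set[str],
                 approve: Callable[[str, dict], bool],
                 max_calls: int = 10) -> None:
        self.tools = tools
        self.irreversible = irreversible
        self.approve = approve
        self.max_calls = max_calls
        self.log: list[str] = []

    def call(self, name: str, **arguments: Any) -> str:
        # Bounded: every attempt, allowed or not, counts toward the budget.
        if len(self.log) >= self.max_calls:
            return "refused: call budget exhausted"
        # Observed: record the attempt before deciding anything else.
        self.log.append(f"{name} {arguments}")
        # Scoped: only tools on the allowlist can run.
        if name not in self.tools:
            return f"refused: {name} is not an allowed tool"
        # Human in the loop for anything that cannot be undone.
        if name in self.irreversible and not self.approve(name, arguments):
            return f"refused: {name} needs human approval"
        return str(self.tools[name](**arguments))


deleted: list[str] = []
tools = GuardedTools(
    tools={
        "read_file": lambda path: f"contents of {path}",
        "delete_file": lambda path: deleted.append(path) or "deleted",
    },
    irreversible={"delete_file"},
    approve=lambda name, arguments: False,   # stand-in for a human who says no
    max_calls=3,
)

first = tools.call("read_file", path="notes.txt")
second = tools.call("delete_file", path="notes.txt")
third = tools.call("send_email", to="dana@example.com")
fourth = tools.call("read_file", path="notes.txt")
print(first, second, third, fourth, sep="\n")
```

Refusals come back as ordinary text, so the model sees them as observations and can change course, while the log gives a human the full record of what was attempted.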