Agency and agents
2,400 years ago, Aristotle observed something that broke with a doctrine his teacher Plato had inherited from Socrates: the belief that no one does wrong voluntarily — that if you know what is good, you do good. Aristotle looked around and said: no. People know what is good and still choose otherwise. He gave it a name: akrasia (ἀκρασία) — weakness of will. And with that he inaugurated a question that humanity still hasn't resolved: if I know what I should do, why don't I do it?
Aristotle also gave us the word that structures this chapter: agency. In philosophy, an agent is a being with the capacity to act intentionally in the world — to perceive its environment, deliberate, and execute actions with purpose. For Aristotle, agency was inseparable from freedom: acting voluntarily, with knowledge of the circumstances and for one's own reasons, is what distinguishes us from objects that simply react to external forces.
The intention-action gap
Modern psychology put numbers to what Aristotle saw. Decades of research on the intention-action gap demonstrate that your conscious intentions — those decisions you make with clarity and conviction — explain only between 18% and 23% of the variance in what you actually end up doing (Armitage & Conner, 2001; McEachan et al., 2011, 2016). The other ~80% of your behavior comes from automatic habits, environmental cues, emotional states, accumulated fatigue, and social context. Forces that operate beneath your consciousness.
A 2023 meta-analysis put it in brutal perspective: the successful translation of a positive intention into actual behavior is nearly at chance level (Rhodes et al., 2023). Even when an intervention makes a person want something much more, that large increase in intention barely moves the needle on whether they actually do it (d = 0.15; Rhodes & Dickau, 2012).
Why? Because your brain is not a coherent unit. It's a committee where multiple systems compete: the automatic system (fast, emotional, reactive), the deliberate system (slow, rational, exhaustible), biochemistry, social context, and — crucially — the design of the environment. Your intention to save money competes against a delivery app designed to make you buy with one tap. Your intention to eat well competes against a supermarket that puts candy at the checkout. Your intention to organize your finances competes against bank statements designed so you don't understand how much you're paying in fees.
Janet Polivy and C. Peter Herman at the University of Toronto formalized this as the False Hope Syndrome (2000, 2002): a cycle where you decide to change, you feel good about deciding (dopamine comes with the decision, not the action), you fail in execution, you blame yourself, and eventually you decide to change again. People repeat the same New Year's resolution an average of 10 times in their lifetime.
The self-help industry — worth billions of dollars a year — sells the wrong solution: more information, more motivation, more willpower. A study from the University of Washington found that among people who voluntarily enrolled in a self-help program, only 20% read the program's book in full and just 2-4% followed through on all of its suggestions. Reading about changing feels like changing. The brain doesn't distinguish well between planning an action and executing it.
The problem was never a lack of information. The problem is that human agency — that capacity to act with purpose — breaks down when the system it operates in is more complex than the mind can manage.
From thought to machine
Philosophy gave us the concept of agency. Computer science extrapolated it.
In 1956, a group of researchers gathered at Dartmouth College and coined the term artificial intelligence. It wasn't a technical achievement — it was a declaration of intent: to build machines that think. What followed were decades of prototypes that, seen today, look primitive but asked the right questions.
ELIZA (1964–1966), created by Joseph Weizenbaum at MIT, simulated a psychotherapist using pattern matching. It understood nothing — it simply returned the user's words reformulated as questions. But something unexpected happened: people who interacted with ELIZA attributed genuine understanding to it, confided real problems, became emotional. Weizenbaum was disturbed. They called it the ELIZA effect — the human tendency to project intelligence onto systems that merely respond.
ELIZA - 1966, MIT
> I feel tired all the time.
Why do you say you feel tired all the time?
> Because I feel like nothing I do is enough.
Does it concern you that nothing you do is enough?
> Yes, especially with my family.
Tell me more about your family.
> My mother always expects more from me.
Your mother expects more from you?
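The mechanism behind that transcript fits in a few lines. Below is a minimal, illustrative Python sketch of ELIZA-style pattern matching (not Weizenbaum's actual DOCTOR script, just the general idea): match a phrase, flip the pronouns, and hand the user's own words back as a question.

```python
import re

# Illustrative rules in the spirit of ELIZA's DOCTOR script (not the
# original patterns): match a phrase, reflect it back as a question.
RULES = [
    (r"i feel (.*)", "Why do you say you feel {0}?"),
    (r"because (.*)", "Does it concern you that {0}?"),
    (r"my (mother|father) (.*)", "Your {0} {1}?"),
]

# Swap first person for second person so the reflection reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.split())

def respond(utterance: str) -> str:
    cleaned = utterance.lower().strip(" .!?")
    for pattern, template in RULES:
        match = re.match(pattern, cleaned)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Tell me more."  # fallback when no rule matches

print(respond("I feel tired all the time."))
# Why do you say you feel tired all the time?
print(respond("My mother always expects more from me."))
# Your mother always expects more from you?
```

No model of the world, no memory, no understanding: just string substitution. That is what people were confiding their problems to.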
In the decades that followed, AI advanced in closed domains — playing chess, classifying images, recommending products — but kept hitting the same wall: natural language. Understanding what a person says, with its ambiguity, implicit context, and a thousand ways of saying the same thing, was a problem no architecture could solve well. And without understanding language, a machine cannot receive open-ended instructions, reason about novel situations, or decide what to do when facing an unforeseen problem. It could execute rules — but it couldn't think about which rule to apply.
That changed with a single idea.
The Transformer revolution
On June 12, 2017, eight researchers from Google Brain and the University of Toronto published a paper with a title that read almost like a manifesto: "Attention Is All You Need" (Vaswani et al., 2017). Presented at NeurIPS 2017, the paper introduced the Transformer architecture — a model based entirely on attention mechanisms that eliminated the need for recurrent networks (RNNs) and convolutional networks (CNNs) for sequence processing.
The core idea was elegant: instead of processing a sentence word by word (as RNNs did), the Transformer looks at all words simultaneously and calculates the relevance of each one relative to the others. That operation — self-attention — enabled massively parallel model training and, for the first time, scaled language processing to previously unthinkable dimensions. The results were immediate. The Transformer architecture became the foundation of virtually every language model that followed.
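To make self-attention concrete, here is a minimal single-head scaled dot-product attention computation in plain NumPy. The projection matrices are random stand-ins for the learned weights of a real model; a production Transformer stacks many such heads and layers.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings, one row per token.
    Wq, Wk, Wv: projection matrices (random here, learned in a real model).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # every token scores every other token
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V                            # weighted mix of all positions

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)        # (4, 8)
```

Because every position is computed against every other position in one matrix multiplication, the whole sequence can be processed in parallel, which is what made training at scale practical.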
| Model | Date | Milestone |
|---|---|---|
| GPT-1 | June 2018 | OpenAI demonstrates that a pre-trained Transformer can solve diverse tasks. 117 million parameters |
| ChatGPT | November 2022 | GPT-3.5 fine-tuned with human feedback (RLHF). 100 million users in two months. AI leaves the lab |
| LLaMA | February 2023 | Meta publishes open weights. Democratizes access and sparks an explosion of derivative models |
| Claude 3.5 Sonnet | June 2024 | Anthropic ships the model that changed code writing: 49% on SWE-bench, outperforming larger and more expensive models |
| DeepSeek-R1 | January 2025 | Open-source reasoning model (MIT license) trained for ~$6M. Matches OpenAI o1. NVIDIA loses $589B in a single day |
| Gemini 3 Pro | November 2025 | Google launches its most powerful model at the time: 100% on AIME 2025, 76% on SWE-bench |
| Qwen 3.5 | February 2026 | Alibaba releases a native multimodal model with 397B parameters (17B active per inference), 1M token context, 201 languages. Open-weight and designed for agents |
What we're living today is a geopolitical race. The United States leads with high-performance proprietary models — OpenAI, Anthropic, Google — while China responds with a different strategy: open-source, efficient, and cheap models.
U.S. chip export restrictions, designed to slow China down, are having the opposite effect, forcing Chinese engineers to optimize every token, every operation, every training dollar.
From response to action: what makes an agent an agent
When ChatGPT arrived in late 2022, the world discovered what a language model could do. But for most of 2023, the majority of models still lacked function calling — the capability that turns a language model into an agent: the ability to call functions, which is to say, to use tools.
In essence, an agent is exactly that: a language model that runs in an environment with tools. An agent, driven by an LLM, receives the user's prompt, looks at the list of tools at its disposal, and decides which ones to use and how.
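Here is a sketch of what that looks like in practice. The client (`llm.chat`), the tool name, and the response shape are hypothetical stand-ins rather than any specific vendor's API; the tool definition follows the JSON Schema convention that current function-calling APIs generally share.

```python
import json

# Tool definition the model sees alongside the prompt. Names here
# (get_account_balance, llm.chat) are illustrative stand-ins.
TOOLS = [{
    "name": "get_account_balance",
    "description": "Return the current balance of the user's bank account.",
    "parameters": {
        "type": "object",
        "properties": {"account_id": {"type": "string"}},
        "required": ["account_id"],
    },
}]

def get_account_balance(account_id: str) -> dict:
    # Placeholder implementation; a real tool would call the bank's API.
    return {"account_id": account_id, "balance": 1240.57, "currency": "USD"}

def handle(prompt: str, llm) -> str:
    # 1. The model receives the prompt and the list of tools.
    reply = llm.chat(prompt, tools=TOOLS)
    # 2. If it decides a tool is needed, it returns a structured call,
    #    e.g. {"tool": "get_account_balance", "arguments": {"account_id": "123"}},
    #    instead of prose.
    if reply.get("tool") == "get_account_balance":
        result = get_account_balance(**reply["arguments"])
        # 3. The tool result goes back to the model, which writes the final answer.
        return llm.chat(prompt, tool_result=json.dumps(result))
    # 4. Otherwise it answered directly, text only.
    return reply["text"]
```

The key shift is in step 2: the model stops being limited to prose and starts emitting actions that the surrounding program executes on its behalf.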
Without function calling, an LLM can only respond with text. It can answer history questions, explain physics concepts, write a poem, or summarize a document — but everything it produces comes from what it learned during training, not from real-time data. You ask how much you spent this month and it can't know — it has no access to your bank. You ask about the weather and it responds with what seems right, not what is right now. And sometimes, with total confidence, it says things that are partially or completely false — what's known as hallucinations. It's a statistical model of language: it predicts the most probable next word; it doesn't verify facts. It can say anything, but it can't do anything.
The path toward function calling started with two pieces of research. In October 2022, Shunyu Yao et al. published ReAct, a method where the model alternates between reasoning and acting — for example, querying a Wikipedia API to verify a fact before responding, instead of making it up (published at ICLR 2023). In February 2023, Timo Schick et al. (Meta AI) published Toolformer, a model that learned on its own when it needed an external tool and how to invoke it with the correct arguments.
Today the word AI is everywhere, and the concept of agents floats around it. But most of what is sold as an "agent" isn't one. An email filter that moves messages to folders is not an agent — it's automation: fixed rules that follow a predefined script. If X happens, do Y. Always. Without variation.
An agent is something else. It receives a goal and decides how to achieve it. It reasons about the situation, chooses which tools to use, executes actions, observes the results, and adjusts its strategy if something fails.
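That loop can be written down in a dozen lines. The sketch below assumes a hypothetical `llm.decide` call that returns structured output and a plain dictionary of tools; real agent frameworks differ in the details, but the skeleton is the same: reason, act, observe, adjust, and stop when the goal is met or the step budget runs out.

```python
def run_agent(goal: str, llm, tools: dict, max_steps: int = 10) -> str:
    """Minimal agent loop: reason, act, observe, adjust (illustrative only)."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # The model reasons over the goal plus everything observed so far and
        # proposes the next action as structured output, e.g.
        # {"action": "search_web", "arguments": {...}} or {"action": "finish", ...}.
        decision = llm.decide(history, tool_specs=list(tools))
        if decision["action"] == "finish":
            return decision["answer"]
        tool = tools[decision["action"]]
        try:
            observation = tool(**decision["arguments"])
        except Exception as error:          # failures are fed back, not fatal
            observation = f"Tool failed: {error}"
        history.append(f"{decision['action']} -> {observation}")  # observe, then adjust
    return "Stopped: step budget exhausted before reaching the goal."
```

The fixed rules of an email filter never leave the script they were given; everything interesting about an agent happens inside this loop, where the next step depends on what the last one returned.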