Section 1 of 5 · 12 min read
What AI Agents Actually Are
The word "agent" is everywhere in AI right now. Every tool claims to be one. What matters isn't the label — it's understanding what actually changes when AI moves from responding to you to pursuing goals on your behalf.
From tool to agent: what actually changes
Most climate professionals using AI today are using it as a tool: you ask, it answers. You paste a document and request a summary. You describe a problem and get options back. Every step requires you. Every action is yours to take.
Agents work differently. The practical definition: AI becomes more agentic when it can do more and decide more without you directing each step. You define the goal. The agent figures out the path — running code, browsing the web, reading files, calling APIs, writing outputs — without waiting for your instruction at every move.
The clearest illustration: when you use Claude with web search enabled and extended thinking on, you're already working with something more agentic than basic chat. Claude searches when it decides it needs to, thinks through choices before responding, and adapts based on what it finds. You set the task; it handles the moves underneath. That's a small version of the same shift.
At the far end: Anthropic's computer use capability gives an agent full access to a computer — it can open apps, browse websites, read files, click buttons, send messages on your behalf. You set a goal; the agent decides how to get there. Most decisions are the agent's. This is what "fully agentic" looks like in a real product available today.
The more useful question about any AI system isn't "Is this a real agent?" — it's "Does this let AI do more of the deciding and doing?" The label has become unreliable. The spectrum framing is what's useful.
The anatomy of an agent
What makes an agent technically? At any point on the spectrum, it comes down to three ingredients:
An LLM brain
The reasoning engine that interprets goals and decides what to do. Not just a text predictor here — it's the decision-making layer. When an agent reads a climate report and decides which sections to extract vs. summarize vs. flag for human review, that judgment is happening in the LLM.
Tools
The ability to act. Without tools, a model can only generate text — every action in the world is still yours to take. Tools change that: web search, code execution, file read/write, API calls, database queries, email sending. Each tool you give an agent is a new category of action it can take without you.
A loop
The model acts, observes the result, and acts again until the goal is reached or it gets stuck. This is what separates an agent from a prompt-response system. The ReAct pattern (Reason + Act) is one structure for this: reason about what to do, act, observe what happened, reason again. The specific framework matters less than the structure: the model uses its own observations to decide what to do next.
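The three ingredients above can be sketched in a few lines. This is a minimal, illustrative loop, not any real framework's API: the `model` callable and the `tools` dict are stand-ins for an LLM call and its available actions.

```python
# Minimal sketch of an agent loop (ReAct-style): reason, act, observe, repeat.
# `model` and `tools` are hypothetical stand-ins, not a real library API.

def run_agent(goal, model, tools, max_steps=10):
    """Drive the reason-act-observe loop until the model signals it is done."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Reason: the model reads everything so far and picks the next move,
        # returning e.g. {"action": "search", "input": "..."}.
        decision = model(history)
        if decision["action"] == "finish":
            return decision["input"]  # final answer back to the user
        # Act: call the chosen tool with the model's chosen input.
        result = tools[decision["action"]](decision["input"])
        # Observe: feed the result back so the next reasoning step can use it.
        history.append(f"{decision['action']} -> {result}")
    return "stopped: step budget exhausted"
```

The essential property is that the model's own observations, accumulated in `history`, drive the next decision; everything else is plumbing.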
Memory, multi-agent systems, learning from feedback — all of this is built on top of those three ingredients. A system missing any one of them is something else, regardless of what it's called.
What "tools" means in practice
In agentic contexts, "tools" has a specific technical meaning: functions the LLM can call to interact with the world outside its own context window. The range is broad:
- Web search: browse and retrieve live information, follow links, read pages
- Code execution: write and run Python, R, shell scripts; process data; generate outputs
- File access: read and write files on a filesystem — CSVs, PDFs, reports, logs
- API calls: pull data from IEA, Global Forest Watch, EPA, satellite feeds, or any external service
- Database queries: read and write structured data
- Messaging: send emails, post to Slack, create calendar events
The tools available to an agent determine its blast radius — how much it can affect the world, and what the consequences are when it makes a mistake. An agent that can only read files has a much smaller blast radius than one that can also send emails and modify databases.
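One way to make that blast radius explicit is to separate read-only tools from write-capable ones and grant the latter deliberately. This is a hedged sketch; the tool names, the role string, and the split itself are illustrative choices, not from any particular agent framework.

```python
# Illustrative sketch: blast radius as an explicit grant, not an accident.
# Tool names and roles are made up for the example.

READ_ONLY_TOOLS = {
    "web_search": lambda q: f"results for {q}",
    "read_file": lambda path: f"contents of {path}",
}

WRITE_TOOLS = {
    "send_email": lambda msg: f"sent: {msg}",
    "update_db": lambda row: f"wrote: {row}",
}

def tools_for(agent_role):
    """Grant write-capable tools only to roles that are meant to have them."""
    tools = dict(READ_ONLY_TOOLS)  # every agent can observe
    if agent_role == "trusted_operator":
        tools.update(WRITE_TOOLS)  # larger blast radius, deliberately granted
    return tools
```

The design point: an agent's reach should be a decision recorded in code, so that widening it is a visible change rather than a side effect.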
What an agent looks like in climate science
Researchers at UC San Diego and the Scripps Institution of Oceanography built an agent called Zephyrus, first published in October 2025 and presented at ICLR 2026, that gives climate scientists a natural language interface to scientific forecasting models.
Ask it a question — "Which coastal cities in Southeast Asia will experience the most extreme heat days by 2050?" — and it translates that into executable queries, runs them against the relevant climate models, retrieves the results, and returns a plain-language answer. No programming required on the scientist's end.
What makes Zephyrus an agent rather than a search tool is precisely this: the system decides how to find the answer, not just whether to look. The researcher specifies the goal; the agent figures out the path. That's the fundamental shift: not smarter answers, but a different relationship to the work itself.
Why climate work specifically needs agents
Not every task benefits from an agent. The tasks that do share a pattern: they're too long, too repetitive, or involve too many sources for a single prompt-response exchange to handle well.
Climate work has a high density of these tasks. Monitoring hundreds of forest polygons for deforestation signals. Synthesizing policy documents across 30 national adaptation plans. Pulling methane data from satellite feeds, correlating it with facility records, and flagging anomalies for human review. Running emissions reporting pipelines that normalize data from dozens of submitters with incompatible formats.
What these tasks have in common: they require information gathering from multiple sources, multi-step processing, and outputs that inform consequential decisions. They're the tasks where the gap between "AI helped me with this one document" and "AI handled the entire pipeline" is measured in hundreds of person-hours per year.
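The "flag anomalies for human review" step in the methane example above can be sketched directly. The field names, baseline structure, and threshold are invented for illustration; a real pipeline would pull readings from satellite feeds and facility records as described in the text.

```python
# Sketch of the anomaly-flagging pattern: compare readings against a
# per-facility baseline and queue exceedances for human review.
# Field names and the 2x threshold are illustrative assumptions.

def flag_anomalies(readings, baseline, threshold=2.0):
    """Return readings exceeding `threshold` times the facility's baseline."""
    flagged = []
    for r in readings:
        expected = baseline.get(r["facility"], 0.0)
        if expected and r["methane_ppm"] > threshold * expected:
            flagged.append(r)  # queued for a human reviewer, not auto-acted-on
    return flagged
```

Note that the function only flags; deciding what to do with a flagged reading stays with a person, which is the pattern the text describes.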
The oversight imperative
The same capabilities that make agents useful in climate work also make them genuinely dangerous if deployed without appropriate oversight. An agent that can autonomously browse, execute code, write files, and call APIs can also propagate errors across systems, make decisions at a scale no individual reviewer can catch in real time, and take actions that are difficult or impossible to reverse.
For climate work specifically, the stakes compound. A monitoring agent that misclassifies deforestation events produces flawed data that could flow into policy. An agent that writes to emissions databases can introduce errors that propagate into reporting. An agent handling MRV (measurement, reporting, and verification) that makes unsupported inferences can give false confidence to the very accountability systems climate governance depends on.
Human-in-the-loop oversight isn't optional for consequential climate decisions, and it has to be genuine, not decorative. The question isn't whether there's a human approval button somewhere in the system. It's whether a human actually reviews the agent's outputs before they flow into real decisions, and whether that human has the context and authority to override.
That makes human-in-the-loop a design decision, not a checkbox: it determines whether oversight is real or performative. For high-stakes climate systems, the difference matters enormously.
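One way to make oversight structural rather than performative is to gate write-tier actions on a recorded human approval, so no code path can bypass review. A minimal sketch, with invented action shapes and tier names:

```python
# Sketch of a human-in-the-loop gate that is structural, not decorative:
# write-tier actions cannot execute without a recorded reviewer sign-off.
# The action dict shape and tier names are illustrative assumptions.

class ApprovalRequired(Exception):
    """Raised when a write-tier action lacks human sign-off."""

def execute(action, approvals):
    """Run an action only if its risk tier permits it or a human approved it."""
    if action["tier"] == "write" and action["id"] not in approvals:
        # Hard stop: the gate is enforced in code, not left to convention.
        raise ApprovalRequired(f"action {action['id']} needs reviewer sign-off")
    return f"executed {action['id']}"
```

The contrast with a decorative approval button is that here the check lives in the execution path itself: removing it requires a code change someone can see and review.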