
Section 4 of 5 · 12 min read

Autonomous Monitoring for Climate

Some of the most powerful agent applications in climate aren't about drafting or analysis — they're about watching: continuous data ingestion, anomaly detection, and getting the right alert to the right person at the right time. This is harder than it sounds.

Climate monitoring — continuous data ingestion and anomaly detection

What climate monitoring agents look like

A monitoring agent sits between data sources and human decision-makers, running continuously or on a schedule. The basic structure: data arrives (from satellites, sensors, APIs, databases), the agent processes it against defined criteria, and produces either a logged record or an alert that routes to someone who can act.

The simplest version — and the place to start — is what the LMS content calls a "Watcher": a trigger-router-action pattern. Something arrives, the agent applies judgment to classify it, and it routes to the appropriate output. A Carbon Brief RSS monitor that sends relevant UK energy articles to a Slack channel and logs everything else to a spreadsheet is already this pattern.
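The trigger-router-action pattern is small enough to sketch. Below is a minimal, hypothetical version in Python: `is_relevant` stands in for whatever classifier applies the judgment (in practice often an LLM call rather than a keyword check), and the `alert`/`log` callables stand in for a Slack webhook and a spreadsheet append.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Item:
    """One incoming item, e.g. an entry from an RSS feed."""
    title: str
    url: str

def is_relevant(item: Item) -> bool:
    # Stand-in classifier. A real Watcher would usually delegate this
    # judgment to an LLM prompt or a trained model, not a keyword list.
    keywords = ("uk energy", "grid", "offshore wind")
    return any(k in item.title.lower() for k in keywords)

def route(item: Item,
          alert: Callable[[Item], None],
          log: Callable[[Item], None]) -> str:
    """Trigger -> classify -> route: alert on relevant items, log the rest."""
    if is_relevant(item):
        alert(item)   # e.g. post to a Slack channel
        return "alerted"
    log(item)         # e.g. append a spreadsheet row
    return "logged"
```

The value of starting here is that the skeleton never changes: more sophisticated agents swap in a better classifier and more outputs, but the trigger-router-action shape stays the same.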

More sophisticated versions layer in anomaly detection (comparison against historical baselines), multi-source correlation (does the satellite signal match the facility's reported output?), and adaptive thresholds (what counts as an anomaly for a facility that operates seasonally).
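The anomaly-detection layer can be as simple as a z-score against a historical baseline. A sketch, with the caveat that a seasonally operating facility would need a per-season history window rather than the single global baseline shown here:

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], value: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag a reading that deviates from the historical baseline by more
    than z_threshold standard deviations."""
    if len(history) < 2:
        return False  # not enough data to define a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold
```

Adaptive thresholds amount to choosing `history` and `z_threshold` per facility, per season, and per data source instead of globally, which is exactly where the extra design work goes.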

The monitoring agent's job is not to make decisions — it's to make the right information available to humans who can. That framing determines the entire design: what it surfaces, how it surfaces it, and when it escalates vs. logs quietly.

Real applications

Each of these is a deployed or near-deployed application, not a hypothetical.

01

Deforestation monitoring

Satellite imagery processed against historical canopy cover baselines, with alerts triggered when polygon coverage drops beyond a threshold. Global Forest Watch's alert system is one version of this. The agent challenge: distinguishing seasonal changes, agricultural burning, and planned harvesting from illegal deforestation, without requiring human review of every flagged event.
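The core trigger is a relative drop in canopy cover against the baseline. A hypothetical sketch (the hard part the text describes — separating seasonal change, burning, and permitted harvesting from illegal clearing — needs extra signals such as dates, fire data, and permit records that are not modeled here):

```python
def canopy_alert(baseline_cover: float, current_cover: float,
                 drop_threshold: float = 0.05) -> bool:
    """Flag a polygon whose canopy cover fell by more than drop_threshold
    (as a fraction of its historical baseline)."""
    if baseline_cover <= 0:
        return False  # no baseline to compare against
    return (baseline_cover - current_cover) / baseline_cover > drop_threshold
```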

02

Methane leak detection

Continuous-monitoring satellite data (TROPOMI, MethaneSAT, GHGSat) processed to detect anomalous methane concentrations around oil and gas facilities, pipelines, and landfills. Climate TRACE uses a version of this to produce facility-level emissions estimates that often show 3× higher emissions than operator self-reports. The agent's role: flag discrepancies; the human's role: decide whether to escalate to a regulator, engage the operator, or investigate further.
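The discrepancy flag itself is mechanically simple; everything downstream of it is the human's call. A hypothetical sketch of the agent's side of that division of labor:

```python
def flag_discrepancy(satellite_estimate: float, self_report: float,
                     ratio_threshold: float = 1.5) -> bool:
    """Flag a facility where the satellite-derived emissions estimate
    exceeds the operator's self-report by more than ratio_threshold.
    Deciding whether to escalate, engage, or investigate is not the
    agent's job."""
    if self_report <= 0:
        return satellite_estimate > 0  # any detected emissions vs. a zero report
    return satellite_estimate / self_report > ratio_threshold
```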

03

Emissions compliance tracking

Comparing reported facility emissions against EPA monitoring data, permit thresholds, and historical baselines on a rolling basis. An agent can flag inconsistencies, near-misses before threshold violations, and data quality issues much faster than manual review of quarterly reports — but cannot determine whether a discrepancy is a measurement error, a reporting error, or an actual violation. That determination requires human judgment.
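A sketch of the classification the agent can safely make: over-threshold, near-miss, or in-range. The margin value is illustrative, and note that the labels route work to a human rather than asserting a violation.

```python
def classify_reading(reported: float, permit_limit: float,
                     near_miss_margin: float = 0.9) -> str:
    """Classify one reported emissions value against a permit threshold.
    The agent can label the reading; it cannot tell a measurement error
    from a reporting error from an actual violation."""
    if reported > permit_limit:
        return "over-threshold: human review required"
    if reported > near_miss_margin * permit_limit:
        return "near-miss: flag for attention"
    return "ok: log only"
```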

04

Extreme weather alerts

NOAA and WMO data feeds combined with grid-level vulnerability maps to route alerts about heat waves, flood risk, and drought conditions to the relevant program officers, community contacts, or logistics systems. The complexity here is in the routing: the same temperature anomaly is a different kind of alert for an agricultural program vs. an urban heat health program.
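Because the same anomaly means different things to different programs, routing is naturally a per-program threshold check rather than one global rule. A sketch with illustrative program names and thresholds:

```python
def route_heat_alert(temp_anomaly_c: float,
                     programs: dict[str, float]) -> list[str]:
    """Route one temperature anomaly to every program whose own
    threshold it exceeds: the same anomaly can be an agricultural
    alert, an urban heat health alert, both, or neither."""
    return [name for name, threshold in programs.items()
            if temp_anomaly_c >= threshold]
```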

The alert design problem

Alert fatigue is one of the most common failure modes in monitoring systems, AI-powered or not. When a monitoring system sends too many alerts — many of which turn out to be false positives or low-priority signals — the people receiving them stop paying attention. The moment a real signal arrives in a stream of noise, it gets missed.

The tradeoffs are genuinely difficult. A highly sensitive threshold catches more real events but also produces more false positives. A stricter threshold cuts the false positives but misses more real events. Adaptive thresholds (calibrated per facility, per season, per data source) improve precision but require more design work and can fail in unexpected ways when conditions change.

The practical guidance from operational monitoring systems: design for the minimum effective alert rate. Ask: what's the smallest number of alerts that would still catch everything that requires human attention? Work backward from that. Every alert that doesn't require action is eroding the credibility of the alerts that do.

Too many alerts

Result: Alert fatigue — reviewers stop reading carefully, critical events get missed

Mitigation: Raise thresholds, add confidence scoring, batch low-priority alerts into digests rather than immediate notifications

Too few alerts

Result: Missed signals — the monitoring system fails at its core purpose

Mitigation: Audit against a labeled test set; track false negative rate explicitly, not just false positive rate
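The audit in the second mitigation can be sketched directly: run the system over a labeled test set and report both error rates, not just one. The function name is illustrative.

```python
def audit_alert_rates(predictions: list[bool],
                      labels: list[bool]) -> dict[str, float]:
    """Audit a monitoring system against a labeled test set. The false
    negative rate is tracked explicitly: a system tuned only to reduce
    false positives can silently stop catching real events."""
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum(l and not p for p, l in zip(predictions, labels))
    positives = sum(labels)
    negatives = len(labels) - positives
    return {
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "false_negative_rate": fn / positives if positives else 0.0,
    }
```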

Designing human-AI handoffs

The handoff moment — when the agent passes something to a human — is the most failure-prone part of a monitoring system. The design decisions made there determine whether the human can actually act effectively.

A well-designed handoff gives the human: the specific data that triggered the alert, the agent's reasoning (not a black box), the context they need to assess whether it's a real signal (what the baseline is, what the history looks like), and a clear statement of what action is being requested of them (review and dismiss, escalate, investigate further).
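One way to enforce that list is to make the handoff a structured payload rather than a bare flag, so an alert with missing context is caught before it ships. The field names below are illustrative, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class AlertHandoff:
    """Everything a reviewer needs at the handoff moment, in one payload."""
    triggering_data: dict   # the specific readings that fired the alert
    reasoning: str          # the agent's reasoning, not a black box
    baseline: dict          # what normal looks like for this source
    history_summary: str    # recent context for assessing the signal
    requested_action: str   # "review_and_dismiss" | "escalate" | "investigate"

    def is_actionable(self) -> bool:
        # A raw flag with no context is exactly the handoff that gets ignored.
        return bool(self.reasoning and self.baseline and self.requested_action)
```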

A poorly designed handoff gives the human a raw flag with no context, expecting them to retrieve the relevant information themselves. In practice, under time pressure, those flags get ignored.

Design the handoff for the moment when the human is under the most time pressure and has the least context. That's when it matters. An alert that requires 15 minutes of investigation before you can act on it is not a useful alert.

The governance question

Monitoring systems create authority questions that often aren't answered until there's a conflict. When an agent flags a potential violation and an alert goes to a program officer, that officer has to know: what authority does this alert carry? Am I obligated to escalate? Who can I check with before acting? Who overrides if I'm wrong?

The specific questions to answer before a monitoring agent goes into production:

  • Who receives alerts? Is this one person or multiple? Is there a backup when the primary is unavailable?
  • What authority does an alert carry? Does receiving it obligate any action, or is it purely advisory?
  • Who can override? What's the process when a human disagrees with the agent's classification?
  • Who is accountable when the agent gets it wrong? Not in principle — in practice, with a name.
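One way to force those answers before production is to make them configuration the routing code reads, so an alert type without a named owner simply cannot ship. Everything below — names, alert types, field values — is hypothetical:

```python
# Governance made explicit as configuration rather than left implicit
# until there's a conflict. All names and values are illustrative.
ALERT_GOVERNANCE = {
    "methane_discrepancy": {
        "primary_recipient": "j.alvarez",      # who receives alerts
        "backup_recipient": "ops-duty-rota",   # when the primary is unavailable
        "authority": "advisory",               # "advisory" | "action_required"
        "override_process": "reviewer may reclassify; log reason in audit trail",
        "accountable_owner": "j.alvarez",      # a name, not a principle
    },
}

def recipient_for(alert_type: str, primary_available: bool) -> str:
    """Resolve who an alert routes to, including the backup path."""
    rule = ALERT_GOVERNANCE[alert_type]
    return rule["primary_recipient"] if primary_available else rule["backup_recipient"]
```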

MRV: what agents can and cannot verify

Measurement, Reporting, and Verification (MRV) is the backbone of climate accountability — it's how carbon credits get issued, how compliance claims get validated, and how national emissions inventories get built. The question of what agents can and cannot verify in MRV contexts connects directly to Course 1's responsible AI framework.

Agents can help substantially with the measurement and reporting layers: automated data collection, format normalization, consistency checks, comparison against historical records, flagging of outliers. These are the data-processing tasks where agent automation reduces error and scales coverage.
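A sketch of what those reporting-layer checks look like in practice — format, internal consistency, and outliers against history. The field names and the 2× outlier rule are illustrative, and note what is absent: nothing here confirms the reported activity occurred.

```python
def reporting_checks(record: dict, history: list[float]) -> list[str]:
    """Measurement/reporting-layer checks an agent can automate. None of
    these verify that reported activity actually happened — that is the
    verification layer, and it stays with humans."""
    issues = []
    value = record.get("emissions_tco2e")
    if value is None:
        issues.append("missing emissions value")
    elif value < 0:
        issues.append("negative emissions value")
    elif history and value > 2 * max(history):
        issues.append("outlier: more than 2x historical maximum")
    if record.get("period_end", "") <= record.get("period_start", ""):
        issues.append("inconsistent reporting period")
    return issues
```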

The verification layer is different. Verification requires judgment about whether reported activity actually occurred, whether measurement methodology was applied correctly, and whether the chain of custody for data is intact. These are not pattern-matching problems — they require domain expertise, physical inspection in many cases, and accountability that can only be held by humans.

The risk in MRV is AI being used to create the appearance of verification without the substance: automated outputs that look like rigorous assessment but are actually just formatting of unverified inputs. Watch for this in any carbon credit or compliance claim that cites AI-powered MRV without explaining what specifically the AI verified and how.