Section 1 of 5 · 12 min read

Where Climate Data Actually Lives

Before you can analyze climate data, you have to find it. It doesn't live in one place — it's scattered across government agencies, research institutions, NGOs, and private companies, each with different methodologies, update cycles, and degrees of independence. Knowing where to look, and what you're actually getting, is the first skill.

Why AI can't just generate the data for you

Ask an LLM for emissions figures without providing a source, and you will get confident-sounding numbers that may or may not be real. AI models are trained on text about climate data, not on verified, current datasets. They can describe what the CO₂ data from Our World in Data (OWID) looks like; they cannot reliably reproduce its specific figures, especially for recent years.

This means your first job is always finding the actual data. Not asking AI to recall it, but locating a source you can download, cite, and audit. What AI can do is help you find the right source, interpret its methodology, and understand its limitations.

The rule: AI analyzes data you provide. It does not reliably supply data itself. Treat any AI-generated number as a hypothesis to check against a primary source.
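One way to put this rule into practice is a small check that compares a number an AI gave you with the value you read from the primary source. The sketch below is illustrative only: the function name `check_claim`, the 2% tolerance, and the sample numbers are all assumptions, not part of any standard workflow.

```python
def check_claim(claimed, sourced, rel_tol=0.02):
    """Compare an AI-supplied figure with the value from a primary source.

    Returns (ok, relative_difference). The default rel_tol of 2% is an
    arbitrary illustrative threshold; pick one that suits your analysis.
    """
    if sourced == 0:
        raise ValueError("sourced value must be non-zero")
    rel_diff = abs(claimed - sourced) / abs(sourced)
    return rel_diff <= rel_tol, rel_diff


# Placeholder numbers for illustration only; they are not real emissions data.
ok, diff = check_claim(claimed=105.0, sourced=100.0)
print(f"within tolerance: {ok}, relative difference: {diff:.1%}")
```

A failed check does not tell you which number is wrong. It tells you to go back to the source and read the figure, its units, and its year again.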

The sources climate professionals actually use

The landscape divides roughly into four categories: aggregated databases that are easy to use but one step removed from primary data; government and intergovernmental sources that are authoritative but often delayed and politically complicated; satellite and remote-sensing systems that are independent of government reporting; and specialized trackers for specific sectors or flows.

Aggregated research databases

Our World in Data (OWID) is the best starting point for most climate work. It aggregates from primary sources (the Global Carbon Project, the IEA, and the Energy Institute Statistical Review of World Energy, formerly the BP Statistical Review), documents its methodology clearly, and provides download-ready CSVs. Its CO₂ dataset, maintained in collaboration with the Global Carbon Project, covers annual emissions by country back to 1750. For a first look at any emissions question, OWID is the right place to start.
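Once you have a CSV downloaded, pulling out one country's series takes only a few lines. The sketch below uses Python's standard `csv` module on a tiny inline sample. The column names (`country`, `year`, `co2`) and the sample rows are assumptions for illustration; check the header row and codebook of the file you actually download.

```python
import csv
import io

# Made-up rows in the shape of an OWID-style download. The column names
# (country, year, co2) are assumptions; check the header of the real file.
SAMPLE_CSV = """country,year,co2
Exampleland,2019,10.5
Exampleland,2020,9.8
Otherland,2020,
"""


def load_series(csv_text, country, value_col="co2"):
    """Return {year: value} for one country, skipping blank values."""
    series = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["country"] == country and row[value_col].strip():
            series[int(row["year"])] = float(row[value_col])
    return series


print(load_series(SAMPLE_CSV, "Exampleland"))  # {2019: 10.5, 2020: 9.8}
```

Blank cells are skipped rather than treated as zero. A missing value and a reported zero mean different things, and the difference matters once you start summing across countries.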

Ember covers electricity and the power sector specifically. Its data is updated frequently, has nearly global coverage, and is free to download. If your question involves the energy transition (solar and wind capacity additions, coal plant retirements, grid carbon intensity), Ember is usually the most current freely available source.

Global Carbon Project publishes the annual Global Carbon Budget, the most comprehensive accounting of CO₂ sources and sinks. Their figures are what OWID uses for emissions data. The limitation: it's an annual publication, so its finalized figures are always at least a year behind.

Government and intergovernmental sources

IEA Data Browser is the most authoritative source for energy statistics. The limitation is significant: most detailed data is paywalled. The free tier covers high-level national figures and some sector breakdowns. If your institution has a subscription, the IEA is often the most current source available. If not, Ember covers the power sector more accessibly.

UNFCCC national inventories are how countries formally report their emissions. Every Annex I country submits annual inventories; non-Annex I countries submit less frequently, some every five to ten years, some never. This is a structural coverage gap: many least-developed countries haven't filed an inventory in years. Any global total calculated from UNFCCC data silently excludes these countries.

EPA is the authoritative source for US emissions and air quality data. Its Greenhouse Gas Reporting Program has facility-level data going back to 2010, which makes it one of the most granular emissions datasets publicly available anywhere.

Satellite and independent tracking

Climate TRACE produces emissions estimates from satellite imagery and other remote-sensing data, processed with machine learning and independent of government self-reporting. Their 2023 analysis found that actual oil and gas methane emissions were roughly 3× higher than official government figures. That is not a rounding error; it is the difference between a tractable problem and an urgent one. For sectors where reporting incentives are weak or political pressure is high, satellite-based estimates are increasingly essential.

Global Forest Watch provides near-real-time deforestation alerts from satellite imagery. For forest carbon, land-use change, and deforestation commitments, it's the most current and spatially detailed source available.

Finance and policy trackers

OECD Climate Finance is the primary source for international climate finance data: adaptation and mitigation flows from developed to developing countries. The OECD reports that the developed-country commitment of $100 billion a year was met in 2022 ($115.9 billion). Oxfam's Shadow Report, using a different methodology that strips out loans and claims of mobilized private finance, estimated true grant-equivalent flows at closer to $24 billion. Same underlying transactions, radically different totals. Understanding why requires reading both methodologies.

NDC Tracker (Climate Action Tracker) monitors countries' Nationally Determined Contributions — what they've committed to under the Paris Agreement and whether they're on track. Invaluable for policy analysis, but remember that NDC targets themselves vary enormously in ambition and methodology.

Raw data vs. processed indicators — a distinction that matters

Most climate data you encounter has been processed. Emissions figures have been converted from physical measurements into CO₂-equivalent using Global Warming Potential multipliers. Finance figures have been deflated to constant prices. Satellite measurements have been corrected for instrument drift and atmospheric interference. Energy statistics have been converted from fuel mass to terajoules using country-specific conversion factors.

None of this processing is neutral. Each step involves choices — which GWP value, which base year for inflation, which correction algorithm. When two credible sources report different numbers for the same thing, the explanation is almost always somewhere in the processing chain, not in the raw measurements.
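The GWP choice is easy to see in a worked example. The sketch below converts one hypothetical inventory to CO₂-equivalent using the 100-year GWP values from two IPCC assessment reports (AR4: methane 25, nitrous oxide 298; AR5, without climate-carbon feedbacks: methane 28, nitrous oxide 265). The inventory tonnages are made up for illustration.

```python
# 100-year GWP values from two IPCC assessment reports (CO2 is 1 by definition).
GWP100 = {
    "AR4": {"CO2": 1, "CH4": 25, "N2O": 298},
    "AR5": {"CO2": 1, "CH4": 28, "N2O": 265},
}


def to_co2e(tonnes_by_gas, report):
    """Convert tonnes of each gas to total tonnes of CO2-equivalent."""
    factors = GWP100[report]
    return sum(tonnes * factors[gas] for gas, tonnes in tonnes_by_gas.items())


# Hypothetical inventory in tonnes of each gas; not real data.
inventory = {"CO2": 1000.0, "CH4": 20.0, "N2O": 1.0}

for report in GWP100:
    print(report, to_co2e(inventory, report))
# AR4 1798.0
# AR5 1825.0
```

The physical quantities are identical in both runs. Only the multipliers differ, and the CO₂-equivalent total moves with them, so always check which GWP set a source used before comparing its totals with another's.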

A telling example: UAH and RSS, two groups analyzing the same satellite microwave temperature data, produced diverging global temperature trends for years. The discrepancy was eventually traced to how each team corrected for satellite orbit decay — a processing choice, not a measurement problem. Same raw data, different conclusions.

When you find a number that surprises you — or that contradicts another source — the first question is: where exactly do the processing decisions diverge? Not: which source is right.

Four questions to ask before trusting any source

Is the methodology documented?

A credible source can tell you exactly how its figures were calculated. OWID links to methodology notes for every dataset. The Global Carbon Project publishes peer-reviewed papers. If you can't find a methodology document, treat the data with extra skepticism. "Trust us" is not a methodology.

How current is it?

Solar deployment numbers from 2022 are already significantly outdated — the sector has been growing 40-50% per year. Climate finance figures lag by 2-3 years due to reporting delays. For fast-moving topics, always check when the data was last updated and whether a more current source exists.
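To see how quickly a fast-growing figure goes stale, compound the growth rate over the age of the data. The sketch below takes the 40% and 50% annual growth rates quoted above at face value and asks how far off a two-year-old number would be; the function name `growth_factor` is just an illustrative helper.

```python
def growth_factor(annual_rate, years):
    """Factor by which a quantity grows after compounding for `years`."""
    return (1 + annual_rate) ** years


for rate in (0.40, 0.50):
    print(f"{rate:.0%} per year for 2 years -> x{growth_factor(rate, 2):.2f}")
# 40% per year for 2 years -> x1.96
# 50% per year for 2 years -> x2.25
```

At those rates, a two-year-old figure understates the current value by roughly half.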

Who funded it, and what are their incentives?

Political independence matters. National emissions inventories are self-reported: countries have incentives to undercount, and many do. Corporate sustainability reports are largely unaudited. Industry-funded research on a sector's own emissions should be read against independent estimates. This doesn't mean dismissing government or industry data; it means knowing what it can and can't tell you.

Does the coverage match your question?

A dataset comprehensive for OECD countries may have major gaps for sub-Saharan Africa. A sector database built around Annex I reporting frameworks misses emissions sources common in the Global South. Always ask: which countries, sectors, and years are included — and what's missing?
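A coverage check can be automated before any analysis starts. The sketch below lists every country and year you expected but did not get a value for. The row layout `(country, year, value)`, the function name `coverage_gaps`, and the sample rows are all assumptions for illustration.

```python
def coverage_gaps(rows, expected_countries, expected_years):
    """Return sorted (country, year) pairs that have no value in the dataset.

    rows: iterable of (country, year, value); a value of None counts as missing.
    """
    present = {(c, y) for c, y, v in rows if v is not None}
    expected = {(c, y) for c in expected_countries for y in expected_years}
    return sorted(expected - present)


# Made-up rows: country "B" has a blank for 2020, country "C" is absent.
rows = [("A", 2020, 1.0), ("A", 2021, 1.1), ("B", 2020, None), ("B", 2021, 2.0)]
print(coverage_gaps(rows, ["A", "B", "C"], [2020, 2021]))
# [('B', 2020), ('C', 2020), ('C', 2021)]
```

The important input is the expected list, which has to come from your question rather than from the dataset. A dataset cannot tell you which countries it left out.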

Getting data into a usable format

CSV is the most widely supported format for AI-assisted data analysis. Most good sources offer CSV downloads directly; OWID, Ember, and Climate TRACE all do. The IEA free tier offers Excel exports. When data is only available in PDFs (common with IPCC assessments, national communications, and corporate reports), you have two practical options: Tabula for extracting tables from clean PDFs, or pasting the text directly into an AI and asking for CSV conversion.

When you use AI to extract data from PDFs, verify carefully. AI will sometimes fill gaps with plausible-looking invented values — especially in tables with merged cells, footnotes, or complex layouts. The rule is always: check a sample of extracted values against the original before proceeding with analysis.
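The spot check can be sketched in a few lines: pick a reproducible random sample of cells, read those cells yourself from the original PDF, and compare. The function names (`pick_sample`, `spot_check`), the dictionary layout keyed by `(row label, year)`, and the values below are illustrative assumptions.

```python
import random


def pick_sample(keys, n, seed=0):
    """Choose which cells to verify by hand, reproducibly."""
    ordered = sorted(keys)
    return random.Random(seed).sample(ordered, min(n, len(ordered)))


def spot_check(extracted, verified, rel_tol=1e-6):
    """Return keys whose extracted value disagrees with a hand-verified value.

    extracted: {key: value} produced by AI or PDF extraction
    verified:  {key: value} read by a person from the original document
    """
    mismatches = []
    for key, true_value in verified.items():
        got = extracted.get(key)
        limit = rel_tol * max(abs(true_value), 1.0)
        if got is None or abs(got - true_value) > limit:
            mismatches.append(key)
    return mismatches


# Made-up values: the extraction got ("A", 2021) wrong.
extracted = {("A", 2020): 1.0, ("A", 2021): 1.5, ("B", 2020): 2.0}
verified = {("A", 2021): 1.1, ("B", 2020): 2.0}
print(spot_check(extracted, verified))  # [('A', 2021)]
```

A single mismatch in a small sample is a reason to distrust the whole extraction, not just that one cell. Re-extract or widen the sample before you analyze anything.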
