Carolina Torreblanca
University of Pennsylvania
Global Development: Intermediate Topics in Politics, Policy, and Data
PSCI 3200 - Spring 2026
Development is undergoing a data revolution
More data than ever: surveys, satellites, sensors, social media
We seem to think more data is useful for answering important questions
But what kinds of questions can data actually answer?
Data can help answer three kinds of questions, but each demands its own toolkit
Description: How do we summarize? → Summary statistics
Prediction: How do we generalize? → Models
Causation: How do we reason about “what if”? → ???
Today’s focus: What toolkit does causation require?
Central Tendency
Where is the middle?
Spread
How dispersed are the data?
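A minimal sketch in Python (with invented numbers, not course data) of both questions: the mean and median answer "where is the middle?", and the standard deviation answers "how dispersed?".

```python
import statistics

# Invented household incomes, for illustration only (not course data)
incomes = [1200, 1500, 1800, 2100, 9000]

# Central tendency: where is the middle?
print(statistics.mean(incomes))    # 3120 -- pulled up by the outlier
print(statistics.median(incomes))  # 1800 -- robust to the outlier

# Spread: how dispersed are the data?
print(statistics.stdev(incomes))   # sample standard deviation
```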
Correlation summarizes co-variation in a single number:
\[r_{XY} = \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i}(X_i - \bar{X})^2 \sum_{i}(Y_i - \bar{Y})^2}}\]
- Ranges from -1 to +1
- Measures linear association
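A short sketch of how that formula translates to code; the data and the `pearson_r` helper are invented for illustration, and NumPy's built-in `corrcoef` is used only as a check.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation, computed exactly as in the formula above."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Invented data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

print(pearson_r(x, y))          # hand-rolled r
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in version; should match
```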
More colloquially: what do we mean when we say that X and Y are correlated?
Technically: A measure that summarizes (linearly) how two numeric variables move together
Conceptually: The basis of inductive evidence: generalizable patterns
When we say “X and Y are correlated,” we’re saying:
“Here is a regularity worth noting.”
But it does not explain why the pattern exists.
Strength: How tightly X and Y move together (0 to ±1)
Direction: Positive (both rise) or negative (one rises, other falls)
Nothing about why they move together
UFO sightings and patents really are correlated in the data!
Correlation and summary statistics are powerful for description
But if we want to answer causal questions…
“Would a cash transfer reduce poverty?”
…we need a different toolkit entirely.
Correlation tells us X and Y move together.
But causation says something stronger: X produces or changes Y
How do we study this? Scientists made two key moves:
Rather than trying to observe causation directly, scientists:
Define causality as a comparison between TWO counterfactual states
Shift from individual causal effects to averages
Let’s unpack each move.
There are many ways to define causality. To motivate ours, consider an example:
A pharmaceutical company says:
“Patient took our pill. Patient got better. Therefore, the pill works.”
What’s wrong with this reasoning?
We saw the patient with the pill → got better
We didn’t see the patient without the pill
Maybe they would have gotten better anyway!
The causal question: Did the pill make the patient better?
To answer this, we need to compare the patient with the pill to the same patient without it.
Causality is always relative: treatment vs. what alternative?
For each patient, define two possible states of the world:
\(Y_i(1)\): What happens if patient \(i\) takes the pill
\(Y_i(0)\): What happens if patient \(i\) doesn’t take the pill
Every patient has both. But we can only ever see one.
DEFINE the individual causal effect as the effect of the pill, relative to no pill, for patient \(i\):
\[TE_i = Y_i(1) - Y_i(0)\]
Can we calculate it?
For any patient, we observe only one outcome: \(Y_i(1)\) if they took the pill, \(Y_i(0)\) if they didn’t.
We never see the same patient both taking and not taking the pill
To measure causal effects, we need both.
But we can never observe the counterfactual.
No amount of data in the world lets us see both states for one patient
Ana took the pill and recovered — but would she have recovered anyway?
We will never know.
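One way to see the fundamental problem is with a simulation: only in a simulation can we write down both potential outcomes. All numbers and variable names below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # five hypothetical patients

# Only in a simulation do we get to see BOTH potential outcomes per patient
y0 = rng.binomial(1, 0.4, n)   # Y_i(0): recovers without the pill?
y1 = rng.binomial(1, 0.7, n)   # Y_i(1): recovers with the pill?
te = y1 - y0                   # individual treatment effects (unknowable in real data)

pill = rng.binomial(1, 0.5, n)         # who actually takes the pill
y_obs = np.where(pill == 1, y1, y0)    # only one potential outcome is ever observed

print(te)     # visible only because we simulated it
print(y_obs)  # the only outcome column a real dataset would contain
```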
The individual treatment effect \(TE_i\) is unknowable. So what do we do?
We can’t know the pill’s effect on any one patient.
But what if we could estimate the average effect across many patients?
The solution: Compare groups, not individuals.
Instead of asking: “Did the pill help Ana?”
Ask: “On average, does the pill help patients recover?”
\[ATE = \overline{Y(1)} - \overline{Y(0)}\]
Average recovery rate with pill minus average recovery rate without pill.
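A made-up numerical illustration: if 70% of patients would recover with the pill and only 40% would recover without it, then \[ATE = 0.70 - 0.40 = 0.30,\] an average gain of 30 percentage points in the recovery rate.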
People are different
So how can comparing group averages tell us anything causal?
We need to make an assumption.
Imagine: people who feel really sick take the pill. People who feel fine don’t.
The pill group was sicker to begin with.
They would have recovered less even without the pill.
Comparing these groups tells us nothing about the pill’s effect.
The groups aren’t comparable — they differed before treatment.
Assumption: The two groups are comparable on average.
The pill group and no-pill group would have had similar outcomes if neither had taken the pill.
If this holds → group averages are valid comparisons → we can estimate causal effects.
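One common way to write this assumption with the potential-outcomes notation above (a sketch, not necessarily the formulation used in class): \[\overline{Y(0)}_{\text{pill group}} = \overline{Y(0)}_{\text{no-pill group}}\] Had nobody taken the pill, the two groups would have recovered at the same average rate.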
When does this assumption hold?
When who gets treatment isn’t determined by things that also affect the outcome
When there’s no systematic difference between groups before treatment
When it’s “as if” treatment was assigned at random
More on how to make this credible next class.
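A hedged simulation sketch of why this matters (all probabilities and names below are invented): when the sick self-select into taking the pill, the difference in means is badly misleading; under as-if-random assignment, it lands near the true average effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Invented setup: sicker patients recover less, and the pill raises
# everyone's recovery probability by 20 percentage points.
sick = rng.binomial(1, 0.5, n)                  # 1 = very sick before treatment
p0 = np.where(sick == 1, 0.30, 0.70)            # recovery prob. without the pill
y0 = rng.binomial(1, p0)                        # Y_i(0)
y1 = rng.binomial(1, np.minimum(p0 + 0.20, 1))  # Y_i(1)

def diff_in_means(pill):
    y_obs = np.where(pill == 1, y1, y0)
    return y_obs[pill == 1].mean() - y_obs[pill == 0].mean()

# 1) Self-selection: mostly the sick take the pill -> groups are NOT comparable
pill_selected = rng.binomial(1, np.where(sick == 1, 0.9, 0.1))
# 2) As-if-random assignment -> groups ARE comparable on average
pill_random = rng.binomial(1, 0.5, n)

print((y1 - y0).mean())              # true ATE (about 0.20; known only in a simulation)
print(diff_in_means(pill_selected))  # badly biased, can even flip sign
print(diff_in_means(pill_random))    # close to the true ATE
```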
Understanding cause and effect is how we change things in the real world
Causal inference separates good evaluations from bad
We’ll explore these throughout the course.
Different questions need different tools — causal questions ask “what if?”
Causality = comparing two states (treatment vs. no treatment)
Fundamental Problem: We can’t observe both states for one person
Solution: Compare groups — if treatment is unrelated to differences, groups are comparable on average
Causal inference = building credible comparisons
How randomization solves the comparison problem
Why random assignment creates comparable groups
Final Project overview
https://carolina-torreblanca.github.io/psci3200-globaldev-main/