Data Wrangling Workshop
Required Readings
- None
Overview
Today we will practice intermediate data wrangling by working with two real datasets: Olympic Games medal counts and World Bank development indicators. We’ll clean each dataset, merge them, and analyze whether national wealth predicts Olympic success.
Step 1: Set Up Your Project
Create a folder called wrangling_workshop on your computer. Inside it, create three subfolders:
wrangling_workshop/
├── input/ ← raw data goes here (never modify these files)
├── output/ ← cleaned data, figures, tables go here
└── code/ ← your scripts go here
You can create these in Finder/File Explorer, or in the Terminal:
mkdir wrangling_workshop
cd wrangling_workshop
mkdir input output code
Step 2: Create an RProject
- Open RStudio
- Go to File → New Project → Existing Directory
- Browse to your wrangling_workshop folder and click Create Project
This creates a .Rproj file. From now on, always open your project by double-clicking the .Rproj file. This tells R where your files are so you can use relative paths like read_csv("input/olympics_raw.csv") instead of long absolute paths that break on other computers.
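To see how a relative path resolves against the project folder, here is a minimal sketch. It assumes the project was opened through the .Rproj file, so that getwd() returns the project folder; the object name olympics_raw is only an illustration, and the file is read only if it is already in place (it is downloaded in Step 3).

```r
# Relative paths are resolved against the working directory, which RStudio
# sets to the project folder when you open the .Rproj file.
rel_path <- file.path("input", "olympics_raw.csv")
abs_path <- file.path(getwd(), rel_path)

print(rel_path)  # the short path you type in your script
print(abs_path)  # where R actually looks on this particular computer

# Read the file only if it already exists (assumes the readr package is installed).
# `olympics_raw` is an illustrative object name, not one required by the workshop.
if (file.exists(rel_path)) {
  olympics_raw <- readr::read_csv(rel_path)
}
```

The relative path stays the same on every computer; only the absolute path changes, which is why scripts written with relative paths keep working when the project folder is shared.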
Step 3: Download the Data
Right-click each link below and choose “Save Link As…” (or “Download Linked File”). Save both files into your input/ folder.
- olympics_raw.csv — Olympic athletes and medals, Athens 1896 to Rio 2016
- wb_indicators_panel.csv — GDP per capita, population, and female labor force participation from the World Bank (2000-2016, Olympic years only)
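Browsers sometimes save a web page instead of the raw file. As a rough sanity check, the sketch below looks at the first line of each download; the helper looks_like_csv() is hypothetical (it is not part of the workshop script), and it only tests that the line contains a comma and does not start with an HTML tag.

```r
# Hypothetical helper: TRUE if the first line of the file looks like a
# comma-separated header row rather than the start of an HTML page.
looks_like_csv <- function(path) {
  header <- readLines(path, n = 1, warn = FALSE)
  length(header) == 1 &&
    !startsWith(trimws(header), "<") &&
    grepl(",", header, fixed = TRUE)
}

# Run from the project folder, so the relative paths point into input/.
files <- file.path("input", c("olympics_raw.csv", "wb_indicators_panel.csv"))
for (f in files) {
  if (!file.exists(f)) {
    cat(f, "-> not found yet\n")
  } else if (looks_like_csv(f)) {
    cat(f, "-> looks like a CSV\n")
  } else {
    cat(f, "-> does not look like a CSV, try downloading it again\n")
  }
}
```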
Step 4: Download the Workshop Script
Save this R script into your code/ folder:
- wrangling_workshop.R
Check Your Setup
Before we start, your folder should look like this:
wrangling_workshop/
├── wrangling_workshop.Rproj
├── input/
│ ├── olympics_raw.csv
│ └── wb_indicators_panel.csv
├── output/
│ └── (empty for now)
└── code/
    └── wrangling_workshop.R
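The layout above can also be checked from the R console. The sketch below uses a hypothetical helper, check_setup(), which is not part of the workshop script; it assumes you run it from the project folder, so that "." refers to wrangling_workshop/.

```r
# Hypothetical helper: report which of the expected folders and files
# exist under `root` (by default the current working directory).
check_setup <- function(root = ".") {
  expected <- c(
    "wrangling_workshop.Rproj",
    "input",
    "input/olympics_raw.csv",
    "input/wb_indicators_panel.csv",
    "output",
    "code",
    "code/wrangling_workshop.R"
  )
  found <- file.exists(file.path(root, expected))
  names(found) <- expected
  found
}

status <- check_setup()
print(status)

if (all(status)) {
  message("Setup looks complete.")
} else {
  message("Missing: ", paste(names(status)[!status], collapse = ", "))
}
```

Any entry reported as missing points to the step to revisit: folders come from Step 1, the .Rproj file from Step 2, the data from Step 3, and the script from Step 4.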
Open wrangling_workshop.Rproj, then open code/wrangling_workshop.R in RStudio. We’ll work through it together in class.