To explore the relationship between economic factors and birth rate, chart birth rate and economic indicators for several regions over a time period with changing economic conditions.
Project Overview
Data
Previously gathered data on birth rates and gross domestic product (GDP) per capita for the period 1998-2015 were tidied using a spreadsheet program and then read into a programming environment.
Technologies
Some downloaded data were reformed into a tidy data file using Microsoft Excel. The R programming language was used to read and plot data. Packages used included plyr, dplyr, data.table, and ggplot2.
Techniques
Downloaded data was reformed into a tidy dataset by transposing rows and columns and moving from a “wide” format to a “long” format. Tidied data was then saved to a comma-separated values (CSV) file that was subsequently read into the R programming environment. Data were plotted as time series and as comparison scatter plots using the ggplot2 package for R.
Software
Excel and RStudio running R on a Windows OS desktop computer were used as the development environments.
Lessons Learned
Data in tabular format must generally be transformed into “long” format before visualizing. For small datasets it is easy to accomplish the transformation in a spreadsheet program, but for larger datasets packages like tidytext
and tidyr
from the tidyverse
are useful.
Applications
Visualization is an important part of exploratory data analysis and communicating results. Data preparation is often required to yield useful charts and graphs.
Project Background
A Pew Research Center report published April 6, 20101 linked a decline in the US birth rate to the recession of 2008. This project was intended to investigate whether the trend was generalized to other regions of the world and to see if the birth rate was related to economic conditions.
Step 1 – Tidying the Data
Previously downloaded data on birth rate and per capita Gross Domestic Product (GDP) were stored in a CSV file. The data was viewed in Excel and was formatted in a tabular (“wide”) form, with separate rows for each region and a column for each year. The Excel “Paste Special, Transpose” functionality was used to put the data into columns. Separate columns for the same measure were then combined in “long” form to create a tidier dataset. The data were saved in a CSV file.
Step 2 – Extracting the Data
Previously saved data from R processing was loaded. Data in the CSV file was read into a data frame in R and used for charting.
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
load("USbirths.RData")
wbdata <- read.csv("WBData.csv")
names(wbdata) <- c("Year", "Region", "RegionCode", "BirthRate", "GDPpc", "GDPpcPPP")
Step 3 – Visualizing the Data
Birth rates for each region were first plotted as time series using the ggplot2
package for R to see if the drop in birth rate after 2008 seen in previously investigated US data was replicated elsewhere. While birth rates generally declined worldwide for the period 1998b 2015, only higher income regions experienced the sharp drop seen in the US. Worldwide there was an excess decrease after 2008, but the general trendline did not change significantly.
ggplot(wbdata, aes(Year, BirthRate, color = Region)) + geom_point() +
labs(title = "Crude Birth Rate by Region",
x = "Year",
y = "Births per 1,000 People",
color = "UN Region")
Crude birth rate was then plotted vs. per capita GDP (current $, compared by Purchasing Power Parity (PPP)).
ggplot(wbdata, aes(GDPpcPPP, BirthRate, color = Region)) + geom_point() +
labs(title = "Crude Birth Rate Compared to GDP Per Capita",
x = "GDP Per Capita (Purchasing Power Parity, Current $)",
y = "Births per 1,000 People",
color = "UN Region")
The birth rate appears to be strongly inversely correlated with the average wealth of a region, particularly for poor regions. Any effect of worldwide economic events could easily be masked by this relationship.
To see if the 2008 recession affected all regions, per capita GDP (current $, PPP) was plotted for the period 1998-2015.
ggplot(wbdata, aes(Year, GDPpcPPP, color = Region)) + geom_point() +
labs(title = "Evolution of GDP Per Capita",
x = "Year",
y = "GDP Per Capita (Purchasing Power Parity, Current $)",
color = "UN Region")
Low, lower middle, and middle income regions seemed hardly touched by the 2008 economic downturn as their trends toward increasing wealth continued. The US appears to have been the most strongly affected region, followed by the group of high income regions. In all affected regions, wealth regained its pre-shock level by 2010 or 2011.
Conclusion
In general, birth rates across the globe have been declining over the past 18 years. Below a certain degree of wealth, a region’s birth rate is inversely correlated with GDP per capita, a measure of the region’s wealth. In high wealth regions such as the US, however, economic shocks can have a negative effect on birth rate. This effect can persist for years after wealth regains its pre-shock level.
The US appears to have been the region most strongly affected by the 2008 recession, followed by the group of high income regions.
Gretchen Livingston and D’Vera Cohn, U.S. Birth Rate Decline Linked to Recession, http://www.pewsocialtrends.org/2010/04/06/us-birth-rate-decline-linked-to-recession/.↩