5 min read

Visualizing Birth Rates and Economic Indicators from 1998 to 2015

To explore the relationship between economic factors and birth rate, chart birth rate and economic indicators for several regions over a time period with changing economic conditions.

Project Overview

Data

Previously gathered data on birth rates and gross domestic product (GDP) per capita for the period 1998-2015 were tidied using a spreadsheet program and then read into a programming environment.

Technologies

Some downloaded data were reformed into a tidy data file using Microsoft Excel. The R programming language was used to read and plot data. Packages used included plyr, dplyr, data.table, and ggplot2.

Techniques

Downloaded data was reformed into a tidy dataset by transposing rows and columns and moving from a “wide” format to a “long” format. Tidied data was then saved to a comma-separated values (CSV) file that was subsequently read into the R programming environment. Data were plotted as time series and as comparison scatter plots using the ggplot2 package for R.

Software

Excel and RStudio running R on a Windows OS desktop computer were used as the development environments.

Lessons Learned

Data in tabular format must generally be transformed into “long” format before visualizing. For small datasets it is easy to accomplish the transformation in a spreadsheet program, but for larger datasets packages like tidytext and tidyr from the tidyverse are useful.

Applications

Visualization is an important part of exploratory data analysis and communicating results. Data preparation is often required to yield useful charts and graphs.

Project Background

A Pew Research Center report published April 6, 20101 linked a decline in the US birth rate to the recession of 2008. This project was intended to investigate whether the trend was generalized to other regions of the world and to see if the birth rate was related to economic conditions.

Step 1 – Tidying the Data

Previously downloaded data on birth rate and per capita Gross Domestic Product (GDP) were stored in a CSV file. The data was viewed in Excel and was formatted in a tabular (“wide”) form, with separate rows for each region and a column for each year. The Excel “Paste Special, Transpose” functionality was used to put the data into columns. Separate columns for the same measure were then combined in “long” form to create a tidier dataset. The data were saved in a CSV file.

Step 2 – Extracting the Data

Previously saved data from R processing was loaded. Data in the CSV file was read into a data frame in R and used for charting.

library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
load("USbirths.RData")

wbdata <- read.csv("WBData.csv")
names(wbdata) <- c("Year", "Region", "RegionCode", "BirthRate", "GDPpc", "GDPpcPPP")

Step 3 – Visualizing the Data

Birth rates for each region were first plotted as time series using the ggplot2 package for R to see if the drop in birth rate after 2008 seen in previously investigated US data was replicated elsewhere. While birth rates generally declined worldwide for the period 1998b 2015, only higher income regions experienced the sharp drop seen in the US. Worldwide there was an excess decrease after 2008, but the general trendline did not change significantly.

ggplot(wbdata, aes(Year, BirthRate, color = Region)) + geom_point() +
        labs(title = "Crude Birth Rate by Region",
             x = "Year",
             y = "Births per 1,000 People",
             color = "UN Region")

Crude birth rate was then plotted vs. per capita GDP (current $, compared by Purchasing Power Parity (PPP)).

ggplot(wbdata, aes(GDPpcPPP, BirthRate, color = Region)) + geom_point() +
        labs(title = "Crude Birth Rate Compared to GDP Per Capita",
             x = "GDP Per Capita (Purchasing Power Parity, Current $)",
             y = "Births per 1,000 People",
             color = "UN Region")

The birth rate appears to be strongly inversely correlated with the average wealth of a region, particularly for poor regions. Any effect of worldwide economic events could easily be masked by this relationship.

To see if the 2008 recession affected all regions, per capita GDP (current $, PPP) was plotted for the period 1998-2015.

ggplot(wbdata, aes(Year, GDPpcPPP, color = Region)) + geom_point() +
        labs(title = "Evolution of GDP Per Capita",
             x = "Year",
             y = "GDP Per Capita (Purchasing Power Parity, Current $)",
             color = "UN Region")

Low, lower middle, and middle income regions seemed hardly touched by the 2008 economic downturn as their trends toward increasing wealth continued. The US appears to have been the most strongly affected region, followed by the group of high income regions. In all affected regions, wealth regained its pre-shock level by 2010 or 2011.

Conclusion

In general, birth rates across the globe have been declining over the past 18 years. Below a certain degree of wealth, a region’s birth rate is inversely correlated with GDP per capita, a measure of the region’s wealth. In high wealth regions such as the US, however, economic shocks can have a negative effect on birth rate. This effect can persist for years after wealth regains its pre-shock level.

The US appears to have been the region most strongly affected by the 2008 recession, followed by the group of high income regions.


  1. Gretchen Livingston and D’Vera Cohn, U.S. Birth Rate Decline Linked to Recession, http://www.pewsocialtrends.org/2010/04/06/us-birth-rate-decline-linked-to-recession/.