USA – Ethnicity & Income

Following up on my previous post, and inspired by datausa.io, I hit me that I had a pretty cool dataset I could explore, thanks to the chloroplethr package.

The dataset I’m referring to comes from the US 2013 Census data. Let’s take a look:

require(choroplethr)
require(dplyr)
require(magrittr)
require(ggplot2)
api.key.install('your-api-key-here')
demographics=get_county_demographics()
head(demographics)
##   region total_population percent_white percent_black percent_asian
## 1   1001            54907            76            18             1
## 2   1003           187114            83             9             1
## 3   1005            27321            46            46             0
## 4   1007            22754            75            22             0
## 5   1009            57623            88             1             0
## 6   1011            10746            22            71             0
##   percent_hispanic per_capita_income median_rent median_age
## 1                2             24571         668       37.5
## 2                4             26766         693       41.5
## 3                5             16829         382       38.3
## 4                2             17427         351       39.4
## 5                8             20730         403       39.6
## 6                6             18628         276       39.6

One thing in particular I was curious about was this: how can I show on a map, how much a particular group of the population (e.g. the black population) is hit by poverty.

We do have a percentage of black population as well as a per-capita income for each county. Let us merge this data on the regions data from the chloroplethr package:

county_data  <- county.regions %>% left_join(demographics)
## Joining by: 'region'
head(county_data)
##   region county.fips.character county.name state.name state.fips.character
## 1   1001                 01001     autauga    alabama                   01
## 2   1003                 01003     baldwin    alabama                   01
## 3   1005                 01005     barbour    alabama                   01
## 4   1007                 01007        bibb    alabama                   01
## 5   1009                 01009      blount    alabama                   01
## 6   1011                 01011     bullock    alabama                   01
##   state.abb total_population percent_white percent_black percent_asian
## 1        AL            54907            76            18             1
## 2        AL           187114            83             9             1
## 3        AL            27321            46            46             0
## 4        AL            22754            75            22             0
## 5        AL            57623            88             1             0
## 6        AL            10746            22            71             0
##   percent_hispanic per_capita_income median_rent median_age
## 1                2             24571         668       37.5
## 2                4             26766         693       41.5
## 3                5             16829         382       38.3
## 4                2             17427         351       39.4
## 5                8             20730         403       39.6
## 6                6             18628         276       39.6

The idea will be to rely on the relatively unknown yet uber-cool harmonic mean. This mean has the property that it is high when both components (in my case I will only have two) are high.

Here’s the formula for the harmonic mean of two numbers:

harmonic_mean = function(x, y){
  (2*x*y)/(x+y)
}

So I will be using the harmonic mean of the percentage of black population and the per-capita income. But to make those measures play nice together, let us first transform then into ranks and map that rank to the [0,1] interval:

county_data$percent_black_ranked <- rank(county_data$percent_black, ties.method='first')
county_data$percent_black_ranked <- county_data$percent_black_ranked / max(county_data$percent_black_ranked)

county_data$per_capita_income_ranked  <- rank(county_data$per_capita_income, ties.method='first')
county_data$per_capita_income_ranked <- 1 - county_data$per_capita_income_ranked / max(county_data$per_capita_income_ranked)

Notice that I took the complement of per-capita income rank to show poverty.

We are now ready to create our map:

county_data$value  <- harmonic_mean(county_data$percent_black_ranked, county_data$per_capita_income_ranked)

county_choropleth(county_data) + scale_fill_brewer(palette=3) + ggtitle('Map of (black % of population &amp;amp; poverty) index')

unnamed-chunk-8-1

The dark areas are where there is a both a relatively high percentage of black population and a low per-capita income.

The same exercise with the hispanic population yields:

unnamed-chunk-9-1

Out of curiosity, let us look at white % of population and wealth:

county_data$percent_white_ranked <- rank(county_data$percent_white, ties.method='first')
county_data$percent_white_ranked <- county_data$percent_white_ranked / max(county_data$percent_white_ranked)

county_data$per_capita_income_ranked  <- rank(county_data$per_capita_income, ties.method='first')
county_data$per_capita_income_ranked <- county_data$per_capita_income_ranked / max(county_data$per_capita_income_ranked)

unnamed-chunk-11-1

Advertisements

2 comments

    1. Hmmm that’s unfortunate…I’m not familiar with this error.
      Anyway I’ve decided to revisit that post and get the data from scratch using the acs package, it’s really fascinating the variety of data we can get. I intend to do a short yet detailed post on this probably tomorow.
      Thank you for your comments !

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s