How far does your paycheck go?

I’ve moved around, having lived in Pennsylvania, South Africa, Ohio, and now New York, and I have frequently heard that the cost of living in an area has a big impact on how far your income will go. The Bureau of Economic Analysis at the US Department of Commerce released a data set exploring this issue, comparing total and per-capita incomes to price-adjusted estimates by state. This visualization in Tableau explores how states compare. Select the states you’ve lived in to see how far your income goes in each area. Note that the per-capita adjustment is in dollars, whereas the total adjustment is in millions of dollars.


Genetically Engineered Crop Adoption in the US

Genetically engineered crops have been around for a long time; depending on your definition, simple cross-breeding might qualify. They were introduced commercially in 1996 and have been widely adopted in the US. I’m a long way from graphing Punnett squares, but data.gov had an interesting data set on genetically engineered crop adoption. Using Tableau Public, I’ve created this visualization from that data to show the percent of planted crops that are genetically engineered by state. If you move the date filter up to 2013, you’ll see that not much remains in the non-genetically engineered category for soybeans, corn, and cotton with these indicators.


The USDA Economic Research Service reference website describes how the data were gathered:

Randomly selected farmers across the United States were asked if they planted corn, soybeans, or upland cotton seed that, through biotechnology, is resistant to herbicides, insects, or both. Conventionally bred herbicide-tolerant varieties were excluded. “Stacked” gene varieties are those containing GE traits for both herbicide tolerance (HT) and insect resistance (Bt).

According to NASS, the States published in these tables represent 81-86 percent of all corn planted acres, 87-90 percent of all soybean planted acres, and 81-93 percent of all upland cotton planted acres (depending on the year).

The acreage estimates are subject to sampling variability because all operations planting GE varieties are not included in the sample. The variability for the 48 corn States, calculated by NASS using the relative standard error at the U.S. level, is 0.3-1.8 percent for all GE varieties (depending on the year), 1.6-4.9 percent for insect-resistant (Bt)-only varieties, 1.6-3.8 percent for herbicide-tolerant-only varieties, and 0.7-10.8 percent for stacked gene varieties. Variability for the 31 soybean States is 0.3-0.8 percent for herbicide-tolerant varieties, depending on the year. Variability for the 17 upland cotton States is 0.6-2.2 percent for all GE varieties, 4.6-14.4 percent for insect-resistant (Bt)-only varieties, 2.6-7.9 percent for herbicide-tolerant-only varieties, and 2.0-4.2 percent for stacked gene varieties.

Anscombe’s Quartet in D3

In 1973, statistician Francis Anscombe created a collection of data sets to demonstrate the importance of data visualization. There are four different data sets in the collection, all sharing nearly identical high-level statistical properties:

  • Mean of x is 9
  • Variance of x is 11
  • Mean of y is 7.50
  • Variance of y is 4.122 or 4.127
  • Correlation between x and y is 0.816
  • Linear regression of each is y = 3.00 + 0.500x
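
The properties listed above can be checked with a few lines of code. The following is a minimal Python sketch rather than the D3 code behind the visualization; the values are Anscombe's published data, while the names QUARTET and summarize are my own for illustration. It uses the sample variance (dividing by n - 1) and an ordinary least-squares fit.

```python
from math import sqrt

# Anscombe's published values; sets I-III share the same x column.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics for one data set (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # ordinary least-squares slope
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows come out nearly identical, which is the point: the numbers alone give no hint of how different the four scatter plots look.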

I’ve been learning D3 and wanted to use this as practice.

There is a great book by Scott Murray, “Interactive Data Visualization for the Web”, that has been my primary resource. I think a lot of people see the value in visualization today, though in 1973 there may have been more skepticism. Anscombe’s quartet highlights the value of making a picture to aid analysis and identify outliers.

Congressional Approval and Missing Workers

I wanted to expand my last blog post on the understatement of the unemployment rate to include data about congressional approval ratings. I was not sure there would be any relationship, but since jobs are a hot political topic I thought the data could be interesting. The approval rating came from a Gallup poll that is publicly available, and the unemployment data is the same public data set from the Economic Policy Institute. I used the traditional line chart but also made a connected scatter plot of the data, separated by year, after reading Alberto Cairo’s praise of that chart type. To me, this view of the data is the most intriguing.


Along with the trends over time, I wanted to see the changes in direction of approval and unemployment. For both the congressional approval rate and the missing-workers unemployment rate, I took the difference between each pair of adjacent data points, which equals the slope over that interval because the points are always one month apart on the x-axis. The scales differ, since the congressional approval rate tended to have bigger swings in direction. Therefore, I plotted this in two ways: first with a synchronized scale, and second with an unsynchronized y-axis to better see the changes.
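
As a rough illustration of that differencing step, here is a minimal Python sketch. The function name first_differences and the sample numbers are hypothetical, not values from the Gallup or Economic Policy Institute data, and the actual calculation for the charts was done in Tableau.

```python
def first_differences(values):
    """Return the change between each pair of adjacent monthly values.

    With one month between points, each difference is also the slope
    of the line segment over that interval.
    """
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical monthly approval ratings in percent, for illustration only.
approval = [14, 19, 15, 15, 11]
print(first_differences(approval))
```

The result always has one fewer entry than the input, so the first month has no slope value to plot.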
