This is an update to the old post from 2015 on the same topic. It covers exactly the same thing, but uses the latest R packages and a coding style built around the “fancy” pipes (%>%) and tidyverse packages.
This was inspired by the US disease incidence rates featured in the Wall Street Journal. The disease incidence dataset was originally used in this article in the New England Journal of Medicine. In this post, I am using the measles level 1 incidence (cases per 100,000 people) dataset obtained as a .csv file from Project Tycho. Download the .csv file here.
In this post, we will look into creating a neat, clean and elegant heatmap in R. No clustering, no dendrograms, no trace lines, no bullshit. We will go step by step through some basic data cleanup, reformatting and finally plotting. For the whole code with minimal explanations, scroll to the bottom of the page.