
A guide to elegant tiled heatmaps in R [2019]

A step-by-step guide to data preparation and plotting of simple, neat and elegant heatmaps in R using base graphics and ggplot2.

This is an update to the old post from 2015 on the same topic. It covers exactly the same material, but uses the latest R packages and a coding style built on the “fancy” pipes (%>%) and tidyverse packages.

This was inspired by the US disease incidence rates featured in the Wall Street Journal. The disease incidence dataset was originally used in this article in the New England Journal of Medicine. In this post, I am using the measles level 1 incidence (cases per 100,000 people) dataset obtained as a .csv file from Project Tycho. Download the .csv file here.

In this post, we will look into creating a neat, clean and elegant heatmap in R. No clustering, no dendrograms, no trace lines, no bullshit. We will go through some basic data cleanup, reformatting and finally plotting, step by step. For the whole code with minimal explanations, scroll to the bottom of the page.
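To give a rough idea of the reformat-then-plot flow, here is a minimal sketch. It is not the code from the full post: the `wide` data frame is a made-up toy table (not the Project Tycho values), the column names `year`, `state` and `incidence` are my own choices, and the colours are arbitrary. It assumes dplyr, tidyr (1.0.0 or newer, for `pivot_longer()`) and ggplot2 are installed.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy wide-format table, invented for illustration only:
# one row per year, one column per state, values are cases per 100,000.
wide <- data.frame(
  year    = 2000:2003,
  Alabama = c(1.2, 0.4, NA, 0.0),
  Alaska  = c(0.0, 2.5, 1.1, 0.3),
  Arizona = c(3.1, NA, 0.7, 0.2)
)

# Reformat: reshape to long format, one row per (year, state) pair,
# and reverse the factor levels so states read A to Z from top to bottom.
long <- wide %>%
  pivot_longer(cols = -year, names_to = "state", values_to = "incidence") %>%
  mutate(state = factor(state, levels = rev(sort(unique(state)))))

# Plot: one tile per (year, state); white borders separate the tiles,
# and missing values get a neutral grey instead of a colour from the scale.
p <- ggplot(long, aes(x = year, y = state, fill = incidence)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "#ffffcc", high = "#bd0026", na.value = "grey90") +
  labs(x = NULL, y = NULL, fill = "Cases per 100,000") +
  theme_minimal()

print(p)
```

The long format is the key step here: `geom_tile()` wants one row per tile, so the state columns have to be stacked into a single column before plotting.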


Scraping Instagram and choosing hashtags

We scrape Instagram for basic public data using R to help us pick optimal hashtags.

If you are an Instagram user, at some point you are going to be interested in various metrics such as followers, the number of posts by a certain user, etc. You might want to compare these metrics between different users, or find out the number of posts with a certain hashtag. The casual way to do it is to go to the relevant Instagram page, look at the metric, write it down somewhere, then go to the next page and so on. Clearly this is not an ideal strategy if you want to look at a few hundred pages. It would be neat to get this data in an automated manner.


Humanity has never lived in better times

It is easy to be disillusioned and pessimistic about the world we live in. Bad news seems to be followed by worse news. But humanity has come a long way from the disease-ridden, impoverished, war-torn lives of our forefathers. Here we look at a few data-driven graphs to convince ourselves of the progress we have made over time in various aspects of life. Slow progress never makes headlines.

It may seem like the world is descending into total chaos, violence, and destruction. War in Syria, Ukraine, Yemen, Islamic State, migrant crisis, Ebola, plane crashes, earthquakes, tsunamis and whatnot. The more news you watch, the more worried you will be. This is because news outlets tend to focus on spectacularly negative events. Violence, atrocities, and hatred are thrown into the spotlight and into the lives of common people. With ever-increasing digital connectivity, information is disseminated and absorbed at an unprecedented level. Relatively small incidents have a larger voice. As Ray Kurzweil said, “The world isn’t getting worse, our information is getting better”. To appreciate the world we live in, we have to put things into a wider context.

The fact is that humanity has never lived in a better time than now in pretty much every aspect you look at: war, violence, disease and poverty are all at the lowest levels they have ever been. Of course, there is still a long way to go, but this is the best it has been since the beginning of humankind. To prove my point, here we evaluate human progress using some real data and simple time-series plots. Most of the data and information was obtained from OurWorldInData.
