
Scraping Instagram and choosing hashtags

We scrape Instagram for basic public data using R to help us pick optimal hashtags.

If you are an Instagram user, at some point you are going to be interested in metrics such as the number of followers or the number of posts by a certain user. You might want to compare these metrics between users, or find out the number of posts with a certain hashtag. The casual way to do it is to go to the relevant Instagram page, look at the metric, write it down somewhere, then move on to the next page, and so on. Clearly this is not an ideal strategy if you want to look at a few hundred pages. It would be neat to get this data in an automated manner.


Structure ‘Sort by Q’ explained.

STRUCTURE is a popular software package used by biologists to infer the population structure of organisms from genetic markers. Barplots in STRUCTURE have an option to sort individuals by Q. We explore the ‘Sort by Q’ option using R and Excel to figure out what it does.

Barplots in STRUCTURE can sort individuals by Q. Here we figure out what that means and how it is done.
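
As a rough illustration of what sorting by Q could look like, here is a minimal R sketch. It assumes one plausible reading, which this excerpt does not confirm: each individual is assigned to the cluster where its membership coefficient Q is highest, and individuals are then ordered by cluster and, within a cluster, by decreasing Q. The Q values and the names `q`, `cluster`, `maxq` and `sorted` are invented for the example.

```r
# Toy Q matrix: one row per individual, one column per cluster (K = 3).
# All values are made up for illustration.
q <- data.frame(
  ind = paste0("ind", 1:6),
  C1  = c(0.10, 0.80, 0.30, 0.05, 0.60, 0.20),
  C2  = c(0.70, 0.10, 0.20, 0.90, 0.30, 0.20),
  C3  = c(0.20, 0.10, 0.50, 0.05, 0.10, 0.60),
  stringsAsFactors = FALSE
)

qmat <- as.matrix(q[, c("C1", "C2", "C3")])

# Assumption: an individual belongs to the cluster with its largest Q.
q$cluster <- max.col(qmat, ties.method = "first")
q$maxq    <- apply(qmat, 1, max)

# Order by assigned cluster, then by decreasing Q within each cluster.
sorted <- q[order(q$cluster, -q$maxq), ]
print(sorted)
```

Comparing `sorted$ind` with the order of the bars in a STRUCTURE barplot is one way to check whether this reading holds for a given run.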


A guide to elegant tiled heatmaps in R

A step-by-step guide to data preparation and plotting of simple, neat and elegant heatmaps in R using base graphics and ggplot2.

This was inspired by the US disease incidence rates featured in the Wall Street Journal, which I mentioned in one of the previous posts. The disease incidence dataset was originally used in this article in the New England Journal of Medicine. Here, I use the measles level 1 incidence (cases per 100,000 people) dataset obtained as a .csv file from Project Tycho. Download the .csv file here or head over to Project Tycho for other datasets.

In this post, we will look into creating a neat, clean and elegant heatmap in R. No clustering, no dendrograms, no trace lines, no bullshit. We will go through some basic data cleanup, reformatting and finally plotting, step by step. For the whole code with minimal explanations, scroll to the bottom of the page.
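
To give a flavour of the end result, here is a minimal sketch of a tiled heatmap in base graphics. It is not the code from the full post: the data frame `wide` and its values are invented stand-ins for a years-by-states incidence table, and the colour palette is an arbitrary choice.

```r
# Toy incidence table in wide format: one row per year, one column per state.
# Values are made up; NA marks a missing observation.
wide <- data.frame(
  year    = 2001:2004,
  ALABAMA = c(0.5, 1.2, NA, 3.1),
  ALASKA  = c(2.0, 0.0, 4.4, 1.3)
)

# image() wants a numeric matrix: rows map to x (years), columns to y (states).
m <- as.matrix(wide[, -1])
rownames(m) <- wide$year

# One tile per year/state cell; NA cells are left blank.
image(
  x = wide$year, y = seq_len(ncol(m)), z = m,
  col = rev(heat.colors(12)),   # pale for low values, red for high
  axes = FALSE, xlab = "", ylab = ""
)

# Plain axes without tick marks, state names on the y axis.
axis(1, at = wide$year, tick = FALSE)
axis(2, at = seq_len(ncol(m)), labels = colnames(m), las = 1, tick = FALSE)
```

In ggplot2 the same idea would usually be expressed with `geom_tile()` on data reshaped to long format (one row per year and state).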
