Scraping Instagram and choosing hashtags

We scrape Instagram for basic public data using R to help us pick optimal hashtags.

If you are an Instagram user, at some point you are going to be interested in metrics such as the number of followers or the number of posts by a certain user. You might want to compare these metrics between users, or find out the number of posts with a certain hashtag. The casual way to do this is to go to the relevant Instagram page, look at the metric, write it down somewhere, then move on to the next page, and so on. Clearly, this is not an ideal strategy if you want to look at a few hundred pages. It would be neat to get this data in an automated manner.

Read More

Structure ‘Sort by Q’ explained.

STRUCTURE is a popular software tool used by biologists to infer the population structure of organisms using genetic markers. Barplots in STRUCTURE have an option to sort individuals by Q. We explore the ‘Sort by Q’ option using R and Excel to figure out what it does.

In this post, we are going to figure out what sorting individuals by Q means and how it is done.

Read More

A guide to elegant tiled heatmaps in R

A step-by-step guide to data preparation and plotting of simple, neat and elegant heatmaps in R using base graphics and ggplot2.

This post/code is now outdated. Please see this new link for updated code.

This was inspired by the disease incidence rates in the US featured in the Wall Street Journal, which I mentioned in one of the previous posts. The disease incidence dataset was originally used in this article in the New England Journal of Medicine. Here, I use the measles level 1 incidence (cases per 100,000 people) dataset obtained as a .csv file from Project Tycho. Download the .csv file here or head over to Project Tycho for other datasets.

In this post, we will look into creating a neat, clean and elegant heatmap in R. No clustering, no dendrograms, no trace lines, no bullshit. We will go through some basic data cleanup, reformatting and finally plotting, step by step. For the whole code with minimal explanations, scroll to the bottom of the page.

Read More

Elegant scientific graphs: Learning from examples

Scientific graphics are key to understanding complex data. In addition to graphing data, it is important that the graphics are clean, elegant and visually stunning. This post explores examples of attractive graphs from popular magazines and news websites.

Scientific graphs are key to presenting complex data in an attractive and concise manner. A scientific graph is supposed to be a visual summary of your data.

Data collection > Data analysis > Data visualisation

The most important part of a plot is, of course, its content. Once you have good content/data, you need to think about how to represent it: which sort of plot to use, and how best to convey the information. See this article for the most common types of data visualisation. Popular programming environments for plotting include R and Python, with Julia being the latest addition. Other options include Excel, Tableau, Plotly and more. R is a great tool for graphics, as is evident from the numerous images, blogs and publications over the past years. There are several resources that can help you find code to create all sorts of graphs in R.

But just managing to create a graph is not sufficient, in my opinion. The graphic has to be beautiful, elegant, user-friendly and attractive. Getting from data to a plot is one thing, but creating a high-quality, publishable and professional-looking figure is a different story. A basic plot is the initial output figure from any plotting software, using the default settings and default looks. Most people stop at this point because they have gone through considerable effort to get the data, analyse it and finally figure out the code to plot it. But a basic plot is usually not going to look professional or elegant. It will need some level of customisation to make it attractive. The ggplot2 plotting package in R, for example, produces pretty decent default output, but the defaults are overused and the resulting graphics are not unique or catchy. I haven’t come across many sources that go into the fine details of making a professional-looking graph. We will go into plot customisation in R in a future post, but in this post we explore some examples of customised plots.

Read More