If you are an Instagram user, at some point you are going to be interested in metrics such as the number of followers or the number of posts by a certain user. You might want to compare these metrics between different users, or find out the number of posts with a certain hashtag. The casual way to do this is to go to the relevant Instagram page, look at the metric, write it down somewhere, then move on to the next page and so on. Clearly, this is not an ideal strategy if you want to look at a few hundred pages. It would be neat to get this data in an automated manner.
STRUCTURE is a popular software program used by biologists to infer the population structure of organisms using genetic markers. Barplots in STRUCTURE have an option to sort individuals by Q. We are going to figure out what this means and how it is done.
This was inspired by the US disease incidence rates featured in the Wall Street Journal, which I mentioned in one of the previous posts. The disease incidence dataset was originally used in this article in the New England Journal of Medicine. Here, I use the measles level 1 incidence (cases per 100,000 people) dataset obtained as a .csv file from Project Tycho. Download the .csv file here or head over to Project Tycho for other datasets.
In this post, we will look into creating a neat, clean and elegant heatmap in R. No clustering, no dendrograms, no trace lines, no bullshit. We will go through some basic data cleanup, reformatting and finally plotting, step by step. For the whole code with minimal explanations, scroll to the bottom of the page.
Scientific graphs are key to presenting complex data in an attractive and concise manner. A scientific graph is supposed to be a visual summary of your data.
Data collection > Data analysis > Data visualisation
The most important part of a plot is, of course, its content. Once you have good content/data, you need to think about how to represent it: which sort of plot to use and how best to convey the information. See this article for the most common types of data visualisation. Some of the popular programming environments for plotting include
Julia being the latest addition. Other options include Excel, Tableau, Plotly and more.
R is a great tool for graphics, as is evident from the numerous images, blogs and publications over the past years. There are several resources that can help you find code to create all sorts of graphs in R.
But just managing to create a graph is not sufficient, in my opinion. The graphic has to be beautiful, elegant, user-friendly and attractive. Getting from data to a plot is one thing, but creating a high-quality, publishable and professional-looking figure is a different story. A basic plot is the initial output figure from any plotting software. It uses the default settings and default looks. Most people stop at this point because they have gone through considerable effort to get the data, analyse it and finally figure out the code to plot it. But a basic plot is usually not going to look professional or elegant. It will need some level of customisation to make it attractive. The ggplot2 plotting package in R, for example, produces pretty decent default output, but the default look is overused and the resulting graphics are not unique or catchy. I haven’t come across many sources that go into the fine details of making a professional-looking graph. We will go into plot customisation in R in a future post, but in this post we explore some examples of customised plots.