STRUCTURE is a popular program used by biologists to infer the population structure of organisms from genetic markers. Barplots in STRUCTURE have an option to sort individuals by Q. We are going to figure out what this means and how it is done.
I currently use STRUCTURE 2.3.4 on Windows. A typical assignment output file for K=2 looks like the one below.

Fig 1: A typical Structure assignment output.
We can use the plotting functionality within STRUCTURE to view the assignment results as a barplot. See fig below. The individuals are ordered in the same order as the input file when selecting the ‘Original order’ option. There is another option to sort individuals called ‘Sort by Q’. What does this actually do?

Fig 2: Barplot in STRUCTURE software showing original order of individuals (top) and ‘Sort by Q’ order (bottom).
One might reasonably assume that the individuals are sorted by one of the assignment clusters, but that is not the case. We will plot the data manually to investigate this option. The structure output file used can be downloaded here.
We use the R package pophelper to convert structure files to an R dataframe, the ggplot2 package for plotting and the reshape2 package for data restructuring. The data is read into R as a dataframe with two columns, Cluster1 and Cluster2, holding the assignment values.
#install pophelper library
#library(devtools)
#install_github('royfrancis/pophelper')

#load packages
library(ggplot2)
library(reshape2)
library(pophelper)

#read data to dataframe
df <- runsToDfStructure("structure-file.txt")
head(df)

> head(df)
  Cluster1 Cluster2
1    0.965    0.035
2    0.977    0.023
3    0.961    0.039
4    0.975    0.025
5    0.974    0.026
6    0.982    0.018
Now we write a function to create the plot.
#create function to generate plots
plotfn <- function(df=NULL,filename=NULL)
{
  #reshape to long format
  df$num <- 1:nrow(df)
  df1 <- reshape2::melt(df,id.vars = "num")
  #reversing order for cosmetic reasons
  df1 <- df1[rev(1:nrow(df1)),]

  #plot
  p <- ggplot(df1,aes(x=num,y=value,fill=variable))+
    geom_bar(stat="identity",position="fill",width = 1)+
    scale_x_continuous(expand = c(0, 0))+
    scale_y_continuous(expand = c(0, 0))+
    labs(x = NULL, y = NULL)+
    theme_grey(base_size=7)+
    theme(legend.position = "none",
          axis.ticks = element_blank(),
          axis.text.x = element_blank())

  ggsave(filename=filename,plot = p,height=4,width=12,dpi=150,units = "cm")
}

#plot unsorted plot
plotfn(df=df,filename="bp-r-unsorted.png")
Here is the assignment barplot in the original order.

Fig 3: Assignment barplot recreated in R. Individuals are in original order.
Now we create two plots: one where the table is sorted by Cluster1 and a second where the table is sorted by Cluster2.
#sort table by cluster1
df_c1 <- df[order(df[,1]),]
plotfn(df=df_c1,filename="bp-r-sortedc1.png")

#sort table by cluster2
df_c2 <- df[order(df[,2]),]
plotfn(df=df_c2,filename="bp-r-sortedc2.png")

Fig 4: Assignment barplot in R sorted by cluster1.

Fig 5: Assignment barplot in R sorted by cluster2.
Neither of these plots resembles the ‘Sort by Q’ option in the STRUCTURE software. They look like mirror images of each other only because K=2; for K>2, they would look quite different. The ‘Sort by Q’ option does a bit more. For each individual, the maximum assignment value is picked and stored in a new column called ‘max’. The number of the cluster holding that maximum is stored in a new column called ‘match’. Then the whole table is sorted ascending by ‘match’ and descending by ‘max’. Here is the R code.
#pick max cluster, match max to cluster
maxval <- apply(df,1,max)
matchval <- vector(length=nrow(df))
for(j in 1:nrow(df)) matchval[j] <- match(maxval[j],df[j,])

#add max and match to df
df_q <- df
df_q$maxval <- maxval
df_q$matchval <- matchval

#order dataframe ascending match and descending max
df_q <- df_q[with(df_q, order(matchval,-maxval)), ]

#remove max and match
df_q$maxval <- NULL
df_q$matchval <- NULL

#plot
plotfn(df=df_q,filename="bp-r-sortedq.png")
And that gives us the plot we are looking for, the same one produced by the STRUCTURE software.

Fig 6: Assignment barplot in R sorted by Q.
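To see the rule at work on something small, here is a minimal sketch that applies the same two-key sort to a made-up K=3 table. The values and the helper name `sort_by_q` are invented for illustration and are not taken from the STRUCTURE file. It uses base R's `max.col()` instead of the loop, which is just one way to write it.

```r
# Toy assignment table for K=3; values are made up for illustration
toy <- data.frame(
  Cluster1 = c(0.10, 0.80, 0.20, 0.60, 0.05),
  Cluster2 = c(0.70, 0.15, 0.30, 0.30, 0.90),
  Cluster3 = c(0.20, 0.05, 0.50, 0.10, 0.05)
)

# Hypothetical helper: sort rows by winning cluster (ascending),
# then by the winning assignment value (descending)
sort_by_q <- function(q) {
  m <- as.matrix(q)
  # column index of the largest value in each row (first column wins ties)
  matchval <- max.col(m, ties.method = "first")
  # the largest value in each row
  maxval <- m[cbind(seq_len(nrow(m)), matchval)]
  q[order(matchval, -maxval), , drop = FALSE]
}

toy_q <- sort_by_q(toy)
print(toy_q)

# Row names keep the original individual numbers
print(rownames(toy_q))
```

Individuals 2 and 4 are assigned mostly to Cluster1, 5 and 1 to Cluster2, and 3 to Cluster3, so the sorted order is 2, 4, 5, 1, 3. Because the row names survive the sort, they can be compared directly with the individual numbers shown in STRUCTURE.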
Here is also an Excel file with the calculations, if R is not your thing.

Fig 7: Assignment barplot and ‘Sort by Q’ calculation in Excel.
You can always verify by checking the individual number (#) with the individual numbers in the STRUCTURE software (set to ‘Plot in multiple lines’).
That’s all for now. I hope this was useful for all those who were as confused as I was.
About the author Roy Francis