In RNA-Seq analyses, adding a pre-determined quantity of synthetic RNA sequences (spike-ins) to samples is a popular way to verify the experimental pipeline, assess quantification accuracy and normalise expression values for differential expression analysis. The most commonly used spike-ins are the ERCC (External RNA Controls Consortium) spike-ins.
This post covers the bioinformatic steps for obtaining spike-in read counts from a FASTQ file of a sample sequenced with spike-ins: creating a custom FASTA genome build that incorporates the spike-in sequences, creating a matching custom GTF file, mapping the reads to the custom genome, counting the reads and visualising the results. It does not cover the wet lab part of adding spike-ins. For this workflow I am using a FASTQ file (sample01.fq.gz) of 50 bp single-end Illumina reads from a single-cell experiment with spike-ins.
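To make the custom GTF step more concrete, here is a minimal Python sketch of one common convention: each spike-in sequence is treated as its own "chromosome" and annotated with a single exon spanning its full length. This is an illustration under stated assumptions, not necessarily the exact approach used later in the workflow. The function names (`read_fasta`, `spikein_gtf_lines`), the `spikein` source label and the two toy sequences are all hypothetical; real spike-in sequences would come from the FASTA file distributed with the spike-in mix.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def spikein_gtf_lines(records, source="spikein"):
    """Yield one GTF exon line per spike-in, covering the whole sequence.

    Assumption: the spike-in name doubles as the sequence name, gene_id
    and transcript_id, and the feature lies on the + strand.
    """
    for name, seq in records:
        attributes = f'gene_id "{name}"; transcript_id "{name}";'
        fields = [name, source, "exon", "1", str(len(seq)), ".", "+", ".", attributes]
        yield "\t".join(fields)


# Toy input standing in for a real spike-in FASTA file (hypothetical sequences)
toy_fasta = io.StringIO(">SPIKE-A toy record\nACGTAC\nGT\n>SPIKE-B\nTTTT\n")
gtf_lines = list(spikein_gtf_lines(read_fasta(toy_fasta)))
for gtf_line in gtf_lines:
    print(gtf_line)
```

With a real file, the `io.StringIO` object would be replaced by an opened FASTA file, and the resulting lines would be appended to the reference GTF so that the annotation matches the custom genome build.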