Recent advances in DNA and RNA sequencing have transformed the field of genomics, making it possible to generate large amounts of data rapidly and at low cost. Now, EU-funded researchers have developed the statistical tools needed for the analysis of hundreds of gigabytes of data produced in each single sequencing run.
Progress made since the first human genome sequence to the nascent era
of genomic medicine has been made possible thanks to high-throughput
sequencing (HTS). This technology allows rapid sequencing of large
stretches of DNA and RNA base pairs spanning entire genomes. However, to
extract meaningful biological signals, HTS requires powerful and
computationally efficient statistical tools.
The EU-funded project
RADIANT (Rapid development and distribution of statistical tools for high-throughput sequencing data) was launched to improve the most popular data analysis tools. Its ultimate objective was to integrate software packages developed by researchers in France, Germany, Italy, Switzerland and the United Kingdom into a single computational framework.
Among these is the Python library HTSeq that pre-processes RNA sequencing data for differential expression genes' analysis. The package DESeq2 provides methods to detect differentially expressed genes using generalised linear models. On the other hand, the BitSeqVB package implements a Bayesian approach to inferring the concentration of messenger RNA transcripts.
Research within the RADIANT project covered all aspects of HTS data analysis, from quality control to data visualisation. For gene expression time series, a hierarchical Bayesian modelling was proposed that can impute data missing both systematically and randomly. The RADIANT genome browser is the first visualisation tool to be developed for DNA methylation data.
Most of the tools have been included in
Bioconductor, providing a uniform framework for HTS data analysis, documentation and distribution. The huge number of packages available on Bioconductor, however, makes it difficult for inexperienced users to solve specific problems. A 'Beginner's vignette' was therefore authored to provide a simple but comprehensive introduction to RNA sequencing data analysis.
Thanks to their ability to provide limitless insight into the human genome, sequencing technologies have permeated virtually all branches of biological and medical research. With RADIANT's newly developed tools, HTS data analysis will become firmly entrenched as an indispensable tool. The applications envisaged can transform genomic studies, unlocking information never before imaginable.