Journal article
How data analysis affects power, reproducibility and biological insight of RNA-seq studies in complex datasets
Nucleic acids research, Vol.43(16), pp.7664-7674
09/18/2015
Handle:
https://hdl.handle.net/2376/116890
PMCID: PMC4652761
PMID: 26202970
Abstract
The sequencing of the full transcriptome (RNA-seq) has become the preferred choice for the measurement of genome-wide gene expression. Despite its widespread use, challenges remain in RNA-seq data analysis. One often-overlooked aspect is normalization. Although a variety of factors or 'batch effects' can contribute unwanted variation to the data, commonly used RNA-seq normalization methods only correct for sequencing depth. The study of gene expression is particularly problematic when it is influenced simultaneously by a variety of biological factors in addition to the one of interest. Using examples from experimental neuroscience, we show that batch effects can dominate the signal of interest, and that the choice of normalization method affects the power and reproducibility of the results. While commonly used global normalization methods are not able to adequately normalize the data, more recently developed RNA-seq normalization methods can. We focus on one particular method, RUVSeq, and show that it is able to increase the power and biological insight of the results. Finally, we provide a tutorial outlining the implementation of RUVSeq normalization that is applicable to a broad range of studies as well as to meta-analysis of publicly available data.
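The contrast the abstract draws between depth-only normalization and the removal of unwanted variation can be illustrated with a minimal Python sketch. This is not the RUVSeq package (which is an R/Bioconductor package) and not the authors' tutorial; it is a simplified illustration, assuming a samples-by-genes count matrix and a set of negative control genes that are not expected to change with the factor of interest. The function names `depth_normalize` and `remove_unwanted_factors`, the simulated data, and the choice of `k` are all hypothetical.

```python
import numpy as np


def depth_normalize(counts):
    """Global scaling: rescale each sample (row) to the mean library size.

    This corrects for sequencing depth only and leaves batch effects in place.
    """
    lib_size = counts.sum(axis=1, keepdims=True)
    return counts * (lib_size.mean() / lib_size)


def remove_unwanted_factors(counts, control_idx, k=1, eps=1.0):
    """Simplified control-gene factor removal (an assumption-laden sketch).

    1. Work on log counts, centered per gene.
    2. Estimate k unwanted factors W from the control genes via SVD.
    3. Regress W out of every gene and return the residual log expression.
    """
    log_counts = np.log(counts + eps)
    centered = log_counts - log_counts.mean(axis=0, keepdims=True)
    u, _, _ = np.linalg.svd(centered[:, control_idx], full_matrices=False)
    w = u[:, :k]  # samples x k matrix of estimated unwanted factors
    alpha = np.linalg.lstsq(w, log_counts, rcond=None)[0]  # k x genes
    return log_counts - w @ alpha, w


# Simulated example: 6 samples, 40 genes, with a gene-specific batch effect
# applied to the last three samples (hypothetical data for illustration only).
rng = np.random.default_rng(0)
batch = np.array([0, 0, 0, 1, 1, 1])
base = rng.poisson(100, size=(6, 40)).astype(float)
batch_factor = np.where(batch[:, None] == 1, 1.0 + rng.random(40), 1.0)
counts = base * batch_factor

scaled = depth_normalize(counts)
corrected, factors = remove_unwanted_factors(counts, np.arange(10), k=1)
```

In this sketch, `scaled` equalizes library sizes but keeps the gene-specific batch pattern, whereas `corrected` has the factor estimated from the control genes regressed out. In a real analysis the unwanted factors would more typically be included as covariates in the differential expression model rather than subtracted from the data.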
Details
- Title
- How data analysis affects power, reproducibility and biological insight of RNA-seq studies in complex datasets
- Creators
- Lucia Peixoto - Department of Biology, University of Pennsylvania, Smilow Center for Translational Research, Room 10-170, Building 421, 3400 Civic Center Boulevard, Philadelphia, PA 19104-6168, USA
- Davide Risso - Division of Biostatistics, School of Public Health, University of California, Berkeley, 344 Li Ka Shing Center, #3370, Berkeley, CA 94720-3370, USA
- Shane G Poplawski - Department of Biology, University of Pennsylvania, Smilow Center for Translational Research, Room 10-170, Building 421, 3400 Civic Center Boulevard, Philadelphia, PA 19104-6168, USA
- Mathieu E Wimmer - Department of Biology, University of Pennsylvania, Smilow Center for Translational Research, Room 10-170, Building 421, 3400 Civic Center Boulevard, Philadelphia, PA 19104-6168, USA
- Terence P Speed - Department of Statistics, University of California, Berkeley; Department of Mathematics and Statistics, The University of Melbourne; Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Australia
- Marcelo A Wood - University of California, Irvine, Department of Neurobiology and Behavior, USA
- Ted Abel - Department of Biology, University of Pennsylvania, Smilow Center for Translational Research, Room 10-170, Building 421, 3400 Civic Center Boulevard, Philadelphia, PA 19104-6168, USA (abele@sas.upenn.edu)
- Publication Details
- Nucleic acids research, Vol.43(16), pp.7664-7674
- Academic Unit
- Biomedical Sciences, Department of
- Publisher
- England
- Grant note
- R01MH087463 / NIMH NIH HHS; T32HL007953 / NHLBI NIH HHS; R01 MH087463 / NIMH NIH HHS; R01 DA036984 / NIDA NIH HHS; T32 GM008076 / NIGMS NIH HHS; DA036984 / NIDA NIH HHS; T32NS007413 / NINDS NIH HHS; R01 MH101491 / NIMH NIH HHS; T32 HL007953 / NHLBI NIH HHS; T32 NS007413 / NINDS NIH HHS; R01MH101491 / NIMH NIH HHS
- Identifiers
- 99900547458801842
- Language
- English
- Resource Type
- Journal article