- Iyer, Matthew K
- Niknafs, Yashar S
- Malik, Rohit
- Singhal, Udit
- Sahu, Anirban
- Hosono, Yasuyuki
- Barrette, Terrence R
- Prensner, John R
- Evans, Joseph R
- Zhao, Shuang
- Poliakov, Anton
- Cao, Xuhong
- Dhanasekaran, Saravana M
- Wu, Yi-Mi
- Robinson, Dan R
- Beer, David G
- Feng, Felix Y
- Iyer, Hariharan K
- Chinnaiyan, Arul M
Long noncoding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. To delineate genome-wide lncRNA expression, we curated 7,256 RNA sequencing (RNA-seq) libraries from tumors, normal tissues and cell lines comprising over 43 Tb of sequence from 25 independent studies. We applied ab initio assembly methodology to this data set, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements, and 7% (3,900) overlapped disease-associated SNPs. To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.