A Complete Kit for Stranded RNA-Seq Library Preparation
SMARTer Stranded Total RNA Sample Prep Kit - Low Input Mammalian
- Stranded information:
Accurate identification of transcript strand of origin
- Ribosomal RNA depletion:
Efficient rRNA removal in a complete sample preparation kit
- Integrated library preparation:
High-quality sequencing data generated on Illumina® platforms
RNA sequencing (RNA-seq) is a key tool for expression analysis of the entire transcriptome, with high sensitivity and a wide dynamic range. Random-primed cDNA synthesis kits, like the SMARTer Stranded RNA-seq kits, are ideal for transcriptome analysis from all types of input RNA, including compromised samples. These kits are based on SMART (Switching Mechanism at 5' End of RNA Template) technology, which uses an inherently strand-specific reverse transcription reaction to identify the strand of origin with ≥99% accuracy, without the need for additional preparation steps. Illumina adapters (up to 96 different indexes) are added during cDNA amplification, eliminating further library preparation steps after cDNA synthesis.
Prior to cDNA synthesis with any random-primed cDNA synthesis kit, it is important to remove ribosomal RNA (rRNA), which can represent up to 90% of total RNA. The RiboGone - Mammalian kit uses hybridization technology and RNase H digestion to bind and specifically deplete 5S, 5.8S, 18S, and 28S nuclear rRNA sequences and 12S mitochondrial RNA (mtRNA) sequences from full-length or sheared total RNA derived from human, mouse, or rat samples. (This kit does not deplete 16S mitochondrial RNA sequences, which share significant homology with some nuclear genes.) The SMARTer Stranded Total RNA Sample Prep Kit - Low Input Mammalian combines these two technologies into one convenient kit for complete sample preparation and cDNA synthesis.
Maintaining strand-of-origin information in cDNA libraries for sequencing allows researchers to identify overlapping transcripts, which are common in compact bacterial genomes, and antisense transcripts that would be lost with a strand-agnostic cDNA synthesis method. Using the SMARTer Stranded RNA-seq kits, we have been able to correctly identify both overlapping and antisense transcripts.
Distinguishing overlapping and antisense transcripts with the SMARTer Stranded RNA-seq Kit. (Panel A) RNA-seq reads from a Human Brain Poly A+ RNA cDNA library were mapped against the human genome. The SMARTer Stranded method allowed assignment of sequencing reads to the correct gene in the case of overlapping PHC1 and M6PR transcripts. (Panel B) Strand-specific coverage of the CDR1 locus. Nearly all reads are antisense to the annotated transcript, a finding independently reported elsewhere (1). (Panel C) Comparison of CDR1 gene counts obtained using either a strand-agnostic or strand-aware method.
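The difference between strand-agnostic and strand-aware counting (Panel C) can be illustrated with a minimal sketch. The gene names, coordinates, and reads below are hypothetical toy values, not data from this study, and the sketch assumes each read's transcript strand has already been resolved from its orientation (how orientation maps to strand depends on the library protocol and aligner settings).

```python
from collections import Counter

# Hypothetical toy annotation: (gene, start, end, strand). Coordinates are invented.
GENES = [
    ("GENE_A", 100, 500, "+"),
    ("GENE_B", 400, 900, "-"),  # overlaps GENE_A on the opposite strand
]

def overlaps(read_start, read_end, gene_start, gene_end):
    """Return True if the two closed intervals share at least one base."""
    return read_start <= gene_end and gene_start <= read_end

def count_reads(reads, genes, stranded=True):
    """Count reads per gene.

    reads: iterable of (start, end, strand), where strand is the strand of the
    originating transcript (assumed to be resolved upstream).
    With stranded=False the strand is ignored, so a read in an overlap region
    is counted for every gene it touches.
    """
    counts = Counter({name: 0 for name, *_ in genes})
    for r_start, r_end, r_strand in reads:
        for name, g_start, g_end, g_strand in genes:
            if not overlaps(r_start, r_end, g_start, g_end):
                continue
            if stranded and r_strand != g_strand:
                continue
            counts[name] += 1
    return dict(counts)

# Hypothetical reads: two from each gene, two of them inside the overlap region.
reads = [
    (120, 170, "+"),
    (420, 470, "+"),
    (450, 500, "-"),
    (800, 850, "-"),
]

stranded_counts = count_reads(reads, GENES, stranded=True)
unstranded_counts = count_reads(reads, GENES, stranded=False)
print(stranded_counts)
print(unstranded_counts)
```

In this toy case the strand-agnostic counts are inflated for both genes, because reads in the overlap region cannot be assigned to a single gene without strand information.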
Both RiboGone-treated and oligo(dT)-purified RNA inputs generated RNA-seq data with similarly low percentages of sequencing reads mapping to rRNA. Both intact (Human Brain Total RNA) and degraded (FFPE tissue) RNA samples are suitable for the SMARTer Stranded Total RNA Sample Prep Kit - Low Input Mammalian. Oligo(dT)-based methods for decreasing the number of rRNA reads also decrease the number of sequencing reads mapping to non-coding RNAs. The RiboGone method is based on selective hybridization to rRNA, leaving both mRNA and non-coding RNAs available as templates for the reverse transcription reaction.
Efficient rRNA removal with the RiboGone - Mammalian kit. RNA-seq libraries were generated from Human Brain Total RNA or Breast Cancer FFPE RNA using the SMARTer Stranded RNA-seq Kit. Libraries generated from RiboGone-treated RNA had comparably low rRNA reads to oligo(dT)-enriched RNA while retaining more non-coding reads.
The SMARTer Stranded Total RNA Sample Prep Kit - Low Input Mammalian maintains the ability of other SMARTer Stranded kits to generate RNA-seq libraries from a variety of samples including Human Universal Reference RNA (HURR; Agilent) and Human Brain Reference RNA (HBRR; Ambion). When libraries were sequenced on an Illumina MiSeq® instrument, both the HURR and HBRR samples yielded a high number of reads, with 75–76% mapped, 66–70% uniquely mapped, over 13,800 genes identified, and less than 1% of reads mapped to rRNA.
Sequence Alignment Metrics
| Metric | Human Universal Reference RNA (HURR) | Human Brain Reference RNA (HBRR) |
|---|---|---|
| No. of reads | 6,829,540 | 7,728,850 |
| Mapped to rRNA | 62,792 (0.9%) | 49,844 (0.7%) |
| Mapped to mitochondrial RNA | 318,006 (4.7%) | 224,939 (2.9%) |
| Mapped to RefSeq | 4,871,900 (76%) | 5,515,264 (75%) |
| Mapped uniquely to RefSeq | 4,435,123 (70%) | 4,888,340 (66%) |
Sequencing alignment metrics for HURR and HBRR libraries. 10 ng samples of intact HURR and HBRR were used as input for the SMARTer Stranded Total RNA Sample Prep Kit - Low Input Mammalian. RNA-seq libraries were prepared according to the kit protocol and sequenced on an Illumina MiSeq platform.
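As a rough check on how the percentages in the table relate to the raw read counts, the sketch below recomputes them for the HURR column. The choice of denominator for the RefSeq percentage (reads remaining after rRNA and mitochondrial reads are set aside) is an assumption based on the mapping order described in the methods, not something the table states explicitly, and recomputed values may differ slightly from the published rounding.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole

# HURR read counts taken from the table above.
total_reads = 6_829_540
rrna_reads = 62_792
mito_reads = 318_006
refseq_reads = 4_871_900

# rRNA and mitochondrial percentages are relative to all reads.
rrna_pct = percent(rrna_reads, total_reads)
mito_pct = percent(mito_reads, total_reads)

# Assumption: RefSeq percentages use the reads left after rRNA and
# mitochondrial reads have been removed as the denominator.
remaining_reads = total_reads - rrna_reads - mito_reads
refseq_pct = percent(refseq_reads, remaining_reads)

print(f"rRNA: {rrna_pct:.1f}%  mito: {mito_pct:.1f}%  RefSeq: {refseq_pct:.0f}%")
```

The same function can be applied to the HBRR column.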
RNA-seq data obtained with the SMARTer Stranded Total RNA Sample Prep Kit - Low Input Mammalian for HURR and HBRR samples correlate with qPCR data for the same RNAs obtained through the MicroArray Quality Control (MAQC) study (2). The high level of correlation with MAQC (R = 0.860) suggests that the RNA-seq data were not affected by rRNA depletion with the RiboGone - Mammalian kit.
High correlation between SMARTer Stranded RNA-seq data and MAQC qPCR data. A scatter plot was used to compare differential expression data obtained from SMARTer transcriptome analysis of HURR and HBRR cDNA libraries (in Reads per Kilobase of Exon per Million Reads; RPKM) and qPCR data for HURR and HBRR (in Ct) from the MAQC project. The transcripts used in this analysis were the 623 of ~900 transcripts present in the MAQC data set that were also detected in both the HURR and HBRR SMARTer Stranded RNA-seq data sets.
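A comparison of this kind can be sketched as follows. This is a minimal illustration, not the analysis pipeline used for the figure: the function names are hypothetical, the qPCR conversion assumes normalized Ct values and roughly 100% amplification efficiency (so that a one-cycle difference corresponds to a two-fold change), and the input values are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def log2_ratio_from_rpkm(rpkm_a, rpkm_b):
    """log2 expression ratio (sample A over sample B) from RPKM values."""
    return math.log2(rpkm_a / rpkm_b)

def log2_ratio_from_ct(ct_a, ct_b):
    """log2 expression ratio (A over B) from Ct values.

    A lower Ct means higher expression, hence ct_b - ct_a.
    Assumes normalized Ct values and ~100% PCR efficiency.
    """
    return ct_b - ct_a

# Hypothetical per-transcript values: (RPKM in A, RPKM in B, Ct in A, Ct in B).
transcripts = [
    (80.0, 10.0, 20.1, 23.0),
    (5.0, 20.0, 27.5, 25.4),
    (12.0, 12.5, 24.0, 24.2),
    (1.5, 30.0, 30.2, 26.1),
]

rnaseq = [log2_ratio_from_rpkm(a, b) for a, b, _, _ in transcripts]
qpcr = [log2_ratio_from_ct(ca, cb) for _, _, ca, cb in transcripts]
print(f"R = {pearson_r(rnaseq, qpcr):.3f}")
```

Only transcripts detected in both samples can be included, since the log ratio is undefined when either RPKM value is zero.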
The SMARTer Stranded RNA-seq Kits generate RNA-seq libraries from intact or degraded RNA samples that retain strand-of-origin information, which can be used to identify overlapping and antisense transcripts. Sequencing data generated with these kits identify a large number of genes, and the resulting expression measurements correlate highly with MAQC data. The SMARTer Stranded Total RNA Sample Prep Kit - Low Input Mammalian combines the efficient rRNA removal of RiboGone - Mammalian with cDNA synthesis. The data generated with this complete kit maintain the high quality of other SMARTer Stranded RNA-seq Kits.
Human Brain Poly A+ RNA was spiked with ERCC control RNA and serially diluted to prepare RNA samples containing between 100 pg and 100 ng of RNA. cDNA libraries were prepared using the SMARTer Stranded RNA-seq Kit according to the kit protocol with twelve different Illumina indexes. Libraries were sequenced on an Illumina HiSeq® 2000 instrument, with ~300M 2 x 100 bp paired-end reads.
RNA was extracted from Breast Cancer FFPE samples (Cureline) using a NucleoSpin totalRNA FFPE kit. This RNA and Human Brain Total RNA were treated with either the RiboGone - Mammalian kit or the Magnosphere UltraPure mRNA Purification Kit, according to the specific kit protocol. Untreated total RNA was also used as input for RNA-seq library production in order to determine the high percentage of rRNA reads present in the initial total RNA preparations. RNA-seq libraries were generated with the SMARTer Stranded RNA-seq Kit and sequenced on an Illumina MiSeq instrument. Reads were mapped to the hg19 genome and read distributions were determined using Picard RNA-seq Metrics.
RNA-seq libraries were generated from 10 ng samples of Human Universal Reference RNA (Agilent) and Human Brain Reference RNA (Ambion), the same RNAs used in the MAQC project (2). Libraries were prepared with the SMARTer Stranded Total RNA Sample Prep Kit - Low Input Mammalian according to the kit protocol, with 18 cycles of PCR, and sequenced on an Illumina MiSeq platform with ~7M 1 x 50 bp single-end reads per library.
Reads were trimmed with CLC Genomics Workbench and mapped to rRNA and the mitochondrial genome with CLC (% reads indicated). The unmapped reads were subsequently mapped with CLC to the human genome with RefSeq masking, producing mapped reads and uniquely mapped reads. The number of genes identified in each library was determined by the number of genes with an RPKM of at least 0.1. The number of reads that map to introns or exons is expressed as a percentage of the reads successfully mapped to RefSeq.
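The gene-detection criterion described above can be sketched with the standard RPKM definition (reads per kilobase of exon per million mapped reads). This is an illustrative calculation, not the CLC Genomics Workbench implementation. The function names and example values are hypothetical, and only the 0.1 threshold comes from the text.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads.

    RPKM = reads * 1e9 / (exon length in bp * total mapped reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)

def count_detected_genes(rpkm_by_gene, threshold=0.1):
    """Number of genes whose RPKM is at least the threshold (0.1 in the text)."""
    return sum(1 for value in rpkm_by_gene.values() if value >= threshold)

# Hypothetical example: 500 reads on a gene with 2,000 bp of exons,
# in a library with 5,000,000 mapped reads.
example = rpkm(500, 2_000, 5_000_000)
print(example)

# Hypothetical RPKM values for three genes; two meet the threshold.
detected = count_detected_genes({"gene1": 50.0, "gene2": 0.05, "gene3": 0.1})
print(detected)
```

Because RPKM scales with the total number of mapped reads, the number of genes passing a fixed threshold depends on sequencing depth as well as on the library itself.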
- Hansen, T. B. et al. (2011) EMBO J. 30(21):4414–4422.
- MAQC Consortium (2006) Nat. Biotechnol. 24(9):1151–1161.