Single-Cell Transcriptome Studies: A Powerful Way to Highlight Subtle Differences Between Cells That May Be Hidden in a Population
SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing
One of the goals of good science is to explore cellular function and find even the smallest difference that can influence the biology of a system. In this study, we examined the transcriptomes of individual, genetically identical cells and compared them to each other and to cell populations of two different sizes. Using highly sensitive template-switching (SMART; Switching Mechanism at 5' end of RNA Template) technology, we obtained high-quality RNA-seq data from individual cells. We found that the number of transcripts detected and the overall transcriptome differed between individual cells, possibly indicating variation in cellular status despite growth under identical conditions. This study reinforces the idea that SMART technology can robustly produce cDNA from single cells for meaningful transcriptome analysis.
Transcriptome analysis is an effective way to obtain information about the state of a cell or tissue under a wide variety of conditions, including development or differentiation, response to the environment or to infection, and disease (Heaton et al., 2014; Henley et al., 2013; Saliba et al., 2014). In general, RNA-seq methods require more RNA than a single cell contains; therefore, a population of cells must be used as input. RNA-seq data from a population, no matter how small, can average out or mask minor but potentially important variations between individual cells (Kanter and Kalisky, 2015; Saliba et al., 2014), or may be biased by the expression pattern of a few cells (Bengtsson et al., 2005). Performing RNA-seq on individual cells reveals these differences and offers the opportunity to study and characterize diverse cell and tissue types, such as induced pluripotent stem cells (iPS cells), circulating tumor cells, cells from solid tumor tissue, and embryonic tissue (Fort et al., 2015; Saliba et al., 2014).
Single-cell and ultra-low-input mRNA-seq face a number of technical challenges, including sample preparation and insensitive or irreproducible cDNA synthesis. In recent years, there has been tremendous progress in methods and technology for transcriptome profiling of single cells, and both challenges can now be addressed. First, sample preparation can be improved by using intact cells directly as input, rather than purified RNA. In this case, highly abundant rRNA transcripts (which may account for over 90% of all RNA) can be excluded from cDNA synthesis by oligo(dT) priming, because rRNA lacks the poly(A) tail to which the oligo(dT) primer anneals. Second, sensitive and reproducible cDNA synthesis can be achieved using SMART technology.
SMART technology (Figure 1) is based on non-templated nucleotides that an MMLV-based reverse transcriptase (RT) adds when it reaches the 5' end of the mRNA during cDNA synthesis. Template switching then occurs when a specially designed Template-Switching Oligo (TSO), bearing a sequence complementary to these non-templated nucleotides, hybridizes to them at the 3' end of the first-strand cDNA. The RT switches from using the mRNA as a template to using the TSO for further cDNA synthesis. This ensures that the 5' end of the mRNA is captured and allows specific sequences to be added to each end of the cDNA for simpler amplification and enrichment of full-length cDNA (Figure 1). The inherently sensitive SMART technology has been continually improved over the past 20 years by optimizing the reaction conditions, the TSO, and the PCR polymerase. Most recently, the SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing further improves the technology by incorporating locked nucleic acid (LNA) modifications into the TSO, along with other optimizations.
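The sequence bookkeeping behind template switching can be made concrete with a toy model. The Python sketch below is illustrative only: the adapter and TSO sequences are hypothetical placeholders rather than the kit's oligonucleotides, and it assumes, as a simplification, that the RT adds exactly three non-templated C residues that pair with a TSO ending in GGG. All sequences are written 5'→3' in DNA letters.

```python
"""Toy model of SMART template switching (illustrative, not the kit's chemistry)."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def template_switch_cdna(mrna: str, dt_adapter: str, tso: str,
                         non_templated: str = "CCC") -> str:
    """Return first-strand cDNA (5'->3') for an mRNA written 5'->3' in DNA letters.

    Toy-model assumptions: the oligo(dT) primer (dt_adapter + T's) covers the
    whole poly(A) tail; the RT adds exactly `non_templated` at the cDNA 3' end;
    switching happens only if the TSO's 3' end pairs with that overhang.
    """
    if not mrna.endswith("A" * 10):
        raise ValueError("oligo(dT) priming requires a poly(A) tail")
    # Primer anneals to the poly(A) tail; the RT copies the mRNA to its 5' end.
    cdna = dt_adapter + revcomp(mrna)
    # On reaching the 5' end, the RT adds non-templated nucleotides.
    cdna += non_templated
    k = len(non_templated)
    if revcomp(tso[-k:]) != non_templated:
        # No template switch: the second adapter is never added.
        return cdna
    # Template switch: the RT continues by copying the rest of the TSO.
    return cdna + revcomp(tso[:-k])


# Hypothetical example sequences (placeholders, not real SMART oligos).
DT_ADAPTER = "GGTACCTTGA"
TSO_ADAPTER = "CTAGCATGCA"
MRNA = "ATGGCTTCCAAGTAA" + "A" * 12

cdna = template_switch_cdna(MRNA, DT_ADAPTER, TSO_ADAPTER + "GGG")
second_strand = revcomp(cdna)
# Both ends of the full-length product now carry known adapter sequence, so
# primers matching the two adapters can amplify it whatever the transcript.
print(second_strand.startswith(TSO_ADAPTER), second_strand.endswith(revcomp(DT_ADAPTER)))
```

If the TSO's 3' end is changed so that it no longer pairs with the overhang (for example, ending in AAA), the function returns a cDNA that lacks the second adapter, which is the kind of product that adapter-primed PCR cannot amplify.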
In this initial single-cell study, we show that good-quality mRNA-seq data can be obtained from individual cells using the SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing. We compare these single-cell data with data from populations of 100 or 1,000 cells and show that data quality is similar, whereas the number of transcripts identified and the transcriptome profiles differ. Overall, these results indicate that this kit is a robust tool for single-cell studies.
Figure 1. Cartoon of the template-switching mechanism. Non-templated nucleotides (indicated by Xs) added by the RT hybridize to the TSO (SMART-Seq v4 Oligonucleotide), which provides a new template for the RT. Chemical modifications to block ligation are present on some primers (indicated by black stars). The SMART adapters added by the oligo(dT) primer and TSO, and used for amplification during PCR, are indicated in green. LD PCR is Long Distance PCR (Barnes, 1994).
cDNA yield varies between individual cells.
Six individual HeLa cells (diluted from the same 10-cm dish and visually inspected to confirm that each sample contained exactly one cell) and two replicates of either 100 or 1,000 cells were used as input for the SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing. The resulting cDNA was analyzed on an Agilent Bioanalyzer to assess quality and yield (Figure 2). The shape of the cDNA profile was very similar for each individual cell and similar to that of the two populations. cDNA sizes ranged from 300 to 10,000 bp, with a major peak at ~2,000 bp, characteristic of the size distribution of full-length mRNA in the cell. The additional peak at ~1 kb is characteristic of HeLa cells and results from a single highly expressed gene (FTH1).
Although the quality of the cDNA was similar across individual cells, the yield differed noticeably. For example, Cell #4 had a high cDNA yield, while Cells #3 and #6 had very low yields. This diversity in cDNA yield may indicate different transcriptional states of the cells (for example, differences in cell-cycle stage).
Figure 2. Electropherograms of cDNA libraries generated from six individual cells (Panel A) or from the first replicate of either 100 or 1,000 cells (Panel B).
RNA-seq data show that transcriptional differences exist between single cells.
RNA-seq libraries were generated from each individual cell and from the two replicates of the 100-cell and 1,000-cell populations, and then sequenced (Table I). Data from the lowest-yield cDNA library (Cell #6) were of too low quality to be informative and are not presented here. Overall, the quality of RNA-seq data was similar between the individual cells and the populations. The percentage of reads mapping to rRNA was very low, and the percentage mapping to the genome was high across all libraries. The number of transcripts identified was also high for all libraries, although the numbers from individual cells were lower and more variable than those from the two populations. This variability appeared to track cDNA yield: lower-yield libraries (for example, Cell #3) identified noticeably fewer transcripts than higher-yield libraries (for example, Cell #4). The consistent sequencing metrics indicate that the SMART-Seq v4 method is sensitive enough to produce high-quality data from individual cells, giving confidence that the differences in transcripts between cells are more likely to be biologically real and relevant than technical artifacts.
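The statement that single-cell transcript counts are lower and more variable than population counts can be quantified directly from Table I. The short Python sketch below computes the mean and coefficient of variation (CV) of the "Transcripts with FPKM >0.1" and "Transcripts with FPKM >1" rows. Pooling the 100- and 1,000-cell replicates into one "populations" group is a simplification made here for illustration, not an analysis from the study.

```python
from statistics import mean, stdev

# Transcript counts copied from Table I (five single cells; the 100- and
# 1,000-cell replicates are pooled here as "populations").
COUNTS = {
    "FPKM > 0.1": {
        "single cells": [8888, 8532, 7302, 10967, 10856],
        "populations": [13417, 13219, 13348, 13324],
    },
    "FPKM > 1": {
        "single cells": [8040, 7670, 6084, 10036, 9703],
        "populations": [10725, 10560, 10494, 10531],
    },
}


def cv_percent(values):
    """Coefficient of variation: sample standard deviation / mean, in percent."""
    return 100 * stdev(values) / mean(values)


for threshold, groups in COUNTS.items():
    for group, values in groups.items():
        print(f"{threshold:>10} {group:>12}: mean {mean(values):8.0f}, "
              f"CV {cv_percent(values):5.1f}%")
```

For the FPKM >0.1 row, the single-cell CV is about 17%, compared with about 0.6% for the populations, in line with the description above.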
As expected, the transcripts identified in individual cells overlapped substantially, yet each cell also contained unique transcripts (Figure 3). The variation in the number of transcripts identified in single-cell libraries highlights that even cells grown together under identical conditions exhibit slightly different transcriptomes.
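Table I. Sequencing Data from Intact Cells

| Input | Cell #1 | Cell #2 | Cell #3 | Cell #4 | Cell #5 | 100 cells (rep. 1) | 100 cells (rep. 2) | 1,000 cells (rep. 1) | 1,000 cells (rep. 2) |
|---|---|---|---|---|---|---|---|---|---|
| Number of reads (millions) | 2.3 (paired-end) | 2.3 (paired-end) | 2.3 (paired-end) | 2.3 (paired-end) | 2.3 (paired-end) | 2.0 (single-end) | 2.0 (single-end) | 2.0 (single-end) | 2.0 (single-end) |
| Mapped to genome (% of reads) | 91 | 91 | 88 | 88 | 92 | 98 | 98 | 98 | 98 |
| Mapped uniquely to genome (% of reads) | 87 | 86 | 83 | 84 | 88 | 82 | 82 | 79 | 80 |
| Transcripts with FPKM >0.1 | 8,888 | 8,532 | 7,302 | 10,967 | 10,856 | 13,417 | 13,219 | 13,348 | 13,324 |
| Transcripts with FPKM >1 | 8,040 | 7,670 | 6,084 | 10,036 | 9,703 | 10,725 | 10,560 | 10,494 | 10,531 |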
Table I. Primary sequencing metrics from mRNA-seq libraries made from individual, 100, or 1,000 HeLa cells sequenced on an Illumina® MiSeq® platform.
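The FPKM thresholds in Table I refer to fragments per kilobase of transcript per million mapped fragments. The Python sketch below implements that basic definition and counts transcripts above the two thresholds. The transcript names and counts are hypothetical, and quantification tools usually add refinements (for example, effective-length correction) that are omitted here. For paired-end data a fragment is a read pair; for single-end data it is a single read.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def count_above(fpkm_values, threshold):
    """Number of transcripts whose FPKM exceeds the threshold."""
    return sum(1 for v in fpkm_values if v > threshold)


# Hypothetical library: (transcript, mapped fragments, transcript length in bp).
TOTAL_MAPPED = 2_000_000
LIBRARY = [
    ("tx_A", 400, 2_000),
    ("tx_B", 20, 5_000),
    ("tx_C", 1, 1_000),
    ("tx_D", 0, 3_000),
]

values = [fpkm(n, length, TOTAL_MAPPED) for _, n, length in LIBRARY]
print(values)  # [100.0, 2.0, 0.5, 0.0]
print("FPKM > 0.1:", count_above(values, 0.1), "| FPKM > 1:", count_above(values, 1))
```

A single fragment on a 1-kb transcript in a 2-million-fragment library already gives an FPKM of 0.5, which illustrates why the >0.1 threshold counts many more transcripts than the >1 threshold.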
Figure 3. Venn diagram showing the overlap in the transcripts represented in the sequencing libraries prepared from five individual cells.
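The regions of a Venn diagram such as Figure 3 come from simple set operations on the transcripts detected in each library. A minimal Python sketch follows. The transcript IDs and cell sets are toy values, not the study's data, and the cutoff used to call a transcript "detected" (for example, FPKM >0.1) is an assumption, since the cutoff behind Figure 3 is not stated.

```python
def overlap_summary(libraries):
    """Return (transcripts detected in every library, transcripts unique to each).

    `libraries` maps a library name to the set of transcript IDs detected in it.
    """
    shared = set.intersection(*libraries.values())
    unique = {}
    for name, detected in libraries.items():
        # Union of everything detected in all *other* libraries.
        others = set().union(*(s for other, s in libraries.items() if other != name))
        unique[name] = detected - others
    return shared, unique


# Hypothetical detected-transcript sets for three cells.
CELLS = {
    "Cell #1": {"tx1", "tx2", "tx3", "tx4"},
    "Cell #2": {"tx1", "tx2", "tx5"},
    "Cell #3": {"tx1", "tx2", "tx3", "tx6"},
}

shared, unique = overlap_summary(CELLS)
print("shared by all:", sorted(shared))
for name, ids in unique.items():
    print(name, "unique:", sorted(ids))
```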
Gene body coverage is even and consistent across RNA-seq libraries.
Transcripts were normalized for length, and read coverage across the transcripts was plotted (Figure 4). All RNA-seq libraries showed remarkably uniform gene body coverage profiles. Even coverage along the full length of the gene body is important for identifying splice variants and for studying the 5' and 3' ends of mRNA.
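Length normalization is what allows transcripts of very different sizes to be combined into a single coverage profile. The Python sketch below shows one simple approach, assuming per-base read depth (5'→3') is already available for each transcript: each transcript is split into a fixed number of equal-width bins, each transcript's profile is scaled to sum to 1 so that highly expressed transcripts do not dominate, and the aggregate is scaled to a maximum of 1. This is an illustrative choice; the tool and normalization behind Figure 4 are not stated.

```python
def binned_profile(depth, n_bins):
    """Mean read depth in each of n_bins equal-width segments of one transcript."""
    length = len(depth)
    profile = []
    for b in range(n_bins):
        start = b * length // n_bins
        # Guarantee at least one position per bin for transcripts shorter than n_bins.
        end = max((b + 1) * length // n_bins, start + 1)
        segment = depth[start:end]
        profile.append(sum(segment) / len(segment))
    return profile


def gene_body_coverage(depths, n_bins=100):
    """Aggregate length-normalized coverage across transcripts.

    Each transcript's binned profile is scaled to sum to 1, profiles are summed,
    and the result is scaled so that its maximum is 1.
    """
    total = [0.0] * n_bins
    for depth in depths:
        if not depth:
            continue  # skip empty transcripts
        profile = binned_profile(depth, n_bins)
        s = sum(profile)
        if s == 0:
            continue  # skip transcripts with no reads
        for i, v in enumerate(profile):
            total[i] += v / s
    peak = max(total)
    return [v / peak for v in total] if peak > 0 else total


# Hypothetical per-base depths: two evenly covered transcripts of different
# lengths, which together should give a flat profile.
flat = gene_body_coverage([[5] * 1000, [2] * 350], n_bins=10)
print([round(v, 2) for v in flat])
```

Uniformly covered transcripts give 1.0 in every bin regardless of their length, whereas a transcript whose depth rises toward the 3' end yields a profile that peaks in the last bin.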
Figure 4. Normalized gene body coverage of RNA-seq libraries generated from individual cells or from the two populations of cells.
Expression levels are consistent across individual cells.
Scatter plots were used to compare transcript expression levels between individual cells, and between individual cells and the two cell populations. In general, despite some differences between cells, transcripts identified in both compared libraries showed similar expression levels, as indicated by high Pearson and Spearman correlations (Figure 5). Interestingly, the highest correlation was between the two populations of cells, reinforcing the idea that RNA-seq data from populations are more similar to each other than data from individual cells are, and may mask differences between cells.
Figure 5. Scatter plots comparing the expression level (FPKM) of transcripts found in libraries generated from individual cells, or 100- or 1,000-cell populations (only comparisons to the first replicate of these libraries are shown). Correlations are shown as Pearson/Spearman correlation (e.g., the comparison of Cell #1 and Cell #2 had a Pearson (R) correlation of 0.988 and a Spearman (ρ) correlation of 0.686).
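Figure 5 reports a very high Pearson correlation (0.988) alongside a much lower Spearman correlation (0.686) for the same pair of cells. The Python sketch below computes both coefficients from scratch (Spearman as the Pearson correlation of average ranks) and uses hypothetical FPKM values to show one way such a gap can arise: Pearson on linear-scale values is dominated by a few very highly expressed transcripts, whereas Spearman weights every transcript's rank equally. Whether this explains the values in Figure 5, and whether FPKMs were log-transformed first, is not stated in the source.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical FPKMs for six transcripts shared by two cells: five low-level
# transcripts in opposite order, plus one very highly expressed transcript.
cell_a = [1, 2, 3, 4, 5, 1000]
cell_b = [5, 4, 3, 2, 1, 1000]
print(f"Pearson {pearson(cell_a, cell_b):.3f}, Spearman {spearman(cell_a, cell_b):.3f}")
```

In this toy case, Pearson is essentially 1 because the single high value dominates, while Spearman is slightly negative because the ranks of the five low-level transcripts disagree.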
What makes one cell different from another? Good science aims to understand this and other fundamental questions in biology. Differences between cells in a population may be due to a number of factors, including stochastic variability in gene expression or asynchrony in the cell cycle, or may indicate a cryptic sub-population. This variability in the transcriptome of individual cells might provide initial insights into gene function, development, or disease progression. To enable single-cell and ultra-low-input RNA-seq studies, we have created the sensitive and robust SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing, a tool that produces biologically relevant transcriptome data from as little as one cell.
Cell culture and isolation
Non-confluent HeLa cells grown in a 10-cm dish were harvested, washed, resuspended in 1X PBS, counted, and diluted to approximately 1 cell/µl. Before cDNA synthesis, each sample was visually inspected to ensure that only a single cell was present. For population samples, cells from a separate 10-cm dish were harvested, washed, counted, and diluted in 1X PBS to 1,000 cells/µl and then to 100 cells/µl.
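Visual inspection is needed because dilution alone cannot guarantee one cell per sample. As a back-of-the-envelope check, the Python sketch below assumes that cells are randomly dispersed, so that the number of cells in an aliquot follows a Poisson distribution, and that a 1 µl aliquot is taken from the ~1 cell/µl suspension (the aliquot volume is an assumption, not stated in the source). It also includes the C1V1 = C2V2 relation behind the 1,000 → 100 cells/µl dilution, using a hypothetical 10 µl final volume.

```python
from math import exp, factorial


def poisson_pmf(k, mean_cells):
    """Probability of exactly k cells in an aliquot with the given mean count."""
    return mean_cells ** k * exp(-mean_cells) / factorial(k)


def stock_volume(c_stock, c_final, v_final):
    """Volume of stock needed so that C1 * V1 = C2 * V2."""
    return c_final * v_final / c_stock


lam = 1.0  # ~1 cell/ul x 1 ul aliquot (assumed volume)
p0, p1 = poisson_pmf(0, lam), poisson_pmf(1, lam)
print(f"P(0 cells) = {p0:.3f}, P(1 cell) = {p1:.3f}, P(>=2 cells) = {1 - p0 - p1:.3f}")

# 1,000 -> 100 cells/ul in a hypothetical 10 ul final volume.
v1 = stock_volume(1000, 100, 10)
print(f"{v1:.1f} ul stock + {10 - v1:.1f} ul 1X PBS")
```

Under these assumptions only about 37% of aliquots would contain exactly one cell, and roughly a quarter would contain two or more, which is why checking each sample by eye matters.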
cDNA synthesis and library preparation
cDNA synthesis was performed using the SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing as described in the User Manual. One library was prepared from each individual cell, while the population samples were prepared in duplicate. Seventeen PCR cycles were used for libraries made from individual cells, while 11 or 8 PCR cycles were used for 100- or 1,000-cell libraries, respectively. The cDNA was analyzed using an Agilent 2100 Bioanalyzer. Sequencing libraries were prepared using the Nextera® XT DNA Library Preparation Kit (Illumina).
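The cycle numbers scale roughly with input amount. If each cycle multiplies the product by (1 + E), where E is the amplification efficiency, then starting with R-fold more template should need about log(R)/log(1 + E) fewer cycles to reach the same yield. The Python sketch below applies this idealized model to the 100- and 1,000-cell inputs. It ignores plateau effects and cell-to-cell differences in RNA content, and the source does not say how the cycle numbers were chosen.

```python
from math import log


def cycles_saved(input_ratio, efficiency=1.0):
    """Cycles saved by starting with input_ratio-fold more template.

    Idealized model: product grows by (1 + efficiency) per cycle, so
    N0 * (1 + E)**c is reached c = log(R) / log(1 + E) cycles earlier
    when starting from R * N0.
    """
    return log(input_ratio) / log(1 + efficiency)


SINGLE_CELL_CYCLES = 17  # from the Methods
for cells, used in [(100, 11), (1000, 8)]:
    print(f"{cells:>5} cells: predicted offset {cycles_saved(cells):.1f} cycles "
          f"(perfect doubling), used offset {SINGLE_CELL_CYCLES - used}")
```

Perfect doubling predicts offsets of about 6.6 and 10 cycles, close to the 6- and 9-cycle offsets actually used; any efficiency below 100% would predict slightly larger offsets.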
The libraries were sequenced on an Illumina MiSeq instrument, generating 2.3 million paired-end reads (2 × 75 bp) for the single-cell libraries and 2.0 million single-end reads (1 × 57 bp) for the 100- and 1,000-cell libraries. Reads were aligned with STAR against hg19 using Ensembl annotation. The percentages of reads mapping to rRNA, exonic, intronic, and intergenic regions were determined with Picard.
Barnes, W.M. (1994) Proc. Natl. Acad. Sci. U. S. A. 91(6):2216–2220.
Bengtsson, M., et al. (2005) Genome Res. 15(10):1388–1392.
Fort, A., et al. (2015) Cell Cycle 14(8):1148–1155.
Heaton, N.H., et al. (2014) J. Exp. Med. 211(9):1707–1714.
Henley, B., et al. (2013) Biochem. Pharmacol. 86(8):1074–1083.
Kanter, I. and Kalisky, T. (2015) Front. Oncol. 5:53.
Saliba, A.-E., et al. (2014) Nucleic Acids Res. 42(14):8845–8860.