
A survey on normalization of single-cell RNA sequencing

2019-08-14

Han Henry, LIU Wen-bin

(1.Department of Computer and Information Science, Fordham University, New York 10023, USA; 2.Institute of Computational Science and Technology, Guangzhou University, Guangzhou 510006, China)

Abstract: Normalization is an essential component of single-cell RNA-seq (scRNA-seq) analysis and determines the quality of downstream analysis. However, scRNA-seq normalization remains a challenge compared to bulk RNA-seq normalization because of zero inflation. In this study, the authors review the existing normalization methods for scRNA-seq data under a framework of bias analysis. The framework categorizes the different biases in scRNA-seq and provides a solid theoretical foundation for scRNA-seq normalization. This study also compares well-established bulk RNA-seq normalization procedures and tailored scRNA-seq normalization methods on benchmark scRNA-seq data in terms of phenotype clustering.

Key words: bulk RNA-seq; scRNA-seq; normalization; bias analysis; dropout; zero inflation

0 Introduction

Normalization plays an essential role in the analysis of RNA-seq data, which provides a key to understanding transcription mechanisms and cell functionality[1-2]. Since different artifacts from experimental design and the high-throughput technology itself introduce different biases (e.g., sequencing depth bias) into the RNA-seq quantification process, the raw RNA-seq data, which is a read count matrix obtained from aligning sequencing reads to a reference genome or counting transcripts by using unique molecular identifiers (UMIs), cannot truly reflect gene expression[3-4]. Thus, it is essential to conduct normalization to remove the biases so that true gene expression can be accurately measured and RNA-seq samples are comparable before differential expression (D.E.) analysis or other downstream analysis. In fact, almost all downstream RNA-seq data analytics are built on different normalization procedures, though some recent research reports that data normalization may not be an indispensable step for RNA-seq disease diagnosis[5].

RNA-seq data can be technically categorized as bulk RNA-seq and single-cell RNA-seq (scRNA-seq) data according to different sequencing protocols and goals. Bulk RNA-seq aims to assess transcriptional changes due to different experimental conditions. It assumes that cells from a given tissue are homogeneous, and measures average gene expression levels from many cells to obtain transcription quantification. On the other hand, scRNA-seq makes it possible to investigate transcriptomic landscapes at the single-cell resolution. It aims at finding the “high-resolution differences” between cell types/sub-types, in which cell-to-cell variability is particularly emphasized[4]. For example, it can find new rare or hidden cell subtypes and capture those differentially expressed genes across distinct cell populations[6-8]. However, scRNA-seq data can be more complicated than bulk RNA-seq data in terms of data analytics because of its special data generation mechanism[9-10]. For example, scRNA-seq data no longer follows a single distribution model (e.g., the negative binomial distribution), and shows different distributions according to different types of cell populations[9,11-13].

There are quite a lot of normalization methods developed for bulk RNA-seq data, some of which are widely used in scRNA-seq normalization (e.g., TMM), to remove or alleviate different biases from RNA-seq sequencing[14-18]. The biases can be categorized as in-sample and out-sample biases. The in-sample bias affects a portion of special genes within a sample in sequencing, namely long or highly active genes. It can be further divided into the gene length and gene frequency biases. The gene length bias refers to the fact that long genes receive more counts in RNA sequencing than short genes. The gene frequency bias refers to the fact that highly expressed genes may shadow the true expression of the other genes (e.g., lowly expressed genes).

Alternatively, the out-sample bias includes the sequencing depth bias and other biases that take place across samples. For example, the sequencing depth bias means that genes from a sample with a high sequencing depth usually show higher expression levels than the same genes from another sample with a normal or low sequencing depth. More details about the in-sample and out-sample biases can be found in the bias analysis section of this study. Since each sample is just a cell for scRNA-seq data, we use cell and sample interchangeably in this study.

Different normalization methods have different targets in their bias removal. For example, RPKM aims at removing the biases from gene length and sequencing depth by adjusting read counts per million reads as well as per kilobase of gene length. FPKM (Fragments Per Kilobase Million), RPM (Reads Per Million) and TPM (Transcripts Per Million) follow a similar mechanism. On the other hand, TMM and DESeq focus on correcting the gene frequency, gene length and depth biases by scaling each gene expression via a scaling factor calculated from a specifically selected set of genes or a pseudo-reference sample that is the geometric mean of all genes[14-17].

Almost all the methods conduct normalization by calculating a scaling factor to normalize each sample. Such an approach assumes that most genes are not differentially expressed. The scaling factor calculation may involve only the sample itself or all samples in the dataset, depending on the method. However, there is no standardized scaling factor calculation approach, though TMM and DESeq are reported as the most effective methods for bulk RNA-seq normalization[14]. Alternatively, there are a few model-based normalization methods that conduct normalization by exploiting statistical models (e.g., factor analysis)[19-20]. These methods generally have high implementation complexity and are not as widely used as the scaling-factor methods. Thus, we mainly focus on the scaling-factor normalization methods in this study for their popularity.
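To make the shared scaling-factor idea concrete, the following minimal sketch (a hypothetical Python illustration, not code from any of the reviewed packages) normalizes a small genes-by-samples count matrix with per-sample factors; here the factors are simply derived from library sizes, whereas the methods reviewed below differ mainly in how the factors are computed.

```python
# Schematic sketch: scaling-factor normalization divides each sample (column)
# of a genes x samples count matrix by its own scaling factor.
import numpy as np

def scale_normalize(counts, scaling_factors):
    """counts: (n_genes, n_samples) array; scaling_factors: (n_samples,) array."""
    scaling_factors = np.asarray(scaling_factors, dtype=float)
    if np.any(scaling_factors <= 0):
        raise ValueError("Scaling factors must be positive.")
    return counts / scaling_factors  # broadcasts over columns

# Toy example: factors proportional to library size (total counts per sample)
counts = np.array([[10, 0, 30],
                   [ 0, 5, 15],
                   [20, 5, 55]], dtype=float)
lib_size = counts.sum(axis=0)
factors = lib_size / lib_size.mean()   # one simple choice of scaling factor
print(scale_normalize(counts, factors))
```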

Normalization of scRNA-seq data still widely employs normalization methods (e.g., DESeq) developed for bulk RNA-seq, though some dedicated routines have been designed for scRNA-seq[2,4]. However, this may not be suitable because scRNA-seq data has its own special characteristics compared with bulk RNA-seq data.

scRNA-seq data has a zero inflation characteristic, meaning that a scRNA-seq dataset contains a large number of zero or near-zero values[2,4,21]. Although a group of cells may biologically show zero expression for specific genes under certain states, this is not the major reason for zero inflation because of its low likelihood. Instead, zero inflation is rooted in the dropout issue, which causes many transcripts to go undetected and induces an excess of zero read counts. The zeros caused by dropout are “technical zeros”: the genes are expressed but not measured.
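As a hedged illustration of how zero inflation is usually quantified before choosing a normalization method, the short Python sketch below computes per-cell and per-gene zero fractions on simulated counts with artificially injected technical zeros; the simulation parameters are arbitrary assumptions, not values taken from this study.

```python
# A minimal sketch: quantify zero inflation in a genes x cells count matrix.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.negative_binomial(n=2, p=0.5, size=(1000, 50))   # simulated expression
counts[rng.random(counts.shape) < 0.6] = 0                    # inject dropout-style technical zeros

zero_fraction_overall = np.mean(counts == 0)
zeros_per_cell = np.mean(counts == 0, axis=0)   # dropout rate per cell
zeros_per_gene = np.mean(counts == 0, axis=1)   # detection failure rate per gene
print(f"overall zero fraction: {zero_fraction_overall:.2f}")
print(f"median per-cell zero fraction: {np.median(zeros_per_cell):.2f}")
print(f"median per-gene zero fraction: {np.median(zeros_per_gene):.2f}")
```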

The dropout occurs when there are low amounts of mRNA sequenced within individual cells (e.g., shallow-depth sequencing). It is highly likely that the moderate expression levels of a gene are observed in one cell but are undetectable in another cell due to dropout. The dropout issue obviously creates an out-sample bias among all genes across samples. Thus, we name it the dropout bias accordingly.

The dropout bias presents a hurdle in scRNA-seq normalization and the following differential expression (D.E.) analysis[10,22]. As such, simply applying a normalization procedure to scRNA-seq data is risky without recovering the original transcript expression shadowed by the bias. For example, the dropout bias can fail DESeq normalization by producing a pseudo-reference sample with a large number of zeros. Furthermore, scRNA-seq data can demonstrate high variability between technical replicates, besides that between heterogeneous cells, even under the same sequencing depth. It is generally not easy to distinguish the true biological variability and technical variability from noise[23]. Thus, scRNA-seq data normalization should address the new bias besides the original ones, instead of only applying normalization methods designed for bulk RNA-seq data.

However, the normalization of scRNA-seq is not a well-studied topic compared with that of bulk RNA-seq, though single-cell transcriptomics is becoming an important field in bioinformatics and medicine[2,24-25]. There are relatively few normalization methods specially designed for scRNA-seq data, and their functionality and relationships have not been rigorously studied[10,22]. Moreover, the impacts of bulk RNA-seq normalization methods on scRNA-seq are not fully investigated, though some initial efforts have been made recently[22]. On the other hand, different normalization methods can affect scRNA-seq downstream analysis to a large degree. Thus, there is an urgent need for a comprehensive review of this important topic to provide guidance for the scRNA-seq community.

The goal of this study is to provide a comprehensive but concise survey of scRNA-seq normalization. Different from other studies, we review normalization procedures used in scRNA-seq by following a framework of bias analysis proposed in this study. Since all read counts have to be adjusted in normalization according to all possible biases in sequencing, it is necessary to categorize the biases in scRNA-seq via a framework to shed more light on this important field. The framework of bias analysis can provide a more targeted and accurate classification of different normalization methods.

The paper is structured as follows. Section 1 presents the framework of bias analysis in scRNA-seq normalization. Section 2 reviews the bulk RNA-seq normalization methods used for scRNA-seq data by following the framework. Section 3 covers the state-of-the-art normalization methods specifically designed for scRNA-seq data. Section 4 compares well-established bulk RNA-seq normalization procedures and tailored scRNA-seq normalization methods on benchmark scRNA-seq data in terms of phenotype clustering. Finally, we discuss possible problems to be solved in scRNA-seq normalization and conclude this study.

1 The framework of bias analysis

The goal of normalization in scRNA-seq is to make samples comparable with each other before downstream analysis (e.g., clustering analysis) by mitigating or removing all kinds of biases in scRNA-seq sequencing. Some biases (e.g., the depth bias) have been mentioned in RNA-seq analysis and normalization procedures in previous literature[17,21]. However, there is no study available that addresses, categorizes, and analyzes all possible biases comprehensively. While we have mentioned some biases in the previous section, here we present a framework of bias analysis for scRNA-seq normalization to decipher the sources of experimental noise in scRNA-seq for the sake of normalization. It is noted that these biases can take place in an RNA-seq experiment simultaneously unless special protocols are adopted in sequencing.

1.1 In-sample and out-sample biases

There are generally two types of biases in scRNA-seq: in-sample and out-sample biases. The in-sample biases refer to those biases that happen to a subset of “special genes” in each sample during sequencing because of their special characteristics (e.g., long gene length or very active status). The gene length and gene frequency biases both belong to this type.

The out-sample biases refer to the biases that happen to all genes of each sample in sequencing and drive gene expression levels away from their true levels. At the same time, a large variability appears between samples, even in the absence of biological sample heterogeneity. The dropout and depth biases both belong to this type, besides a mixed bias from unknown “batch effects” or noise[7,9]. Traditional statistical models focus on separating meaningful biological variations reflected in scRNA-seq data from those biases, but ignore their exact sources.

The biases have different importance rankings for bulk RNA-seq and scRNA-seq. The dropout and depth biases seem to account for more important technical variation than the gene length and gene frequency biases for scRNA-seq data. However, the gene length, frequency, and depth biases seem to be equally important for bulk RNA-seq data.

1.1.1 Gene length bias

RNA-seq is essentially a sampling process that sequences the transcriptome. It “prefers” long genes because of its short-read generation mechanism[14,17]. Since long genes produce a larger number of short reads than short genes, the number of reads does not fairly reflect the expression of each gene. If not adjusted, short genes would have little chance to be identified as differentially expressed genes[15-16].

1.1.2 Gene frequency bias

The gene frequency bias refers to the fact that active genes have more counts than “inactive” or “less frequent” genes. The active or “more frequent” genes appear to be the highly expressed genes in RNA-seq and are more likely to appear as differentially expressed genes because of their frequency[16,25]. Similar to long genes, the active genes shadow the contributions of the inactive ones to gene expression levels and make the following downstream study (e.g., clustering) inaccurate or incorrect.

1.1.3 Depth bias

The depth bias refers to the fact that the higher the sequencing depth, the larger the gene counts. The same gene can have quite different read counts across different samples because of their different sequencing depths[14,17,25]. This is due to different library design protocols, and it is not easy to know exactly how many short reads should be generated for a sample in sequencing.

While protocols using unique molecular identifiers (UMIs) can theoretically “overcome” the depth bias somewhat, their data still suffer from the depth bias to varying degrees because it is almost impossible to guarantee that each unique molecule is observed at least once in sequencing[4].

1.1.4 Dropout bias

The dropout bias refers to the fact that quite a lot of genes show zero or near-zero counts in scRNA-seq because of the stochastic nature of gene expression and the low amount of RNA in sequencing (e.g., shallow sequencing)[23-24]. The zeros generated by the dropout bias are expressed but not measured “technical zeros” in scRNA-seq. Aside from the stochastic issue of gene expression, the dropout bias is actually a special depth bias that occurs when only a limited number of RNA short reads are generated under shallow sequencing, which is employed in scRNA-seq to discover the heterogeneity between cells[7,26].

1.1.5 Mixed bias

The mixed bias refers to some unknown “batch effects” or unexpected noise, associated with the high-throughput profiling system, that cause scRNA-seq data to demonstrate very high heterogeneity even after excluding the biases discussed above[19,27]. For example, unobserved contaminated cells in sequencing or new biases brought in by imputation approaches used to fix dropout can both produce such a mixed bias[11,24]. As a result, a gene can show very different expression levels across different samples (cells) even under the same treatment. The mixed bias is more likely to belong to the out-sample biases. Since the exact source of this bias remains unclear without exact statistical modeling, we mainly focus on the other four biases in this study for their generality[8-9].

2 Bulk RNA-seq methods for scRNA-seq normalization

While scRNA-seq data demonstrates more complexity than bulk RNA-seq data, there are still some normalization methods inherited from bulk RNA-seq data that are widely used in scRNA-seq normalization. We review the following normalization methods in light of the framework of bias analysis.

2.1 RPKM, FPKM, RPM, and TPM

RPKM (reads per kilobase per million mapped reads) adjusts each gene’s read count by the library size (per million reads) and by the gene length (per kilobase), targeting the depth and gene length biases. However, it does bring some other bias into the per-gene variance, especially for lowly expressed genes, because lowly expressed genes have low counts and the sequencing depth can differ from sample to sample. It is not appropriate for scRNA-seq data normalization, because there can be a large number of very “lowly expressed genes” due to the dropout bias. FPKM (fragments per kilobase per million mapped reads) is an improved version of RPKM that does not use read counts directly[28-29]. Instead, it approximates the relative abundance of transcripts in terms of fragments observed from an RNA-seq experiment.

RPM (reads per million mapped reads) is a pure depth-bias correction approach[30]. It adjusts the depth bias by normalizing all read counts by the ratio between the library size and one million.

TPM (transcripts per million), similar to RPKM, aims at adjusting the depth and gene length biases in normalization. Unlike RPKM, the read count per gene is adjusted by its gene length in kilobases before the depth bias is adjusted per million reads[2,30-31].
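The RPKM and TPM adjustments described above can be summarized in a few lines. The sketch below is a schematic Python implementation under the usual definitions (counts divided by library size in millions and gene length in kilobases for RPKM; length-normalized rates rescaled to sum to one million for TPM); it is an illustration, not the output of any specific package.

```python
import numpy as np

def rpkm(counts, gene_lengths_bp):
    """counts: (n_genes, n_samples); gene_lengths_bp: (n_genes,) gene lengths in base pairs."""
    lib_size = counts.sum(axis=0)                      # total reads per sample
    per_million = lib_size / 1e6                       # depth adjustment
    length_kb = gene_lengths_bp[:, None] / 1e3         # gene length adjustment
    return counts / per_million / length_kb

def tpm(counts, gene_lengths_bp):
    """Length-normalize first, then rescale each sample to sum to one million."""
    rate = counts / (gene_lengths_bp[:, None] / 1e3)   # reads per kilobase
    return rate / rate.sum(axis=0) * 1e6

counts = np.array([[100,  80],
                   [ 50, 700],
                   [  0,  10]], dtype=float)
lengths = np.array([2000, 1000, 3000], dtype=float)
print(rpkm(counts, lengths))
print(tpm(counts, lengths))                            # each column sums to one million
```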

The RPKM-variant methods lack mechanisms to overcome the dropout and frequency biases in scRNA-seq data. Their depth bias correction cannot guarantee correct adjustments for lowly expressed genes. Thus, it can be hard for them to handle scRNA-seq normalization well even with the help of efficient imputation procedures.

2.2 TC and UQ

TC (total counts) and UQ (upper quartile) normalization both aim at correcting the depth bias by using different library-size-related factors to adjust gene expression. TC scales gene read counts by the total number of reads per sample or an adjusted library size factor. UQ scales gene read counts by the ratio between the Q3 (upper quartile) value of the gene expressions of the sample and the average of all samples’ Q3 values.
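A minimal sketch of the two scaling factors described above is given below; it follows the plain definitions (total counts relative to the mean library size for TC, and the per-sample Q3 relative to the average Q3 for UQ) and does not reproduce any particular implementation.

```python
import numpy as np

def tc_factors(counts):
    """Total-count factors: each sample's library size relative to the mean library size."""
    lib_size = counts.sum(axis=0)
    return lib_size / lib_size.mean()

def uq_factors(counts):
    """Upper-quartile factors: each sample's Q3 relative to the average Q3 across samples."""
    q3 = np.percentile(counts, 75, axis=0)   # 75th percentile per sample
    return q3 / q3.mean()

counts = np.array([[10, 0, 30],
                   [ 0, 5, 15],
                   [20, 5, 55],
                   [ 5, 0,  0]], dtype=float)
print(counts / tc_factors(counts))
print(counts / uq_factors(counts))
```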

UQ can be a risky normalization approach for a scRNA-seq dataset with a large number of zeros, because the 75th percentile of read counts can be zero or very small, producing a zero or tiny scaling factor. On the other hand, TC can be problematic for scRNA-seq data because the total count does not reflect the true library size in sequencing. It is highly likely that TC-normalized data will bias the downstream analysis because of the wrong scaling factor.

2.3 DESeq

DESeq normalizes each sample by calculating a pseudo-reference sample, which acts as a “virtual reference” to correct those “over-expressions” from the gene length and frequency biases[14-15].

Given an RNA-seq dataset X ∈ ℝ^(n×m) with n genes and m samples, its pseudo-reference sample is calculated as a vector of gene geometric means P = [p1, p2, …, pn]^T, where pi = (xi1 · xi2 · … · xim)^(1/m) is the geometric mean of the i-th gene, i = 1, 2, …, n.

The scaling factor of each sample is then computed as the median of the ratios between the sample’s gene read counts and the corresponding pseudo-reference values. It is noted that such a scaling factor can also adjust the depth bias, so that the normalized gene expression no longer strongly relies on the sequencing depth of each sample. Since DESeq does not have an explicit dropout-bias handling mechanism, it is possible that the pseudo-reference sample contains a large number of zeros and leads to a biased scaling factor.
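The following Python sketch illustrates the pseudo-reference and median-of-ratios idea under the standard DESeq-style definition; it is a simplified illustration rather than the DESeq package itself, and it makes the dropout problem visible: any gene with a zero count has a zero geometric mean and is effectively excluded from the factor calculation.

```python
import numpy as np

def deseq_size_factors(counts):
    """counts: (n_genes, n_samples). Returns one median-of-ratios size factor per sample."""
    ok = np.all(counts > 0, axis=1)             # genes with any zero have a zero geometric mean
    log_counts = np.log(counts[ok, :])
    log_pseudo_ref = log_counts.mean(axis=1)    # log of the per-gene geometric mean (pseudo-reference)
    log_ratios = log_counts - log_pseudo_ref[:, None]
    return np.exp(np.median(log_ratios, axis=0))

counts = np.array([[100, 200,  50],
                   [ 30,  60,  15],
                   [  0, 400, 100],             # dropout: this gene is excluded from the factors
                   [ 10,  20,   5]], dtype=float)
factors = deseq_size_factors(counts)
print(factors)                                  # roughly [1.0, 2.0, 0.5] on this toy matrix
print(counts / factors)                         # normalized expression
```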

2.4 TMM (Trimmed Mean of M values)

TMM calculates the scaling factor from a set of trimmed genes by assuming that most genes are not differentially expressed[14,16]. The trimmed genes are obtained by removing the genes falling in the upper and lower 30% of the log-fold-change ratios (“M values”), and those falling in the upper and lower 5% of the absolute gene expression values. The scaling factor is calculated as a weighted mean over the trimmed genes. The weights of the trimmed genes per sample are calculated according to the relations between the gene read counts of the sample, its reference, and their library sizes.
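A simplified sketch of the TMM factor for one sample against a reference is shown below; for brevity it uses a plain (unweighted) trimmed mean of the M values with the trimming proportions described above, whereas the full method additionally applies variance-based weights, so this is only an illustration of the mechanism.

```python
import numpy as np

def tmm_factor(sample, ref, m_trim=0.30, a_trim=0.05):
    """Simplified TMM factor of `sample` against `ref` (both 1-D count vectors)."""
    ok = (sample > 0) & (ref > 0)                        # genes usable in log ratios
    s, r = sample[ok] / sample.sum(), ref[ok] / ref.sum()
    M = np.log2(s / r)                                   # log-fold ratios
    A = 0.5 * np.log2(s * r)                             # absolute expression levels
    keep = ((M > np.quantile(M, m_trim)) & (M < np.quantile(M, 1 - m_trim)) &
            (A > np.quantile(A, a_trim)) & (A < np.quantile(A, 1 - a_trim)))
    return 2 ** np.mean(M[keep])                         # plain (unweighted) trimmed mean of M

rng = np.random.default_rng(1)
ref = rng.poisson(20, size=2000).astype(float)
sample = rng.poisson(40, size=2000).astype(float)        # deeper-sequenced sample, same composition
print(tmm_factor(sample, ref))                           # close to 1: composition is unchanged
```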

Similar to DESeq, TMM aims at adjusting the gene frequency, length and sequencing depth biases in its scaling factor. It removes the extreme genes with very high and very low expression and adjusts all gene read counts so that they are more independent of gene length, frequency and sequencing depth. Since the “M” values can be biased under the zero inflation of scRNA-seq data, it can be risky to apply TMM to scRNA-seq normalization directly. However, the trimming procedure can remove a portion of the genes with the dropout issue because of their extreme “M” values. TMM thus has good potential to work well for scRNA-seq data preprocessed by imputation procedures, though almost no such work has been reported in the literature[24].

2.5 The best bulk methods for scRNA-seq data normalization

TMM and DESeq have been ranked as the two best normalization methods in previous work via simulation[10-11,14]. However, no rigorous theoretical reasons have been given for why TMM and DESeq normalization outperform the others, besides simulations and downstream analysis.

However, it is easy to explain the superiority of the two methods over the others by following the framework of bias analysis. Both TMM and DESeq cover most in-sample and out-sample biases via relatively complicated scaling factor calculations, though they do not have systematic ways to handle the dropout bias. On the other hand, TC and UQ only adjust the depth bias. TMM and DESeq involve all samples in calculating the scaling factor for each sample. Such a global normalization mechanism outperforms the local normalization mechanism used in the RPKM-related methods, where the normalization of each sample does not need information from other samples. The local mechanism limits their capability to overcome the depth and gene length biases, because the gene read counts from a single sample are not enough to reflect the depth or gene length bias across the whole dataset.

Thus, TMM and DESeq both outperform the other bulk methods in normalization for their more targeted bias correction and global normalization mechanism. They should be the best bulk methods for scRNA-seq normalization, though they lack a clear dropout-bias handling scheme and may suffer from biased scaling factor estimation. Table 1 summarizes the eight bulk normalization methods according to their target biases and their global or local normalization mechanism.

Table 1 Bulk RNA-seq normalization methods

3 State-of-the-art scRNA-seq normalization methods

Besides the normalization methods inherited from bulk RNA-seq, several scRNA-seq normalization procedures have been proposed recently. These methods generally contain special procedures to overcome the dropout bias of scRNA-seq data, but generally do not rely on imputation or dropout recovery procedures. They usually target the depth and dropout biases of scRNA-seq data. We review the existing state-of-the-art scRNA-seq normalization methods: scran, SCnorm, and scone[4,10,32-33]. Both scran and SCnorm are tailored stand-alone normalization methods for scRNA-seq, while scone is a pipeline normalization ranking method that includes several different normalization methods.

3.1 Scran: A pooling scaling normalization

scran calculates scaling factors by employing a deconvolution strategy to adjust the dropout and depth biases[32-33]. Similar to TMM, it assumes most genes are not differentially expressed (DE) between cells (samples)[4,16]. The deconvolution strategy starts normalization on summed expression values from pools of cells (samples) to calculate pool-based scaling factors, which are then deconvolved to obtain the scaling factor for each cell in the pool. A pool is selected as a set of cells with similar library sizes. The pooling sum aims at decreasing the impacts of the dropout and depth biases. For large datasets, clustering is conducted before normalization to find suitable “pools”, and low-abundance genes are usually removed before normalization[4,32]. The method is implemented in the Bioconductor R package scran. It can be viewed as an enhanced, pool-based version of TC (total count) normalization for samples with similar sequencing depths. It also ports well to bulk RNA-seq data, though it does not address the gene length and frequency biases in normalization. However, it should outperform the UQ and TC methods for its pool-based depth and dropout bias removal scheme.
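The deconvolution idea can be illustrated schematically: pool-level factors are estimated on summed counts and then related back to per-cell factors through a linear system. The Python sketch below is a rough illustration of that idea on simulated data, not the actual scran algorithm; the pooling scheme, reference construction, and solver are all simplified assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_genes, n_cells = 500, 40
true = rng.uniform(0.5, 2.0, n_cells)                        # hypothetical true per-cell factors
counts = rng.poisson(true * rng.gamma(2.0, 5.0, (n_genes, 1)))

order = np.argsort(counts.sum(axis=0))                       # cells ordered by library size
ref = counts.mean(axis=1)                                    # average "pseudo-cell" reference
rows, b = [], []
for pool_size in (5, 7):                                     # several pool sizes, mimicking the pooling idea
    for i in range(n_cells):
        pool = order[np.arange(i, i + pool_size) % n_cells]  # sliding pool of cells with similar depths
        pooled = counts[:, pool].sum(axis=1)                 # summed pool expression
        ok = ref > 0
        b.append(np.median(pooled[ok] / ref[ok]))            # pool-level size factor
        row = np.zeros(n_cells)
        row[pool] = 1.0                                      # pool factor ~ sum of its cells' factors
        rows.append(row)

cell_factors, *_ = np.linalg.lstsq(np.array(rows), np.array(b), rcond=None)
cell_factors /= cell_factors.mean()                          # rescale to mean one
print(np.corrcoef(cell_factors, true)[0, 1])                 # close to 1 on this toy data
```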

A possible weakness of this method is that the library size might be biased by zero inflation, especially because no imputation procedures are employed in bookkeeping. The cells assigned to the same pool may also have large variances. The pool-based gene count summation may have a limited capability to decrease the impact of zero inflation if a pool has few cells. Furthermore, the clustering algorithms directly affect the quality of pool selection and the final normalization. However, scRNA-seq data can be sensitive to different clustering methods built on different natural distances or “learned” similarity metrics and dimension reduction methods[34-37]. Thus, such a method may suffer from reproducibility issues in normalization under different clustering methods.

3.2 SCnorm: A count-depth-based scaling normalization

SCnorm assumes that most genes are not differentially expressed, as TMM and scran do[10]. It aims at depth and dropout bias removal by grouping genes according to their count-depth relationships per cell. A quantile regression is then applied to each group to estimate scaling factors. The count-depth relationship for a gene is the ratio between its gene read count and the library size of the cell. The genes are usually divided into equally sized groups according to their non-zero median expression before calculating their count-depth ratios. The dropout bias is handled not only in the gene partition but also in the quantile regression procedure for each group, because quantile regression is more robust to outliers, which are the zero counts in this context. The method is implemented in the Bioconductor R package SCnorm[10].
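To illustrate the count-depth relationship that SCnorm exploits, the sketch below estimates, on simulated data, a per-gene slope of log expression against log sequencing depth using only non-zero cells, and then summarizes the slopes within equally sized expression groups. Ordinary least squares is used here as a stand-in for the quantile regression that SCnorm actually applies, so this is an assumption-laden illustration rather than the method itself.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_cells = 300, 60
depth = rng.uniform(5e3, 5e4, n_cells)                       # per-cell sequencing depth
mu = rng.gamma(2.0, 2.0, n_genes)                            # per-gene mean expression
counts = rng.poisson(np.outer(mu, depth / depth.mean()))

log_depth = np.log10(depth)
slopes = np.full(n_genes, np.nan)
for g in range(n_genes):
    nz = counts[g] > 0                                       # dropout-aware: use non-zero cells only
    if nz.sum() >= 10:
        # OLS slope of log expression on log depth (SCnorm itself uses quantile regression)
        slopes[g] = np.polyfit(log_depth[nz], np.log10(counts[g, nz]), 1)[0]

median_nonzero = np.array([np.median(c[c > 0]) if (c > 0).any() else 0.0 for c in counts])
ranks = np.argsort(np.argsort(median_nonzero))
groups = ranks * 4 // n_genes                                # four equally sized expression groups
for k in range(4):
    sel = (groups == k) & np.isfinite(slopes)
    if sel.any():
        print(f"expression group {k}: mean count-depth slope = {np.mean(slopes[sel]):.2f}")
```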

SCnorm cannot work well for very sparse scRNA-seq data (e.g., a dataset with more than 80% zero counts). Instead, SCnorm requires each cell to have at least 100 non-zero values or total counts >10,000, because it does not use any imputation procedure to replace “the zeros” with substituted values. The method localizes genes according to their count-depth ratios, which is equivalent to partitioning genes by applying TC normalization. The scaling factors are estimated via a local quantile regression procedure. The weakness of this approach is that it might be biased by an inaccurate library size for a cell, besides ignoring the gene length and frequency biases. Furthermore, since only the information of each cell is employed locally in its normalization, it can be quite sensitive to individual cells and ignores the coherence relationships between cells.

3.3 scone: A pipeline normalization

scone, which is implemented in the Bioconductor R package scone, is technically a pipeline normalization procedure that wraps different steps, such as sample and gene filtering besides imputation, before normalization[38-39].

Different normalization methods, ranging from bulk RNA-seq normalization methods (e.g., TC, DESeq, TMM) to scran, are then ranked under several performance metrics. scone builds well-tuned bookkeeping procedures to enhance data quality before normalization and decrease the impact of dropout. It replaces zero-abundance values with expected values under a prior dropout model or imputation method[23]. The unwanted data variations are further removed by employing regression approaches[19,25]. The performance of each normalization procedure in scone is ranked according to eight metrics, six of which are generated from PCA analysis and subsequent clustering of the normalized data. Finally, the normalization with the best score is selected for downstream analysis.
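The ranking idea can be sketched generically: each candidate normalization is applied, the normalized data is reduced and clustered, and a single score summarizes the clustering quality. The example below uses a hypothetical silhouette-based score on PCA-reduced data purely for illustration; it is not scone’s actual eight-metric scoring system, and the candidate set and scoring function are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def clustering_score(normalized, n_clusters=3, n_components=5, seed=0):
    """Hypothetical one-number proxy score: silhouette of k-means clusters in PCA space."""
    z = PCA(n_components=n_components, random_state=seed).fit_transform(np.log1p(normalized).T)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(z)
    return silhouette_score(z, labels)

rng = np.random.default_rng(4)
counts = rng.poisson(5.0, size=(200, 90)).astype(float)      # toy genes x cells matrix
candidates = {
    "raw": counts,
    "TC":  counts / (counts.sum(axis=0) / counts.sum(axis=0).mean()),
}
ranking = sorted(candidates, key=lambda k: clustering_score(candidates[k]), reverse=True)
print(ranking)                                               # candidate normalizations, best first
```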

scone can achieve good performance in practice because it packs data filtering, dropout recovery, and different normalization procedures together. However, it is hard to evaluate it as an independent normalization method because it is by nature a pipeline method. A possible weakness may come from over-filtering, which can change the nature of the input data. That is, a sparse scRNA-seq dataset can be filtered until it resembles a bulk RNA-seq dataset and is then run under scran normalization. Furthermore, the normalization performance scoring system seems ad hoc for the two different types of RNA-seq data. The normalization procedures included in scone lack orthogonality, and DESeq and TMM will have a much higher probability of being selected than TC and scran for a non-sparse scRNA-seq dataset.

4 Normalization method comparisons: A case study

We employ two benchmark datasets to evaluate the impact of four normalization methods: TMM, DESeq, scran, and SCnorm under PCA- and NMF-based clustering[40-41]. No imputation or dropout recovery procedures are applied to the two datasets before normalization for the sake of fair comparison.

The first dataset, the Pollen data, was designed to test the utility of low-coverage scRNA-seq in identifying distinct cell populations[26]. It consists of 249 cells across 14 805 genes for 11 different cell types, including skin cells, pluripotent stem cells, blood cells, and neural cells.

The second dataset, the Usoskin data, consists of 622 cells from the mouse dorsal root ganglion across 17 772 genes. It has four neuronal types: peptidergic nociceptors, nonpeptidergic nociceptors, neurofilament-containing, and tyrosine-hydroxylase-containing neurons[42].

We use NMI (normalized mutual information), a standard measure of clustering concordance, to evaluate the quality of the phenotype clustering[34,43]. NMI is a value between 0 and 1, and a larger NMI value indicates better phenotype clustering[34]. The NMI values are calculated by using k-means to cluster the dimension-reduced data from PCA or NMF, assuming the number of cell clusters is known a priori. The corresponding clustering is called PCA- or NMF-based clustering in our context[44].
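For readers who wish to reproduce this type of evaluation, the sketch below shows how such NMI values are typically computed: reduce the (log-transformed) expression matrix with PCA or NMF, cluster the cells with k-means using the known number of cell types, and compare the predicted labels against the true phenotypes. It is a generic illustration with simulated data, not the exact pipeline used for Tables 2 and 3.

```python
import numpy as np
from sklearn.decomposition import PCA, NMF
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def nmi_after_reduction(expr, true_labels, n_components, method="pca", seed=0):
    """expr: (n_genes, n_cells); returns NMI of k-means clustering on the reduced cells."""
    X = np.log1p(expr).T                                        # cells x genes, log-transformed
    if method == "pca":
        Z = PCA(n_components=n_components, random_state=seed).fit_transform(X)
    else:
        Z = NMF(n_components=n_components, init="nndsvda",
                max_iter=500, random_state=seed).fit_transform(X)
    k = len(np.unique(true_labels))                             # number of cell types assumed known
    pred = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
    return normalized_mutual_info_score(true_labels, pred)

# Toy example with two simulated cell types
rng = np.random.default_rng(5)
labels = np.repeat([0, 1], 40)
means = np.where(labels == 0, 5.0, 15.0)
expr = rng.poisson(means, size=(300, 80)).astype(float)
print(nmi_after_reduction(expr, labels, n_components=5, method="pca"))
```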

Table 2 shows the Pollen data’s NMI values under PCA-based clustering for the raw data and the corresponding four normalized datasets. The NMI values are calculated by using the projected data in the PCA subspace spanned by 1 to 9 principal components. It is clear that the SCnorm-normalized data has the best phenotype clustering performance in terms of NMI. The scran-normalized data and the raw data achieve the second and third best performance respectively. The TMM and DESeq data cannot even compete with the raw data. This suggests that the tailored scRNA-seq normalization methods outperform the bulk RNA-seq normalization methods for this dataset.

Table 2 Pollen data NMI values of PCA-clustering

Table 3 shows the Usoskin data’s NMI values under NMF-based clustering for the raw data and the corresponding four normalized datasets. We drop PCA-based clustering for this data because it loses distinguishing capability and performs poorly on all datasets. We still use the projected data in the NMF subspace spanned by 1 to 9 bases for the sake of consistency.

Table 3 Usoskin data NMI values of NMF-clustering

The scran-normalized data achieves the best performance among all datasets. The raw data outperforms the SCnorm, TMM, and DESeq normalized data. This further indicates that TMM and DESeq are not good normalization procedures for scRNA-seq data, echoing their lower phenotype clustering performance compared with the raw data.

5 Conclusion

We have reviewed existing normalization methods for scRNA-seq data via the framework of bias analysis proposed in this study. It is interesting to see that the tailored scRNA-seq normalization methods focus on the out-sample biases (the depth and dropout biases) but pay little attention to the in-sample biases (the gene length and frequency biases). This is possibly due to the priority and importance of the dropout and depth biases for scRNA-seq data. Furthermore, our case study suggests that the tailored scRNA-seq normalization methods are more suitable for scRNA-seq data than bulk RNA-seq normalization methods such as TMM and DESeq.

Compared to the bulk RNA-seq normalization methods, scRNA-seq normalization methods are still on the way to maturity. They should adjust as many biases as the TMM and DESeq methods do, besides the dropout and depth biases, along with dropout recovery models. It is possible that “pooling” versions of TMM and DESeq can be developed for scRNA-seq data under novel low-rank approximation or other imputation and dropout recovery methods that retrieve the true signals lost to the dropout issue[45-49]. On the other hand, regression-based normalization methods (e.g., SCnorm) may represent a more robust and worthwhile direction for scRNA-seq normalization, especially because scaling methods alone may be inadequate to normalize scRNA-seq data well given the complexity of scRNA-seq analytics.

There are no widely accepted standards or measures to evaluate the effectiveness of normalization methods besides downstream analysis. It is desirable to develop some entropy-based metric to answer the question: “what happens numerically to the dataset after normalization?”[5]. Such a measure would provide insights for comparing different normalization methods more directly, though some clustering measures or differential expression analysis metrics can also provide downstream feedback[5,13,32].

Spike-in normalization methods are not covered in this review. They can be an alternative approach for scRNA-seq data normalization. Some studies suggest that methods employing External RNA Controls Consortium (ERCC) spike-in RNA molecules significantly outperform those without them[50]. However, such an approach relies on good calibration, and too many external spike-ins can shadow the original gene expression and result in biased normalization[33,50].