
Theoretical analysis of RNA polymerase fidelity: a steady-state copolymerization approach

2022-02-18

Communications in Theoretical Physics, 2022, Issue 1

Wenbo Fu, Qiushi Li, Yongshun Song, Yaogen Shu, Zhongcan Ouyang and Ming Li

1 School of Physical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China

2 School of Physics, East China University of Science and Technology, Shanghai, 200237, China

3 Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China

4 Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, 100190, China

Abstract The fidelity of DNA transcription catalyzed by RNA polymerase (RNAP) has long been an important issue in biology. Experiments have revealed that RNAP can incorporate matched nucleotides selectively and proofread incorporated mismatched nucleotides. However, systematic theoretical research on RNAP fidelity is still lacking. In the last decade, several theories of RNA transcription have been proposed, but they only handled highly simplified models and did not consider higher-order neighbor effects or oligonucleotide cleavage, both of which are critical for the overall fidelity. In this paper, we regard RNA transcription as a binary copolymerization process and calculate the transcription fidelity using the steady-state copolymerization theory that we recently proposed for DNA replication. With this theory, more realistic models that include higher-order neighbor effects, oligonucleotide cleavage, multi-step incorporation and multi-step cleavage can be rigorously handled.

Keywords: RNA polymerase, fidelity, transcription, proofreading, steady-state copolymerization theory

1. Introduction

Accurate transfer of genetic information is critical for the survival and reproduction of living organisms. For example, the transcription fidelity in bacteria and eukaryotes is about $10^3$–$10^5$ [1–4]. The kinetic proofreading mechanism, proposed by Hopfield [5] and Ninio [6], correctly pointed out that such high fidelity is determined not thermodynamically by the free energy difference, but kinetically by the difference in incorporation rates between matched and mismatched pairs. However, the original version of the kinetic proofreading mechanism assumed that proofreading occurs before the nucleotide is covalently incorporated into the terminus, which differs from the real mechanism of RNA polymerase (RNAP).

Experiments show that incorporated mismatched nucleoside triphosphates (NTPs) can still be proofread by RNAP [7–9]. Since the proofreading mechanism is not fully understood, various models have been proposed that differ in their details but share the following features [10–12]. (I) RNAP has two working modes: the incorporation mode and the cleavage mode. In the incorporation mode, RNAP can selectively incorporate the matched NTP. The fidelity contributed by the incorporation mode is defined as the initial discrimination. In the cleavage mode, RNAP can proofread the incorporated mismatched nucleotides, which further enhances the fidelity by a factor of about $10^2$ [13]. This enhancement is defined as the proofreading efficiency. (II) RNAP can cleave at least two nucleotides, and it is widely believed that the proofreading efficiency is mainly contributed by dinucleotide cleavage [7]. (III) There are neighbor effects, i.e. the terminal, the penultimate, and more deeply buried mismatches may inhibit incorporation and promote proofreading [14]. These neighbor effects could be attributed to incorporated mismatches weakening the adjacent base pairs [14, 15].

The central issue of all the relevant theoretical studies is to show how transcription fidelity is determined by the kinetic parameters involved. Several theories were proposed for highly simplified transcription models [10–12]; they either ignored the neighbor effects or made only rough estimates of the fidelity based on untested assumptions. A systematic and more precise study of more realistic models is still lacking. Recently we treated a quite similar problem for DNA polymerase (DNAP) fidelity, proposing the steady-state copolymerization method, which can be used to handle highly complicated kinetic models of DNA replication [16, 17]. In this paper, we generalize this method to investigate the fidelity of RNA transcription. The paper is organized as follows. We first illustrate the basic theory of the steady-state copolymerization method using the minimal transcription model with the first-order neighbor effect and dinucleotide cleavage in section 2.1. More realistic models considering higher-order neighbor effects, multi-step incorporation and oligonucleotide cleavage are also handled in sections 2 and 3. In particular, we analytically derive the mathematical expressions of the transcription fidelity in terms of some key kinetic parameters in sections 2.3 and 3. These expressions provide new and intuitive insights into how the incorporation mode and the cleavage mode of RNAP are coordinated to achieve high transcription fidelity.

2. Copolymerization model with dinucleotide cleavage

The interaction between the RNAP and the RNA/DNA hybrid is extremely complex, but the dominating factor for the transcription fidelity is widely believed to be the incorporation-cleavage kinetics occurring at the terminal region of the RNA transcript. In real transcription, the template DNA contains 4 types of deoxynucleoside monophosphate (dNMP): A, T, C, and G, and there are 3 types of mismatches for each type of template dNMP. Here, inspired by the experimental data on DNA replication kinetics [17], we assume that any kinetic rate of a match (or mismatch) is of the same order of magnitude as its counterpart for another match (or mismatch), and that the rates of the matches differ from their counterparts for the mismatches by orders of magnitude. Hence, we can approximately regard RNA transcription as a binary copolymerization process of two monomers, R (the match) and W (the mismatch), without explicit consideration of the sequence of the DNA template.
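The reduction to two monomer types can be illustrated with a minimal sketch. It assumes that the template and the transcript are already aligned position by position (strand directionality is ignored), and the function names `to_binary_copolymer` and `composition_ratio` are hypothetical helpers introduced here for illustration only.

```python
# Watson-Crick pairing between a template DNA base and the matching RNA base.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def to_binary_copolymer(template_dna, transcript_rna):
    """Map each transcript position to 'R' (match) or 'W' (mismatch).

    Assumption: both strings are aligned position by position.
    """
    if len(template_dna) != len(transcript_rna):
        raise ValueError("template and transcript must have equal length")
    return "".join(
        "R" if COMPLEMENT[d] == r else "W"
        for d, r in zip(template_dna, transcript_rna)
    )


def composition_ratio(binary_chain):
    """Return N_R / N_W for a binary chain (infinity if there is no W)."""
    n_r = binary_chain.count("R")
    n_w = binary_chain.count("W")
    return float("inf") if n_w == 0 else n_r / n_w
```

For instance, `to_binary_copolymer("TACG", "AUGG")` labels only the last position as a mismatch, because the template G pairs with C rather than G. All three possible mismatches at a site collapse onto the single label W, which is exactly the information the binary description discards.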

During transcription, the proportions of R and W in the transcript change with time but eventually reach a constant, i.e. $d(N_R/N_W)/dt = 0$, where $N_R$ and $N_W$ are the total numbers of R and W in the transcript, respectively. This leads to $\dot{N}_R/\dot{N}_W = N_R/N_W$, where $\dot{N}_R$ and $\dot{N}_W$ represent the time derivatives of $N_R$ and $N_W$, respectively, from which the fidelity can be defined as,

$$\varphi = \frac{\dot{N}_R}{\dot{N}_W} = \frac{N_R}{N_W}. \qquad (1)$$

Below we first introduce the minimal model with mono- and dinucleotide cleavage and show how to calculate the fidelity considering the first-order neighbor effect (section 2.1). Higher-order effects can also be considered in the minimal model (section 2.2). We then derive an approximate analytical expression of the fidelity under bio-relevant conditions (section 2.3), and extend this logic to the more realistic multi-step models (section 2.4).

2.1. Minimal model with first-order neighbor effect

Denoting the rate constants of incorporation, mononucleotide cleavage and dinucleotide cleavage as K, Z and Q, respectively, we have the minimal model shown in figure 1.

Figure 1. The minimal copolymerization model of RNA transcription with mono- and dinucleotide cleavage.
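A minimal stochastic sketch of such a model may help to fix ideas. It is not the steady-state copolymerization calculation of this paper, only a direct simulation of the jump chain. The indexing convention is an assumption made here: `K[(term, new)]` is the rate of adding `new` after the terminal unit `term`, while `Z[(prev, term)]` and `Q[(prev, term)]` are the rates of cleaving one or two terminal units given the last two units. The function names are hypothetical.

```python
import random


def simulate_minimal_model(K, Z, Q, n_events, seed=0):
    """Simulate incorporation and mono-/dinucleotide cleavage events.

    Only the sequence of events is sampled (no waiting times), which is
    enough to obtain the composition of the final transcript.
    Returns (N_R, N_W) counted over the final chain.
    """
    rng = random.Random(seed)
    chain = ["R", "R"]  # two seed units so that (prev, term) is always defined
    for _ in range(n_events):
        prev, term = chain[-2], chain[-1]
        names = ["add_R", "add_W"]
        rates = [K[(term, "R")], K[(term, "W")]]
        if len(chain) > 2:  # never cleave into the seed units
            names.append("cut_1")
            rates.append(Z[(prev, term)])
        if len(chain) > 3:
            names.append("cut_2")
            rates.append(Q[(prev, term)])
        # Pick the next event with probability proportional to its rate.
        event = rng.choices(names, weights=rates)[0]
        if event == "add_R":
            chain.append("R")
        elif event == "add_W":
            chain.append("W")
        elif event == "cut_1":
            chain.pop()
        else:
            del chain[-2:]
    return chain.count("R"), chain.count("W")


def fidelity(n_r, n_w):
    """Composition ratio N_R / N_W (infinity if no mismatch survived)."""
    return float("inf") if n_w == 0 else n_r / n_w
```

With all cleavage rates set to zero the ratio is set by the incorporation rates alone. Making the cleavage rates large when the terminal unit is W removes most mismatches before they are buried, so the ratio increases.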

For example,

In the long-time limit, the occurrence probability of any terminal sequence will eventually reach a stationary distribution (i.e. $\dot{P}_{i_n \ldots i_2 i_1} = 0$). In this steady-state stage,

Unfortunately, these unclosed equations (3) cannot be solved to get $P_{i_n \ldots i_2 i_1}$, $J_{i_n \ldots i_2 i_1}$ or $N_{i_n \ldots i_2 i_1}$. Therefore, we propose the following factorization conjecture to close the equations,

With this factorization conjecture, the original unclosed equation (3) can be transformed into the following closed equations with four basic variables

According to equation (1), the steady-state fidelity can be calculated by,

$$\varphi = \frac{J_R}{J_W}, \qquad (8)$$

where $J_R = J_{RR} + J_{WR}$ and $J_W = J_{RW} + J_{WW}$, which can be obtained by solving equation (7).

2.2. Minimal model with higher-order neighbor effect

The same logic as in section 2.1 can be generalized to cases with neighbor effects of any order. For the h-order neighbor effect, the rate constants can be written as $K_{i_{h+1} i_h \ldots i_1}$ (and similarly for Z and Q). The kinetic equations of $P_{i_{h+1} i_h \ldots i_1}$ and $J_{i_{h+1} i_h \ldots i_1}$ are,

To solve these unclosed equations, we use the h-order factorization conjecture as follows,

These finally lead to a set of closed equations,

By solving these equations, one can obtain J and then calculate the fidelity defined by equation (8).

2.3. The approximate fidelity under bio-relevant conditions

In order to understand intuitively how RNAP fidelity is determined by some key kinetic parameters, one has to solve equations (7) or (11) to obtain the analytical expression of the fidelity. However, it is very hard to solve such nonlinear equations, particularly when the number of variables becomes large (i.e. h is large). Below we derive the approximate expression of the fidelity for any h-order model, under some special conditions on the kinetic parameters.

Figure 3. The transcription model proposed by Ehrenberg et al [10].

Details of the calculation can be found in appendix A. This approximate expression shows good agreement with the numerical results obtained by directly solving equation (7), as shown in appendix A. The initial discrimination can be roughly regarded as the ratio between the incorporation rates of the match and the mismatch. The proofreading efficiency can be roughly regarded as the ratio of the elongation probability of the terminal match ($P_{\mathrm{el},R}$) to that of the terminal mismatch ($P_{\mathrm{el},W}$). Obviously $P_{\mathrm{el},R} \approx 1$ and […], which leads to […]. In fact, many experimentalists have used similar expressions to estimate the fidelity intuitively, without any justification [10, 13]. Here we have provided a rigorous proof. It should be pointed out that their intuitive analysis is hard to apply to high-order neighbor effects, whereas our theory can easily handle such complex cases.
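The intuitive picture in the paragraph above (fidelity as initial discrimination times a ratio of elongation probabilities) can be sketched numerically. The branching-ratio form of the elongation probability used below, forward rate divided by forward plus total cleavage rate, is an assumption made for illustration and is not the paper's equation (12). The function names and parameter names are likewise hypothetical.

```python
def elongation_probability(k_forward, k_cleave_total):
    """Probability that a terminal unit is extended before it is cleaved.

    Assumed branching-ratio form: forward / (forward + total cleavage).
    """
    return k_forward / (k_forward + k_cleave_total)


def naive_fidelity_estimate(k_inc_match, k_inc_mismatch,
                            k_next_after_match, cleave_after_match,
                            k_next_after_mismatch, cleave_after_mismatch):
    """Rough estimate: initial discrimination times proofreading efficiency."""
    # Initial discrimination: ratio of the two incorporation rates.
    discrimination = k_inc_match / k_inc_mismatch
    # Proofreading efficiency: ratio of the two elongation probabilities.
    p_el_match = elongation_probability(k_next_after_match, cleave_after_match)
    p_el_mismatch = elongation_probability(k_next_after_mismatch,
                                           cleave_after_mismatch)
    return discrimination * p_el_match / p_el_mismatch
```

As a usage example, incorporation rates of 1.0 and 0.1, no cleavage after a match, and a forward rate of 0.1 competing with a total cleavage rate of 1.9 after a mismatch give a discrimination of 10 and a proofreading factor of 20. This estimate only covers the first-order picture.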

For the h-order model, the fidelity can be approximately written as,

Details of the calculation can be found in appendix B. To better illustrate the complexity of equation (13), we compare it with the DNA replication fidelity [17], which reads as follows,

The main difference between the DNAP and RNAP proofreading mechanisms is that DNAP can excise only one terminal dNMP, i.e. the minimal model in figure 1 can describe DNA replication if $Q = 0$, and equation (13) is thus reduced to equation (14). Equation (14) provides an intuitive way to generalize the analytical expressions of DNAP fidelity from lower-order (e.g. $h = 1$) to any higher-order neighbor effects, in terms of a set of elongation probabilities defined similarly to the one mentioned above. It seems natural to follow the same logic to generalize equation (12) to higher-order models of RNA transcription by defining elongation probabilities with only the total excision rates $Q + Z$, i.e. simply replacing $Z_n$ in equation (14) by $Q_n + Z_n$. However, this intuitive logic leads to expressions for RNAP fidelity that differ substantially from equation (13).

2.4. Multi-step transcription model

The real transcription scheme can be more complex than the minimal model; for example, there can be a multi-step incorporation process. One can also apply the steady-state copolymerization analysis to these complex reaction schemes. Here we show a more realistic model proposed by Ehrenberg et al [10], as shown in figure 3.

For brevity, we only consider the first-order neighbor effect here. One can follow the same logic as in section 2.1 to write the kinetic equations,

where the superscript X of $P^X$ denotes the sub-state, $i, j, k, l = R, W$, and [NTP] represents the NTP concentration. In the steady state, $\dot{P}^X = 0$, so each $P^X$ ($X \neq \mathrm{POST}$) in equation (15) can be written as a function of $P^{\mathrm{POST}}$. After eliminating all sub-states except $P^{\mathrm{POST}}$, equation (15) can be reduced to,

Comparing equations (2) and (16), one can apply the same factorization conjecture as in section 2.1 to get,

By solving equation (19), one can calculate the fidelity defined as,

3. Copolymerization model with oligonucleotide cleavage

Experiments have shown that RNAP can cleave more than two nucleotides at once. For example, the bacterial cleavage factor GreA can bind to RNAP and stimulate it to cleave di- and trinucleotides, and GreB is responsible for the cleavage of much longer RNA segments [19]. Although it is still not clear whether long backtracking contributes to the proofreading efficiency, it deserves a systematic theoretical treatment. Here we propose the oligonucleotide cleavage model and apply the steady-state copolymerization analysis.

Considering the h-order neighbor effect and L-length oligonucleotides cleavage, we give the minimal model as shown in figure 4.

Figure 4.The minimal model of RNA transcription with mono- to Lth-nucleotide cleavage.

Following the same logic in section 2.1, the kinetic equations of P and J are,

It is obvious that $h \geq L$, since Q depends on all the nucleotides in the cleaved fragment. The structure of equation (21) is the same as that of equation (9), although J becomes extremely complex. Since the total transcript length $N = N_R + N_W$ is far greater than L, the steady-state conditions are still satisfied, and the factorization conjecture in equation (10) can be applied to get the following closed equations,

One can reasonably assume the bio-relevant conditions are still satisfied and get the approximate fidelity,

Details of the calculation can be found in appendix C.

This expression clearly shows the contribution of oligonucleotide cleavage to the overall proofreading efficiency. For example, the contribution of trinucleotide cleavage is […], in which […] and […] represent the proofreading efficiency with or without trinucleotide cleavage. Since trinucleotide cleavage corresponds to a second- or higher-order neighbor effect, here we discuss the simplest situation, $h = 2$. This ratio can be written as,

Either $Q_{3,RRW} \geq Q_{1,RRW} + Q_{2,RRW}$ or $Q_{3,RWR} \geq Q_{1,RWR} + Q_{2,RWR}$ can make the ratio in equation (24) no less than 2, which means that trinucleotide cleavage contributes more to the overall fidelity than dinucleotide cleavage. Equation (24) provides a clear perspective on which kinetic parameters experimentalists should focus on, and the contribution of trinucleotide cleavage to the fidelity can be quantified once these rates are measured.

4. Summary

In this paper, we develop a systematic method to study the fidelity of DNA transcription rigorously. First, we propose and handle the minimal model with the first-order neighbor effect to illustrate the basic logic of the steady-state copolymerization analysis (section 2.1). This analysis is then extended to handle higher-order effects (section 2.2). Section 2.3 gives an approximate expression for the fidelity that agrees well with the numerical results and shows clearly how the fidelity is determined by a few critical rate constants. This approximate expression also shows how the incorporation mode and the cleavage mode of RNAP contribute to the overall fidelity. To consider more realistic models, we handle the multi-step model and obtain the fidelity with uniquely defined effective rates in section 2.4. These sections show that the steady-state copolymerization analysis can be generalized to any realistic model that includes more details, such as the oligonucleotide cleavage model in section 3.

It should be pointed out that the high-order neighbor effect deserves more attention. For the similar process of DNA replication, it has been shown that second-order proofreading can contribute a factor of ∼10 to the overall fidelity [17], which is due to the extra instability of the DNA duplex terminus inside the DNAP when a mismatch occurs at the penultimate site. Since the physical properties of the DNA duplex (in DNA replication) and the RNA-DNA duplex (in DNA transcription) are quite similar, one can expect that similar second-order effects may also be present in DNA transcription. Unfortunately, as far as we know, no experiment has yet addressed such higher-order effects. Our theory and the major conclusions presented in this paper may stimulate future experimental investigations of these issues.

Last, there are some sub-processes in DNA transcription that are not considered in this paper, e.g. substeps such as RNAP dissociation and rebinding. During transcription, RNAP is tightly bound to the nascent transcript and the template DNA, forming the transcription bubble. The deformation of the transcription bubble is also not considered in our models. All these complex sub-processes can be readily handled by our theory: one can follow the same logic to obtain the effective rates and calculate the fidelity with the approximate expression. However, our theory has a critical limitation, i.e. it ignores the template sequence effects, which are of greater concern to biochemists. We hope this theory may serve as a starting point for future theoretical studies that incorporate more realistic and more complex factors such as template sequence effects.

Acknowledgments

The authors acknowledge financial support from the National Natural Science Foundation of China (Nos. 11675180, 11774358), the CAS Strategic Priority Research Program (No. XDA17010504), the Key Research Program of Frontier Sciences of CAS (No. Y7Y1472Y61), and the Research Fund of Wenzhou Institute CAS (Nos. WIUCASYJ2020004, WIUCASQD2020009).

Appendix A. The derivation of the approximate fidelity with first-order neighbor effect

Below we give the derivation of the approximate fidelity, equation (12). The equations of the minimal model with the first-order neighbor effect are

Table A1. Parameters used in figure A1.

With equations (A.1), (A.2), one can get $J_{RR} \gg J_{WR}$, $J_{RW} \gg J_{WW}$, which gives,

To calculate $P_{RR}/P_{WR}$ in equation (A.3), we introduce that,

With equation (A.2), $J_{RR}/J_{WR} = P_{RR}/P_{WR}$ can be rewritten as,

And $J_{RW} = J_{WR}$ in equation (A.1) leads to,

With $a_1$, $a_2$, one can finally get the approximate fidelity as,

Appendix B. The derivation of the approximate fidelity with high-order neighbor effect

Below we give the derivation of the approximate fidelity, equation (13). The equations of the minimal model with the h-order neighbor effect are

where $i_h, i_{h-1}, \ldots, i_1 = R, W$, and,

According to the bio-relevant conditions, one can reasonably assume that,

With equations (A.10), (A.12), one can get $J_R \approx J_0$, $J_W \approx J_{h+1}$, which gives,

To calculate $P_0/P_1$ in equation (A.13), we introduce that,

It is easy to obtain the expression of the factor $a_1$ from $J_{h+1}/J_1 = P_{h+1}/P_1$ in equation (A.10),

And equations (A.10), (A.12) lead to,

One can expand equation (A.16) and get,

Furthermore, we have,

Since $a_1 \gg 1$, one can rewrite equation (A.18) as,

With the product of the above h+1 factors, one can finally get the approximate fidelity as,

Appendix C. The derivation of the approximate fidelity of the oligonucleotide cleavage model

Below we give the derivation of the approximate fidelity, equation (23). The equations of the minimal model with the h-order neighbor effect and L-length oligonucleotide cleavage are

where $i_h, i_{h-1}, \ldots, i_1 = R, W$, and,

Following the same logic, equation (A.12) still holds here to give,

Equations (A.12) and (A.21) lead to,

According to condition (c), every P that contains two or more Ws is negligible in equation (A.24), for example, […]. So equation (A.24) can be rewritten as,

To calculate $P_0/P_1$ in equation (A.23), we introduce that,

Letting φs = A1 and φeh = A2A3…Ah+1, we finally have,