Quantifying the Molecular Basicity of Amines with Conceptual Density Functional Theory and the Information-Theoretic Approach
2020-12-25
Xuezhu Xiao, Xiaofang Cao, Dongbo Zhao, Chunying Rong*, Shubin Liu
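1 College of Chemistry and Chemical Engineering, Hunan Normal University, Changsha 410081, China
2 School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China
3 Research Computing Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-3420, USA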
1 Introduction
Molecular acidity and basicity are among the most fundamental chemical concepts. The Arrhenius theory defined an acid HA or a base B as a substance that releases H+ or OH−, respectively, in aqueous solution. In 1923, Brønsted and Lowry independently generalized this theory by proposing that acidity and basicity be understood through the exchange of a proton between HA and B, HA + B ⇌ A− + BH+, which forms the conjugate base A− and the conjugate acid BH+. Within this scheme, the molecular basicity of B is defined through the pKa value of its conjugate acid BH+, denoted pKaH, for the reaction1-3
$$\mathrm{BH^+} \rightleftharpoons \mathrm{B} + \mathrm{H^+} \tag{1}$$
where the equilibrium constant is KaH = [B][H+]/[BH+]. The concept of the conjugate acid in the Brønsted-Lowry theory forms the foundation of studies on molecular basicity, allowing greater generality and flexibility in treating experimentally unavailable compounds. The basicity of the base B, pKb, can be readily calculated as pKb = 14 − pKaH (in water at 25 °C, where pKw = 14). In the literature and textbooks, pKaH and pKa are often used interchangeably.
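The conversions in this paragraph are simple enough to script. The following is a minimal Python sketch, assuming dilute aqueous solution at 25 °C (pKw = 14); the function names and the pKaH value in the usage lines are illustrative and not taken from this work. It also rearranges the KaH expression to give the fraction of the base present as BH+ at a given pH.

```python
PKW_25C = 14.0  # assumption: ion product of water at 25 °C, pKw = 14


def pkb_from_pkah(pkah: float, pkw: float = PKW_25C) -> float:
    """Basicity of B from the acidity of its conjugate acid: pKb = pKw - pKaH."""
    return pkw - pkah


def kah_from_pkah(pkah: float) -> float:
    """KaH = [B][H+]/[BH+], recovered from pKaH = -log10(KaH)."""
    return 10.0 ** (-pkah)


def fraction_protonated(ph: float, pkah: float) -> float:
    """Fraction of the base present as BH+ at a given pH.

    From KaH = [B][H+]/[BH+] it follows that [BH+]/[B] = 10**(pKaH - pH),
    so [BH+] / ([B] + [BH+]) = 1 / (1 + 10**(pH - pKaH)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pkah))


if __name__ == "__main__":
    pkah = 10.6  # illustrative value only
    print(round(pkb_from_pkah(pkah), 2))             # 3.4
    print(round(fraction_protonated(7.0, pkah), 4))  # almost fully protonated at pH 7
```

At pH = pKaH the base is exactly half protonated, which is a quick sanity check on the formula.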
Although experimental methods for measuring molecular acidity and basicity are mature, such measurements are not always practical, especially for reaction intermediates and other experimentally inaccessible species. Computationally, the equilibrium constant KaH can be obtained from the standard Gibbs free energy change ΔG° of the above reaction through the relationship ΔG° = 2.303RT pKaH, where R is the gas constant and T is the absolute temperature4-6. ΔG° can in turn be computed with ab initio wave function or density functional theory (DFT) methods, either directly for the above reaction or through a designed thermodynamic cycle7-9. These protocols, however, are computationally demanding and notoriously prone to systematic errors.
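The sensitivity behind "prone to systematic errors" follows directly from ΔG° = 2.303RT pKaH. The minimal Python sketch below (function names are illustrative) converts between ΔG° in kJ/mol and pKaH, using ln 10 in place of the rounded factor 2.303 and assuming T = 298.15 K by default.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def pkah_from_dg(dg_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """pKaH = ΔG° / (ln(10) R T); ln(10) ≈ 2.303 is the factor used in the text."""
    return dg_kj_per_mol * 1000.0 / (math.log(10.0) * R_GAS * temperature_k)


def dg_from_pkah(pkah: float, temperature_k: float = 298.15) -> float:
    """ΔG° = ln(10) R T pKaH, returned in kJ/mol."""
    return math.log(10.0) * R_GAS * temperature_k * pkah / 1000.0


if __name__ == "__main__":
    print(f"{dg_from_pkah(1.0):.2f} kJ/mol per pK unit at 298.15 K")  # 5.71
    print(round(pkah_from_dg(4.184), 2))  # a 1 kcal/mol error in ΔG° shifts pKaH by 0.73
```

One pK unit corresponds to only about 5.7 kJ/mol at room temperature, so an error of a few kJ/mol in the computed free energy already moves the predicted pKaH by roughly one unit.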
Recently, using descriptors from conceptual density functional theory (CDFT)10-12, we proposed to quantify molecular acidity from a different perspective. Two equivalent CDFT descriptors, the molecular electrostatic potential (MEP) and the natural atomic orbital (NAO) energy, were systematically investigated and theoretically justified, and both can be used to predict pKa values quantitatively. More recently, we proposed to apply simple electron density functionals from the information-theoretic approach to account for molecular acidity11,13-19, including Shannon entropy, Fisher information, Ghosh-Berkowitz-Parr entropy, information gain, Onicescu information energy, and relative Rényi entropy. We examined the applicability of these methods for singly and doubly substituted benzoic acids, benzenesulfonic acids, phenols, and alkyl carboxylic acids13,20,21, and their validity and effectiveness were confirmed.
Are our methods equally applicable to molecular basicity? The purpose of the present work is to answer this question. To that end, we apply our methods to three categories of amines (primary, secondary, and tertiary), with a total of 179 molecules. Amines, among the most widely used and commercially available bases, have been extensively applied in chemistry and biology22,23, and their experimental basicity data, expressed as pKaH values, are widely available in the literature. These facts make the present study both relevant and feasible. In what follows, we briefly review the theoretical framework, then detail the computational methodology, followed by the results and discussion. A few concluding remarks are provided at the end.
2 Theoretical framework
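In CDFT, the total energy E of an electronic system, such as an acid or a base, is determined by two natural variables: the number of electrons N and the external potential v(r) generated by the atomic nuclei. Any change of the system due to changes in N and v(r) can be expressed as a perturbation series in the form of the following Taylor expansion10-12,
$$\Delta E = \left(\frac{\partial E}{\partial N}\right)_v \Delta N + \int \left[\frac{\delta E}{\delta v(\mathbf r)}\right]_N \Delta v(\mathbf r)\,d\mathbf r + \frac{1}{2}\left(\frac{\partial^2 E}{\partial N^2}\right)_v \Delta N^2 + \Delta N \int \left[\frac{\partial}{\partial N}\left(\frac{\delta E}{\delta v(\mathbf r)}\right)_N\right]_v \Delta v(\mathbf r)\,d\mathbf r + \frac{1}{2}\iint \left[\frac{\delta^2 E}{\delta v(\mathbf r)\,\delta v(\mathbf r')}\right]_N \Delta v(\mathbf r)\,\Delta v(\mathbf r')\,d\mathbf r\,d\mathbf r' + \cdots \tag{3}$$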
where (∂E/∂N)_v and [δE/δv(r)]_N are the first-order derivatives of the total energy with respect to N and v(r), at fixed v(r) and N, respectively; similarly, (∂²E/∂N²)_v and [δ²E/δv(r)δv(r')]_N are the corresponding second-order terms, and [∂/∂N(δE/δv)_N]_v is the second-order cross term. Physically, (∂E/∂N)_v is the chemical potential μ (the negative of the electronegativity), (∂²E/∂N²)_v is the hardness η, [∂/∂N(δE/δv)_N]_v is the Fukui function f(r), and [δ²E/δv(r)δv(r')]_N is the linear response function χ(r,r'). Applying Eq. (3) to the proton dissociation process of Eq. (1) up to second order, the total energy change of the dissociation reaction, ΔE, can be approximated in terms of the external potential change Δv(r) alone11,13,24-26
$$\Delta E \approx \int \rho(\mathbf r)\,\Delta v(\mathbf r)\,d\mathbf r + \frac{1}{2}\iint \chi(\mathbf r,\mathbf r')\,\Delta v(\mathbf r)\,\Delta v(\mathbf r')\,d\mathbf r\,d\mathbf r' \tag{4}$$
where ρ(r) = [δE/δv(r)]_N is the electron density of the system. All ΔN-related terms in Eq. (3) vanish because the total number of electrons is unchanged during the process in Eq. (1). If the formulation is further simplified by omitting the second-order contribution and retaining only the first-order term, the following approximate relationship is obtained:
$$\Delta E \approx \int \rho(\mathbf r)\,\Delta v(\mathbf r)\,d\mathbf r \tag{5}$$
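where Δv(r) is the change in the external potential caused by the departure of the proton in Eq. (1). With the external potential written as v(r) = −Σ_A Z_A/|r − R_A|, removing a proton (Z = 1) located at R_H gives11,20,24-26
$$\Delta v(\mathbf r) = \frac{1}{|\mathbf r - \mathbf R_H|} \tag{6}$$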
where R_H is the position of the leaving proton. Inserting Eq. (6) into Eq. (5) leads to11,20,24-26
$$\Delta E \approx \int \frac{\rho(\mathbf r)}{|\mathbf r - \mathbf R_H|}\,d\mathbf r \tag{7}$$
This formula suggests that, to first order, the MEP at the leaving proton, whose electronic contribution is −∫ρ(r)/|r − R_H| dr, should be a strong indicator of acidity. This has been verified numerically in our previous reports13,14. The MEP and NAO descriptors are equivalent because the electrostatic potential at an atomic nucleus is generated predominantly by the electron density of the valence orbitals of that same atom.
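To make the electronic MEP integral concrete, the sketch below evaluates −∫ρ(r)/|r − R| dr at the center R of a spherically symmetric model density. It is illustrative only: it assumes a hydrogen 1s density in atomic units, for which the exact result is −1 a.u., and uses a hand-written Simpson rule on a radial grid. In a molecule the MEP at a nucleus also contains +Σ Z_A/|R_A − R| from all other nuclei, and in this work the MEP values were taken from the Gaussian output rather than computed this way.

```python
import math


def simpson(f, a: float, b: float, n: int = 4000) -> float:
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def rho_h1s(r: float) -> float:
    """Hydrogen 1s density in atomic units, normalized to one electron."""
    return math.exp(-2.0 * r) / math.pi


def electron_count(rho, r_max: float = 30.0) -> float:
    """∫ rho d^3r for a spherical density: radial integrand 4π r^2 rho(r)."""
    return simpson(lambda r: 4.0 * math.pi * r * r * rho(r), 0.0, r_max)


def electronic_potential_at_center(rho, r_max: float = 30.0) -> float:
    """Electronic part of the MEP at the center R of a spherical density,
    -∫ rho(r)/|r - R| d^3r; with |r - R| = r the radial integrand is 4π r rho(r)."""
    return -simpson(lambda r: 4.0 * math.pi * r * rho(r), 0.0, r_max)


if __name__ == "__main__":
    print(round(electron_count(rho_h1s), 6))                  # 1.0
    print(round(electronic_potential_at_center(rho_h1s), 6))  # -1.0
```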
More recently, quantities from the information-theoretic approach (ITA) have been employed to quantify molecular acidity. The rationale is that, according to the basic theorem of DFT, the electron density alone contains sufficient information to determine all ground-state properties, including molecular acidity and basicity. ITA offers a practical route to extract this information, because all ITA quantities are simple density functionals, each with a distinct physicochemical interpretation, such as steric effect, stereoselectivity, electrophilicity, and nucleophilicity16,18,27,28. These ITA quantities include the Shannon entropy (S_S), a measure of the spatial delocalization of the electron density29-31,
$$S_S = -\int \rho(\mathbf r)\ln\rho(\mathbf r)\,d\mathbf r \tag{8}$$
the Fisher information (I_F), a gauge of the sharpness of the electron density distribution32,33,
$$I_F = \int \frac{|\nabla\rho(\mathbf r)|^2}{\rho(\mathbf r)}\,d\mathbf r \tag{9}$$
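where ∇ρ(r) is the gradient of the electron density; and the Ghosh-Berkowitz-Parr (GBP) entropy (S_GBP)33,34,
$$S_{\mathrm{GBP}} = \int \frac{3}{2}k\,\rho(\mathbf r)\left[c + \ln\frac{t(\mathbf r;\rho)}{t_{\mathrm{TF}}(\mathbf r;\rho)}\right]d\mathbf r \tag{10}$$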
where c and k are constants, t(r;ρ) is the kinetic energy density, and t_TF(r;ρ) is the Thomas-Fermi kinetic energy density. Also within the ITA framework, the following quantities are well defined: the information gain (I_G), a non-symmetric measure of the entropy difference between two probability distribution functions35-39,
$$I_G = \int \rho(\mathbf r)\ln\frac{\rho(\mathbf r)}{\rho_0(\mathbf r)}\,d\mathbf r \tag{11}$$
the Rényi entropy40,41 of order n, R_n, where n ≥ 0 and n ≠ 1,
$$R_n = \frac{1}{1-n}\ln\left[\int \rho^n(\mathbf r)\,d\mathbf r\right] \tag{12}$$
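and the relative Rényi entropy42 of order n, R^r_n,
$$R^r_n = \frac{1}{n-1}\ln\left[\int \frac{\rho^n(\mathbf r)}{\rho_0^{\,n-1}(\mathbf r)}\,d\mathbf r\right] \tag{13}$$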
where ρ0(r) is the reference-state density satisfying the same normalization condition as ρ(r),
and the Tsallis entropy43 T_n of order n,
$$T_n = \frac{1}{n-1}\left[1 - \int \rho^n(\mathbf r)\,d\mathbf r\right] \tag{14}$$
which stems from a generalization of the standard Boltzmann-Gibbs entropy. The term common to Eqs. (12) and (14) is the integral of the nth power of the electron density, which can be regarded as an ITA quantity in its own right, the Onicescu information energy (E_n) of order n41,44,
$$E_n = \frac{1}{n-1}\int \rho^n(\mathbf r)\,d\mathbf r \tag{15}$$
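These definitions can be checked on a density whose integrals are known in closed form. The Python sketch below is illustrative only: it assumes a spherically symmetric hydrogen 1s density normalized to one electron (a molecular density would integrate to N) and a hypothetical Slater-type reference density with exponent ζ = 1.2, and it integrates on a radial grid with Simpson's rule. It is not the Multiwfn implementation used in this work, it evaluates whole-density rather than Hirshfeld atomic values, and it omits S_GBP because that functional needs a kinetic energy density. For this density S_S = 3 + ln π, I_F = 4, and ∫ρ² dr = 1/(8π), which give convenient checks.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def radial_integral(g, r_max=30.0):
    """∫ g(r) d^3r for a spherically symmetric integrand g(r)."""
    return simpson(lambda r: 4.0 * math.pi * r * r * g(r), 0.0, r_max)


def slater_density(zeta):
    """One-electron 1s Slater-type density zeta^3 exp(-2 zeta r) / pi."""
    return lambda r: zeta ** 3 * math.exp(-2.0 * zeta * r) / math.pi


def shannon(rho):                   # S_S = -∫ rho ln rho
    return -radial_integral(lambda r: rho(r) * math.log(rho(r)))


def fisher(rho, drho):              # I_F = ∫ |∇rho|^2 / rho
    return radial_integral(lambda r: drho(r) ** 2 / rho(r))


def power_integral(rho, n):         # ∫ rho^n, the term shared by R_n, T_n and E_n
    return radial_integral(lambda r: rho(r) ** n)


def onicescu(rho, n):               # E_n = ∫ rho^n / (n - 1)
    return power_integral(rho, n) / (n - 1)


def renyi(rho, n):                  # R_n = ln(∫ rho^n) / (1 - n)
    return math.log(power_integral(rho, n)) / (1 - n)


def tsallis(rho, n):                # T_n = (1 - ∫ rho^n) / (n - 1)
    return (1.0 - power_integral(rho, n)) / (n - 1)


def information_gain(rho, rho0):    # I_G = ∫ rho ln(rho / rho0)
    return radial_integral(lambda r: rho(r) * math.log(rho(r) / rho0(r)))


def relative_renyi(rho, rho0, n):   # R^r_n = ln(∫ rho^n / rho0^(n-1)) / (n - 1)
    return math.log(radial_integral(lambda r: rho(r) ** n / rho0(r) ** (n - 1))) / (n - 1)


rho = slater_density(1.0)           # hydrogen 1s density, normalized to 1
rho0 = slater_density(1.2)          # hypothetical reference density


def drho(r):
    """d rho / dr; for a spherical density |∇rho| = |d rho / dr|."""
    return -2.0 * rho(r)


if __name__ == "__main__":
    print(shannon(rho), 3.0 + math.log(math.pi))    # numeric vs analytic
    print(fisher(rho, drho))                        # analytic value 4
    print(onicescu(rho, 2), 1.0 / (8.0 * math.pi))  # analytic 1/(8π)
    print(information_gain(rho, rho0))              # > 0; zero only when rho equals rho0
```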
In this work, we make use of these quantities to evaluate the molecular basicity of three categories of amines (primary, secondary, and tertiary). Since these quantities are all simple density functionals, according to the basic theorem of DFT they should, in principle, provide a sound basis for accurately predicting the basicity of these species. This work continues our previous study quantifying the molecular acidity of benzoic acids, benzenesulfonic acids, phenols, and alkyl carboxylic acids, in which we examined the above ITA and CDFT descriptors for acids HA. Here, for the molecular basicity of a base B, we do not consider the properties of its conjugate acid BH+. Instead, we examine the above descriptors for the neutral base B only. In this way, we avoid having to account for the solvent effect on a molecular cation, which is notoriously difficult and remains unresolved from the computational viewpoint. This approach also demonstrates the flexibility and robustness of our methods, which work for both acids and bases without requiring that only acids or conjugate acids be considered.
3 Computational details
A total of 179 amine molecules, including 73 primary amines, 41 secondary amines, and 65 tertiary amines, are examined in this work. Their chemical structures are presented in the Supplementary Information. They are singly and doubly substituted aliphatic amines, anilines, and their derivatives. Their experimental pKa values were taken from the literature45-53. All structures were fully optimized at the B3LYP/6-311+G(d,p) level of theory54-56 with the Gaussian 09 package57, using tight self-consistent field convergence and ultrafine integration grids. For molecules with more than one stable conformation, the lowest-energy conformer was adopted. A frequency calculation was performed to confirm that each optimized structure has no imaginary frequency. Subsequently, a single-point calculation was performed to obtain the molecular electrostatic potential and the natural atomic orbital energies with the keyword pop=(chelpg,nbo), which invokes an NBO (natural bond orbital) analysis. MEP values were taken as the electrostatic potential calculated at the nitrogen nucleus, and NAO values were calculated as the sum of the 2px, 2py, and 2pz natural atomic orbital energies of the same nitrogen atom. The Multiwfn 3.6 package58 was used to calculate the information-theoretic quantities, with the checkpoint file from the above Gaussian calculations as input. Hirshfeld's stockholder approach59 was adopted to partition atoms in molecules and obtain the atomic values of the above ITA quantities on the nitrogen atoms of all amines. Our previous work60 showed that ITA quantities are insensitive to the choice of partition scheme: Hirshfeld's stockholder approach59, Becke's fuzzy atom approach61, and Bader's zero-flux approach62 all yielded qualitatively similar results. Note that we did not consider the solvent effect in this study.
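The stockholder idea can be illustrated with a one-dimensional toy model. Everything in the sketch below is hypothetical: two Gaussian "free-atom" densities stand in for the promolecule, a third density plays the molecule, and the toy nuclear charges are Z_A = 5 and Z_B = 1. Each point of the molecular density is shared out in proportion to the free-atom densities, w_A = ρ_A⁰/(ρ_A⁰ + ρ_B⁰), which yields atomic populations and Hirshfeld charges; atomic ITA values are then obtained, in one common convention, by evaluating the functionals with ρ_A = w_A ρ. A real calculation works on three-dimensional molecular grids with free-atom densities rather than this toy.

```python
import math


def gaussian_1d(x, center, alpha, n_electrons):
    """1D Gaussian normalized to integrate to n_electrons (toy atomic density)."""
    return n_electrons * math.sqrt(alpha / math.pi) * math.exp(-alpha * (x - center) ** 2)


# Hypothetical free-atom densities for a two-atom toy "molecule" A-B on a 1D axis.
def rho_a_free(x):
    return gaussian_1d(x, -0.7, 1.0, 5.0)  # atom A, toy nuclear charge Z_A = 5


def rho_b_free(x):
    return gaussian_1d(x, 0.7, 2.0, 1.0)   # atom B, toy nuclear charge Z_B = 1


def rho_mol(x):
    """Hypothetical molecular density (6 electrons) with charge drawn toward A."""
    return gaussian_1d(x, -0.6, 1.1, 5.2) + gaussian_1d(x, 0.75, 2.0, 0.8)


def weight_a(x):
    """Stockholder weight w_A(x) = rho_A^0(x) / [rho_A^0(x) + rho_B^0(x)]."""
    a, b = rho_a_free(x), rho_b_free(x)
    return a / (a + b)


def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


# Atomic populations N_A = ∫ w_A rho dx and N_B = ∫ (1 - w_A) rho dx; charges q = Z - N.
pop_a = simpson(lambda x: weight_a(x) * rho_mol(x), -8.0, 8.0)
pop_b = simpson(lambda x: (1.0 - weight_a(x)) * rho_mol(x), -8.0, 8.0)
q_a, q_b = 5.0 - pop_a, 1.0 - pop_b

if __name__ == "__main__":
    print(f"N_A = {pop_a:.4f}, N_B = {pop_b:.4f}, sum = {pop_a + pop_b:.4f}")
    print(f"q_A = {q_a:+.4f}, q_B = {q_b:+.4f}")
```

Because the weights sum to one at every point, the atomic populations always add up to the total electron count, whatever the shape of the molecular density.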
This does not mean that the effect is unimportant. Rather, we assumed that the solvent effect has a similar impact on all species within each category of molecules. Since we correlate different descriptors with experimental pKa values, this impact can be absorbed into the least-squares fitting as a system-dependent constant.
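The argument that a uniform solvent contribution ends up in the fitted constant can be checked directly. The minimal Python sketch below uses hypothetical descriptor and pKa values (not data from Tables 1-3) and a hypothetical constant shift applied to every member of the series; a least-squares line then keeps the same slope and R², and only its intercept moves by the shift.

```python
def fit_line(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)


# Hypothetical descriptor values (a.u.) and pKa values, for illustration only.
descriptor = [-18.36, -18.34, -18.38, -18.33, -18.37]
pka = [10.1, 9.2, 10.9, 8.8, 10.6]

# Hypothetical constant solvent contribution shared by every member of the series.
solvent_shift = -1.5
pka_shifted = [p + solvent_shift for p in pka]

slope_1, intercept_1, r2_1 = fit_line(descriptor, pka)
slope_2, intercept_2, r2_2 = fit_line(descriptor, pka_shifted)

if __name__ == "__main__":
    print(slope_1, slope_2)           # same slope
    print(intercept_2 - intercept_1)  # equals solvent_shift up to rounding
    print(r2_1, r2_2)                 # same R^2
```

The same reasoning only holds when the offset is truly common to the series; an offset that differs between species would instead appear as scatter around the fitted line.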
4 Results and discussion
Table 1 shows the experimental pKa values for 73 primary amines, together with their MEP and NAO results. The square of the correlation coefficient (R²) between MEP and NAO is 0.999, indicating that the two CDFT descriptors are equivalent; this is because the electrostatic potential at N is generated predominantly by the electron density of the natural valence orbitals of the same atom13,14. This result is in excellent agreement with our previous studies of other systems. Using the MEP and NAO descriptors, the molecular basicity of primary amines is reproduced reasonably well, with R² equal to 0.947 and 0.946, respectively. Again, these results are consistent with our earlier results for other species21, confirming that MEP and NAO are two reliable and equivalent descriptors of both molecular acidity and basicity.
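Why two descriptors with R² ≈ 0.999 between them give nearly identical R² against pKa follows from a property of R²: for a one-descriptor linear fit it equals the squared Pearson coefficient, which is unchanged when the descriptor undergoes an affine transformation. The minimal Python sketch below demonstrates this with hypothetical numbers (not the Table 1 data) and a hypothetical second descriptor defined as an exact affine function of the first.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient R between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """For a one-descriptor linear fit, R^2 equals the squared Pearson R."""
    return pearson_r(x, y) ** 2


# Hypothetical values for illustration only (not the data of Table 1).
mep = [-18.36, -18.34, -18.38, -18.33, -18.37, -18.35]
pka = [10.2, 9.1, 10.8, 8.9, 10.5, 9.9]
# Hypothetical second descriptor that is an exact affine function of the first.
second = [0.8 * m + 13.9 for m in mep]

if __name__ == "__main__":
    print(r_squared(mep, second))                       # ≈ 1.0
    print(r_squared(mep, pka), r_squared(second, pka))  # equal up to rounding
```

With real data the two descriptors are only approximately affine (R² = 0.999 rather than 1), so their correlations with pKa differ slightly, as with 0.947 and 0.946 here.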
Table 1 Experimental pKa data, molecular electrostatic potential (MEP) on nitrogen, and nitrogen valence NAO energy for primary amines (MEP and NAO in atomic units (a.u.)).
Table 2 Experimental pKa data, molecular electrostatic potential (MEP) on nitrogen, and nitrogen valence NAO energy for secondary amines (MEP and NAO in atomic units (a.u.)).
To verify the validity and effectiveness of MEP and NAO for secondary and tertiary amines, Tables 2 and 3 list their numerical results for 41 secondary amines and 65 tertiary amines, together with experimental pKa data from the literature45-53. The R² values between MEP and NAO are 0.993 and 0.994, respectively, for the two series, confirming the equivalence of the two CDFT descriptors. Good correlations of the pKa values with these descriptors are also obtained for secondary and tertiary amines, with R² equal to 0.929 (MEP) and 0.936 (NAO) for secondary amines, and 0.913 (MEP) and 0.900 (NAO) for tertiary amines. Fig. 1 displays these strong linear correlations for the three categories of amines. When all three categories are pooled into one plot, the correlation decreases. For example, the R² value between MEP and NAO for all 179 data points is 0.983, and the R² values of the experimental pKa values with MEP and NAO drop to 0.889 and 0.916, respectively. Although less accurate, these R² values are comparable to those we obtained earlier for other systems.
Table 3 Experimental pKa data, molecular electrostatic potential (MEP) on nitrogen, and nitrogen valence NAO energy for tertiary amines (MEP and NAO in atomic units (a.u.)).
Fig. 1 Strong linear correlations of experimental pKa data with (a)-(c) MEP and (d)-(f) NAO descriptors from CDFT for primary (a and d), secondary (b and e), and tertiary amines (c and f).
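One possible, purely illustrative reason why pooling the three categories lowers R² is that each category may carry its own constant offset (for example, a category-dependent solvent contribution of the kind absorbed into the fitting constant). The toy data below are hypothetical and are not an analysis of the actual amines: every category follows y = 2x + offset exactly, so each within-category R² is 1, yet the offsets act as scatter once the categories are pooled and R² falls to 0.8.

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical toy data: same slope in every category, category-specific offset.
x_values = [0.0, 1.0, 2.0]
offsets = {"primary": 0.0, "secondary": 1.0, "tertiary": 2.0}
groups = {
    name: (list(x_values), [2.0 * x + c for x in x_values])
    for name, c in offsets.items()
}

within = {name: r_squared(xs, ys) for name, (xs, ys) in groups.items()}
pooled_x = [x for xs, _ in groups.values() for x in xs]
pooled_y = [y for _, ys in groups.values() for y in ys]
pooled = r_squared(pooled_x, pooled_y)

if __name__ == "__main__":
    print(within)  # 1.0 for each category
    print(pooled)  # 0.8 once the three categories are pooled
```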
Table 4 Shannon entropy, Fisher information, GBP entropy, information gain, Onicescu information energy of second and third orders, and relative Rényi entropy of second and third orders on the nitrogen atom of primary amines (unit: atomic units (a.u.)).
Table 5 Shannon entropy, Fisher information, GBP entropy, information gain, Onicescu information energy of second and third orders, and relative Rényi entropy of second and third orders on the nitrogen atom of secondary amines (unit: atomic units (a.u.)).
Table 6 Shannon entropy, Fisher information, GBP entropy, information gain, Onicescu information energy of second and third orders, and relative Rényi entropy of second and third orders on the nitrogen atom of tertiary amines (unit: atomic units (a.u.)).
Table 7 Correlation coefficients (R) of experimental pKa values for three categories of amines (a primary; b secondary; c tertiary) with CDFT and ITA quantities, including MEP, NAO, and eight ITA quantities: S_S, I_F, S_GBP, I_G, E_2, E_3, R^r_2, and R^r_3.
How do the ITA quantities behave? Do they show similarly strong linear correlations with experimental pKa values? Tables 4-6 list the numerical results of eight ITA quantities on the nitrogen atom for primary (Table 4), secondary (Table 5), and tertiary (Table 6) amines, and their correlation coefficients (R) with experimental pKa values are collected in Table 7. As Tables 4-6 show, the ITA quantities in most cases fluctuate slightly around average values characteristic of the nitrogen atom; it is these small yet significant changes that reflect the redistribution of electron density on the atom and can serve as quantitative measures of molecular acidity or basicity17. The correlation coefficients in Table 7 confirm this point: for all three categories of amines, reasonably strong linear correlations are often obtained, with |R| larger than 0.8. Moreover, the results for primary amines are better than those for secondary amines, which in turn are better than those for tertiary amines. This deterioration may originate from several sources. First, as more substituents are attached to the nitrogen atom, more local minima become likely. Given the many possible conformations and the large number of molecules studied in this work, it is almost impossible to guarantee that all optimized structures are global minima. This is the computational source. From the experimental perspective, since pKa measurements for amines involve their conjugate acids, the solvation of the conjugate acids of tertiary amines is likely to be more complicated63,64, which may reduce the quality of the experimental pKa data.
In any case, even though the tertiary amine results are not as good as the primary amine ones, Table 7 clearly demonstrates that ITA quantities can serve as reliable descriptors of molecular acidity and basicity for amines. Also shown in the table are the results for the Hirshfeld charge. As we proved earlier, the Hirshfeld charge is the first-order approximation of the information gain16,17,65. Given the results for the latter, it is no surprise that the Hirshfeld charge is a reliable descriptor as well.
What do we obtain if all these ITA quantities are used together to fit the experimental pKa data for each of the three categories of amines? Fig. 2 shows the result of such a fit, with R² equal to 0.981 for primary amines, 0.924 for secondary amines, and 0.879 for tertiary amines. These results are comparable to those obtained earlier for other systems21. Fig. 2 shows that ITA quantities can quantitatively reproduce the experimental pKa values of amines, although different categories of amines yield somewhat different accuracy, for the reasons mentioned above.
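Fitting pKa with several descriptors at once is an ordinary multiple linear regression, pKa ≈ c0 + Σ_k c_k x_k. The dependency-free Python sketch below solves the normal equations by Gauss-Jordan elimination with partial pivoting and reports R²; the two-descriptor data set is hypothetical and noise-free, so the known coefficients are recovered exactly. Which software the authors used for their fits is not stated here, and for eight strongly correlated ITA descriptors a QR- or SVD-based solver (for example `numpy.linalg.lstsq`) is numerically safer than the normal equations used in this sketch.

```python
def fit_multilinear(X, y):
    """OLS fit y ≈ c0 + sum_k c_k * X[i][k] via the normal equations (Gauss-Jordan)."""
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]  # X^T X
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]            # X^T y
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                for j in range(col, p):
                    a[k][j] -= f * a[col][j]
                b[k] -= f * b[col]
    return [b[i] / a[i][i] for i in range(p)]


def predict(coef, row):
    """Evaluate c0 + sum_k c_k * row[k]."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def r_squared(y, y_fit):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, y_fit))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical two-descriptor data generated from y = 1.5 + 2*x1 - 0.5*x2.
X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 5], [5, 2]]
y = [1.0, 3.5, 4.0, 7.0, 7.0, 10.5]

coef = fit_multilinear(X, y)
r2_full = r_squared(y, [predict(coef, row) for row in X])
coef_one = fit_multilinear([[row[0]] for row in X], y)  # first descriptor only
r2_one = r_squared(y, [predict(coef_one, [row[0]]) for row in X])

if __name__ == "__main__":
    print(coef)             # ≈ [1.5, 2.0, -0.5]
    print(r2_full, r2_one)  # dropping a descriptor cannot raise the in-sample R^2
```

Because in-sample R² never decreases when a descriptor is added, comparisons between descriptor sets of different sizes (one CDFT descriptor, eight ITA quantities, or both) should keep the number of fitted parameters in mind.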
The ultimate question is whether all data points can be put together and the three categories of amines fitted in a single least-squares fit. There are three ways to do so, employing either or both CDFT and ITA quantities: we can use (i) one of the two equivalent CDFT descriptors, MEP or NAO; (ii) all eight ITA quantities alone; or (iii) both CDFT and ITA quantities together. The results are shown in Fig. 3. Fig. 3a shows that with NAO alone, R² for the entire dataset is 0.916, which is as good as what we previously obtained for other systems21.
Fig. 2 Comparison of the experimental pKa data with the fitted pKa values using all eight ITA quantities for (a) primary, (b) secondary, and (c) tertiary amines.
Fig. 3 Comparison of the experimental pKa data with the fitted pKa values using either or both CDFT and ITA quantities, with all three categories of amines put together. (a) Using NAO alone; (b) using the eight ITA quantities; (c) using both CDFT and ITA quantities.
This result confirms that CDFT descriptors such as MEP and NAO are reliable and robust measures for quantitatively predicting molecular acidity and basicity. With the eight ITA quantities only, in contrast, the correlation becomes worse, with R² = 0.854 (Fig. 3b). Besides the reasons mentioned above, it is also possible that the simple density functionals employed so far do not yet form a complete set, as required by the information functional theory we proposed elsewhere. If more simple functionals are added, more accurate results should be within reach. When both CDFT and ITA quantities are employed, a significantly better fit is obtained, with R² = 0.951 (Fig. 3c). These results show that combining CDFT and ITA quantities is an effective and robust way to quantify molecular basicity, enabling accurate determination of pKa values for diverse categories of molecular systems.
So what is new? Compared with our previous work on the molecular acidity of neutral acids13,21, this work tackles the more challenging task of doing the same for bases, specifically three categories of amines. The main challenge is the difficulty of adequately simulating the solvent effect for the cationic conjugate acid BH+ of a base B. Although we did not consider the solvent effect in our calculations, the effect itself is included in the experimental pKa data, and accounting for it computationally is challenging. Had we worked on the conjugate acids instead, none of the above significant correlations could have been obtained. One novelty of this work is the workaround of working on the neutral base molecules instead of their conjugate acids, through which much more effective and robust descriptors have been unveiled. Also, in our previous study of neutral acids, fewer ITA quantities were needed to yield better correlations21; the present work offers a more general approach for more diverse systems, in which acidity and basicity are quantified with CDFT and ITA descriptors combined. The validity and applicability of this approach rest on the basic theorems of DFT, since all these descriptors are simple density-based functionals. Given the nature and complexity of the systems examined in this work, we are confident that the same idea can be applied to many other molecular systems.
5 Conclusions
In this work, as a continuation of our efforts over the past few years to understand traditional chemical concepts from the perspective of modern theory and accurate computation, we present a systematic study of how to determine the molecular basicity of a base (or the acidity of its conjugate acid) using quantities from conceptual density functional theory and the information-theoretic approach. To avoid the complexity of simulating the solvent effect for cationic conjugate acids, we present a workaround that focuses on the neutral base itself. Strong linear correlations of experimental pKa data were obtained with the molecular electrostatic potential, natural valence atomic orbital energy, Shannon entropy, Fisher information, Ghosh-Berkowitz-Parr entropy, information gain, Onicescu information energy, relative Rényi entropy, and others. We have generalized our previous work by combining CDFT and ITA descriptors, generating more accurate and robust models to predict both molecular acidity and basicity. The present results confirm once again that simple density functionals from the information-theoretic approach are well-defined and well-behaved quantities that can be widely employed to understand and quantify chemical concepts such as acidity and basicity, among many others. The results of this work can be straightforwardly applied to many other molecular systems of the same nature and beyond.
Supporting Information: available free of charge via the internet at http://www.whxb.pku.ed