# Approximating Kinetic Energy Density Functionals Via Neural Networks

**Julia Woodward and Samuel Minkov-Temis Highland Park High School, Highland Park, NJ**

**Peer Reviewer: Sarah Tang**

**Professional Reviewer: Michele Pavanello**

**ABSTRACT**

The prime focus of this paper is to utilize machine learning to obtain an accurate approximation of the kinetic energy of a system of electrons expressed as a

**INTRODUCTION**

Scientists have long been studying quantum mechanics, the area of physics that studies the movement and properties of subatomic particles, such as electrons. Yet little progress has been made in determining the relationship between the kinetic energy of a system of electrons and the electron density function. The popular electronic structure method, Kohn-Sham Density Functional Theory (KS-DFT), becomes too computationally expensive when applied to model large systems composed

**Introduction to DFT**

The motivating factor in discovering the exact relationship between electron density and kinetic energy density functionals lies in the quantum mechanical equation that sums together the types of energy that electrons possess. The total energy of a system is represented through the following equation:

where ? (r) represents the electron density proportional to the probability of finding an electron at point r in space. The notation *F[ *? *] *means that the quantity *F *is a functional of the density: *F *is a number that is obtained from considering the density function over all space. *E*_{xc}*[ *? *] *denotes all the many-body quantum electron interaction energy, *E*_{H}*[ *? *] *represents the classical electron-electron repulsion energy, and *E*_{eN}*[ *? *] *signifies the electron attraction to the nuclei [1,2,3]. The subject of the project was analyzing *T*_{s}*[ *? *]*, which is the Kinetic Energy Density Functional (KEDF) of a set of noninteracting electrons having electron density ? *(r)*. This is the only functional of the equation for which an accurate approximation remains unknown [6]. Upon deeper scientific analysis, the KEDF can be approximated through its components:

in which *T*_{LDA}*[ *? *] *is the local density approximation discovered by Thomas and Fermi, that is, the kinetic energy contribution at point *r *is approximated through only the density at that point *r *[8]. *T*_{vW}*[ *? *] *is the von Weizsacker functional, formulated through the electron density and its gradient [9]. The nonlocal KEDF, *T*_{NL}*[ *? *] , *is a function of multiple densities evaluated at different points in space. It is convenient to evaluate this energy functional in reciprocal space by exploiting the convolution theorem through the use of a Fourier transform. The form of the nonlocal KEDF is however unknown. In this case, the neural network serves to learn this previously unattainable relationship among densities and potentials.

In technical terms, potentials are related to the functionals. They are functional derivatives of energy functionals:

When there is a variation in ? (*r*) at a point in space, there is variation in the value of the functional *F*.

**The Kinetic Energy Density Functional, Potential, and Kernel.**

** **The Coulomb repulsion potential at a point *r *of interest due to an electron at *r’ *can be derived from the equation of a point charge:

where k is Coulomb’s constant and Q is the electron’s charge. Over a field of several charges, this takes on the form

This function can be represented as an integral with respect to the distance of the particle from the point of interest, giving the following representation of the potential of an electron density ? (*r*) ? :

This can be written in terms of an integral kernel *w(r,r’) *as

with *w(r,r’) *= *k/|r - r’| *. This is an illustration of a potential as a functional of the electron density in a system, and the kernel is known.

In the context of the case of interest, the mathematical relationship between the nonlocal KEDF potential and the electron density can be expressed as

in which ^{? }means proportional to, *V ^{T}NL^{(r) }*

^{represents the nonlocal KEDF in real space, }

^{w }^{( | }

^{r }^{? }

^{r}^{?| ) }is a kernel that relates the two observables, and ?

*( r*?

*)*is the density. The integral, challenging to evaluate because of the infinite integral it contains (for every point of r), calls upon the Fourier transform for a solution. The convolution property of the Fourier transform enables this relationship to be simplified into a single ratio in G-space:

Where ? * ^{5/6}( G ) *is the Fourier transform of ?

*. The Fourier transform converts the position properties of the potential into momentum (reciprocal space) properties [10]. Reciprocal space (G-space) represents the spatial frequency, a measure of how often sinusoidal components of the wave repeat, of the particle according to de Broglie’s hypothesis. This spatial frequency represents the particle’s momentum.*

^{5/6}(r)**METHOD AND CODE DEVELOPMENT**

**The Code Structure**

The initial code designed a method to relate the kinetic energy potential and the electron density. The code introduced the notion of a cubic cell, for computational purposes. Space is discretized into 50 equally spaced points in each cardinal direction for a total of 503points. Two Aluminium atoms were included in the cell. Using KS-DFT, a data set comprising of 100 distinct densities *(r) *and potentials, *v(r)*, was generated by separate KS-DFT simulations carried out ? with Quantum ESPRESSO [11]. The cube generated 100 different pairs of positions for the atoms, and each position yielded 125,000 specific electron densities *( r _{i}) *and 125,000 kinetic ? energy potential values

*V*i

^{T}NL^{(r}^{) }^{(one for each grid point }

i

^{r}^{) obtained from KS-DFT. }In the code, three different solvers were used to relate densities and potentials. Stochastic Gradient Descent, or SGD, reaches the local minimum of a cost function much more rapidly than the more standard gradient descent iterative method, due to the fact that each gradient descent step is calculated according to a randomly shuffled, smaller batch of data [12]. The Adam solver used is an extension on the SGD solver as it implements Root Mean Square Propagation in the back-propagation of the network as well as an adaptive gradient [13]. The third solver, LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno), is the most common. LBFGS calculates its result using an approximate Hessian matrix, or

All results below are given in atomic units (*e*^{2 }= *h *= *m _{e }*= 1).

**The First Attempt: Dimensionality problems**

The first version of the code utilizes two different machine learning methods: kernel ridge regression and neural networks. The former uses a linear least-squares-fit technique in which different ways (kernels) of fitting the data were specified. The neural network yielded an approximation that was deemed insufficient compared to other approximations. This was due to a dimensional

**Second Attempt: Nonlocal kernel**

The kernel of the KEDF (Eqs. 8 and 9)

where *fft *is the Fourier transform. Once again, the data needed to be flattened from 3D to 1D. Contrary to the potential, the kernel is much less dependent on the nuclear positions (it is an effective interaction), and thus expected to be more “spherical”. To represent the data using a 1D array, all the data is combined into a new list. However, after training the data using machine learning, poor results were yielded (Figure 1) even after using a large training: testing ratio (80:20).

**The Third Attempt: Final code**

The most successful results stemmed from using a simpler, more straightforward approach. Instead of generating a kernel, the densities were compiled into a 1d array, 12,500,000 data points in length. This array consisted of data from all 100 files, each of which had 125,000 data points from the simulation cell. The same was done for the KEDF potentials. Because the computer could not intake more than 10,000 data points at a time (the computational cost was too large), the data were divided into smaller pieces, each of which was tested in a different trial.

After splitting up the data into training data and testing data with a ratio of 80:20, the machine learning procedure involves specifying parameters, plotting a mean error graph, and fitting the lowest point on the graph to the test data. The first parameter tested was the number of hidden layers, set at 8, 10, and 12. The three solvers: Adam, LBFGS, and SGD, acted as optimizers on different sizes of datasets. Each combination of a solver and associated parameters produced *.fit *function fit the model to the reference and then plotted the reference and the predicted output for each solver (generating three graphs with three solvers).

**RESULTS**

In the graphs reported in

Trial 1: All the training and testing data are taken from the nonlocal potential of the first 10000 points of the first file. Adam and LBFGS were the most consistent throughout. See Figure 2.

Trial 2: The data were taken using the total potential instead of the nonlocal, so it includes the contributions from *T** _{LDA }*and

*T*

*. Due to the linearity of the relationship amongst local potentials and local densities, the data is much less noisy and easier to learn. See Figure 3.*

_{vW}Trial 3: Using the nonlocal potential, the testing data were taken from the first 100 points of the first 80 files. The training data were taken from the first 100 points of the last 20 files. In this trial, the parameters for the gradient step-size (alpha ?) were modified to prevent over-fitting. This is a common strategy and alpha ? is known as a hyperparameter of the model. Every solver generated a scattered result, with no clear correlation between density and potential. See Figure 4.

**CONCLUSION**

The code successfully developed a model based on neural networks that maps the density of an electron system to the corresponding potential. Out of the three solvers tested, LBFGS yielded the most accurate approximation. Unlike SGD and Adam, alterations to the alpha hyper-parameter or the number of hidden layers did not alter the model obtained via LBFGS optimization

Because the nonlocal potential is at least a two-point function of the electron density (the nonlocal potential lies in a higher dimension than the density), it was harder to correlate the nonlocal potential with the density. This is also demonstrated through Equation 8, which shows that multiple densities are needed to calculate *V _{NL}*. When the nonlocal potential was plotted against the density, the graphs showed a distribution with a large variance (see Trial 1 and 2). On the other hand, the relationship between the total kinetic potential and the density was one-to-one, giving a cleaner approximation and much less scattering of data points (see Trial 2, Figure 3).

Future methods of approximating nonlocal Kinetic Energy Density Functionals could lie in modifying the implementation of the Fourier transform of the kernel. The largest conclusion we derive from the graphs suggests the positive correlation between electron density and the total kinetic energy potential. The latter is understood on the basis of a classical argument: the total energy of the system is the sum of the potential energy and the kinetic

Future work will involve additional investigations with the currently developed code. Implementing the code on more powerful computers than our own laptops (such as the Rutgers high performance computing facility, Amarel) will enable the network to train much larger datasets. This will make the network even more predictive and applicable to systems outside the training set. Further, we will apply the resulting network to predict the electronic structure of metals and semiconductors and chemical reactions at their surfaces.

** Acknowledgements: **We are grateful to Professor Michele Pavanello and Dr. Wenhui Mi, Department of Chemistry, Rutgers University-Newark for their guidance of this project and teaching us about machine learning as it applies to Density Functional Theory.

**References:**

[1] P. Hohenberg and W. Kohn, *Inhomogeneous electron gas*, Physical Review vol. 136, pp. B864 - B871 (1964).

[ 2 ] W. Kohn and L.J. Sham, *Self-consistent equations including exchange and correlation effects, *Physical Review vol.140, pp. A1133-A1138 (1965).

[3] W. Kohn, *Nobel Lecture: Electronic structure of matter - wave functions and density functionals*, Reviews of Modern Physics vol.71, pp.1253-1266 (1999).

[4] B.Ricks and D.Ventura, *Training a Quantum Neural Network, *Proceedings of the 16th International Conference on Neural Information Processing Systems, pp. 1019-1026 (2003).

[5] H. Pratt, B. Williams, F. Coenen, and Y. Zheng, *FCNN: Fourier Convolutional Neural Networks*, Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science, vol 10534 (2017).

[6] V. V. Karasiev and S. B. Trickey, *Frank discussion of the status of ground-state orbital-free DFT, *in *Advances in Quantum Chemistry*,pp. 221–245 (Elsevier BV, 2015).

[7] W. Mi, Alessandro Genova, Michele Pavanello, *Nonlocal kinetic energy functionals by functional integration*, Journal of Chemical Physics vol. 148, pp. 184107 (2018).

[8] N.H. March *The Thomas-Fermi approximation in quantum mechanics, Advances in Physics*,**6**:21, pp. 1-101, (1957).

[9] R. M. Dreizler and E. K. U. Gross, *Density Functional Theory: An Approach to the Quantum Many-Body Problem*(Springer, Berlin, 1990).

[10] - D.H. Bailey and P.N. Swarztrauber, *A Fast Method for the Numerical Evaluation of Continuous Fourier and Laplace Transforms*,

[11] - P. Giannozzi et al., *Advanced capabilities for materials modelling with Quantum ESPRESSO*, Journal of Physics: Condensed Matter vol 29, pp. 46 (2017).

[12] - Robbins, Herbert E.. “A Stochastic Approximation Method.” (2007).

[13] Diederik P. Kingma, Jimmy Ba,*Adam: A Method for Stochastic Optimization*, International Conference on Learning Representations, arXiv:1412.6980 (2015)

[14] - Liu, D.C. & Nocedal, J. Mathematical Programming (1989) 45: 503. https://doi.org/10.1007/BF01589116