Approximating Kinetic Energy Density Functionals Via Neural Networks
Authors: Julia Woodward and Samuel Minkov-Temis
Peer Reviewer: Sarah Tang
Professional Reviewer: Michele Pavanello
The prime focus of this paper is to utilize machine learning to obtain an accurate approximation of the kinetic energy of a system of electrons expressed as a functional of the electron density. The kinetic energy density functional is an essential element of orbital-free density functional theory, a method used to study the electronic structure of large systems (composed of millions of atoms), as opposed to Kohn-Sham density functional theory (whose range of applicability extends only to a few thousand atoms). The kinetic energy as a density functional is unknown, and currently available approximations are inaccurate. Further, an important derived quantity is the kinetic energy potential (the functional derivative of the energy), which to date has never been approximated by machine learning methods. This paper presents the implementation of neural network kinetic energy potentials for a system composed of two aluminum atoms in a cubic cell with periodic boundary conditions. At every point in the cell there exists a unique electron density and kinetic energy potential. In approximating these, three different sets of data were learned: the total potentials, the nonlocal potentials, and the Fourier transform of the kernel (a function of the densities and nonlocal potentials). The results show that the total potential is easier to learn than the nonlocal potential. This is because the nonlocal potential is at least a two-point function of the electron density (similar to a convolution), and neural networks are known to provide inefficient models of such functions.
Scientists have long studied quantum mechanics, the area of physics concerned with the movement and properties of subatomic particles such as electrons. Yet little progress has been made in determining the relationship between the kinetic energy of a system of electrons and the electron density function. The popular electronic structure method, Kohn-Sham Density Functional Theory (KS-DFT), becomes too computationally expensive when applied to model large systems composed of thousands or millions of atoms [1, 2, 3]. Being able to model such large systems is important because it would enable rational materials design without the need to resort to costly experiments. Although the Hohenberg-Kohn theorem of Density Functional Theory (DFT) states that the kinetic energy of a system can be calculated from the electron density, the exact relationship between the two quantities remains unknown [1]. This project aims to tackle the problem using neural networks, a machine learning method that exercises pattern recognition [4]. These networks learn the relationship between input and output, in this case the electron densities and the kinetic energies, respectively. After generating a set of electron densities and associated kinetic energies and potentials (see the next section for definitions), Python software was developed in Jupyter notebooks to train a neural network on this set. The algorithm was tested using a portion of the input data, and the predicted output test data was then compared to the reference output test data. Specifically, the code uses convolutional neural networks, which assign weights to the inputs and feed them into artificial neurons that convolve the inputs with learned filters to produce an output [5, 7]. Neural networks have been successful in image recognition, text-to-speech translation, and logical problem-solving. The hope is that they will be useful in mapping the relationship between densities and potentials.
Introduction to DFT
The motivating factor in discovering the exact relationship between electron density and kinetic energy density functionals lies in the quantum mechanical equation that sums together the types of energy that electrons possess. The total energy of a system is represented through the following equation:
$$E[\rho(\mathbf{r})] = T_s[\rho(\mathbf{r})] + E_{xc}[\rho(\mathbf{r})] + E_H[\rho(\mathbf{r})] + E_{eN}[\rho(\mathbf{r})] \qquad (5)$$
where $\rho(\mathbf{r})$ represents the electron density, proportional to the probability of finding an electron at point $\mathbf{r}$ in space. The notation $F[\rho]$ means that the quantity $F$ is a functional of the density: $F$ is a number obtained from considering the density function over all space. $E_{xc}[\rho]$ denotes the many-body quantum electron interaction energy, $E_H[\rho]$ represents the classical electron-electron repulsion energy, and $E_{eN}[\rho]$ signifies the electron attraction to the nuclei [1, 2, 3]. The subject of this project is $T_s[\rho]$, the Kinetic Energy Density Functional (KEDF) of a set of noninteracting electrons having electron density $\rho(\mathbf{r})$. This is the only functional in the equation for which an accurate approximation remains unknown. Upon deeper analysis, the KEDF can be decomposed into its components:
$$T_s[\rho] = T_{LDA}[\rho] + T_{vW}[\rho] + T_{NL}[\rho] \qquad (6)$$

in which $T_{LDA}[\rho]$ is the local density approximation of Thomas and Fermi [8]: the kinetic energy contribution at point $\mathbf{r}$ is approximated through only the density at that same point $\mathbf{r}$. $T_{vW}[\rho]$ is the von Weizsacker functional, formulated in terms of the electron density and its gradient [9]. The nonlocal KEDF, $T_{NL}[\rho]$, is a function of the density evaluated at multiple points in space. It is convenient to evaluate this energy functional in reciprocal space, exploiting the convolution theorem through the use of a Fourier transform. The form of the nonlocal KEDF is, however, unknown. In this case, the neural network serves to learn this previously unattainable relationship between densities and potentials.
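For reference (these are standard results from the literature, not derived in this paper), the two known components have closed forms, in atomic units:

```latex
T_{LDA}[\rho] = C_{TF} \int \rho^{5/3}(\mathbf{r})\, d\mathbf{r},
\qquad C_{TF} = \frac{3}{10}\left(3\pi^2\right)^{2/3},
\qquad\qquad
T_{vW}[\rho] = \frac{1}{8} \int \frac{|\nabla \rho(\mathbf{r})|^2}{\rho(\mathbf{r})}\, d\mathbf{r}.
```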
In technical terms, potentials are related to the functionals: they are the functional derivatives of the energy functionals,

$$v(\mathbf{r}) = \frac{\delta F[\rho]}{\delta \rho(\mathbf{r})}. \qquad (7)$$

When there is a variation in $\rho(\mathbf{r})$ at a point in space, $v(\mathbf{r})$ measures the corresponding variation in the value of the functional $F$.
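As a concrete example (a standard textbook result, not specific to this paper), the Thomas-Fermi functional $T_{LDA}[\rho] = C_{TF}\int \rho^{5/3}(\mathbf{r})\, d\mathbf{r}$, with $C_{TF} = \frac{3}{10}(3\pi^2)^{2/3}$, has the functional derivative

```latex
\frac{\delta T_{LDA}[\rho]}{\delta \rho(\mathbf{r})} = \frac{5}{3}\, C_{TF}\, \rho^{2/3}(\mathbf{r}),
```

so the corresponding potential at point $\mathbf{r}$ depends only on the density at that same point.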
The Kinetic Energy Density Functional, Potential, and Kernel.
The Coulomb repulsion potential at a point of interest $\mathbf{r}$ due to an electron at $\mathbf{r}'$ can be derived from the equation of a point charge,

$$V(\mathbf{r}) = \frac{kQ}{|\mathbf{r} - \mathbf{r}'|},$$

where $k$ is Coulomb's constant and $Q$ is the electron's charge. Over a field of several charges, this takes on the form of a sum,

$$V(\mathbf{r}) = \sum_i \frac{kQ_i}{|\mathbf{r} - \mathbf{r}_i|}.$$
In the continuum limit, this sum can be represented as an integral over space, giving the following representation of the potential of an electron density $\rho(\mathbf{r}')$:

$$V(\mathbf{r}) = k \int \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d\mathbf{r}'.$$

This can be written in terms of an integral kernel $w(\mathbf{r}, \mathbf{r}')$ as

$$V(\mathbf{r}) = \int w(\mathbf{r}, \mathbf{r}')\, \rho(\mathbf{r}')\, d\mathbf{r}',$$

with $w(\mathbf{r}, \mathbf{r}') = k/|\mathbf{r} - \mathbf{r}'|$. This is an illustration of a potential as a functional of the electron density in a system; here the kernel is known.
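The point-charge picture above can be sketched numerically. The following snippet (illustrative only, not from the paper's code; atomic units with k = 1) applies the known kernel w(r, r') = k/|r - r'| to a set of point charges:

```python
import numpy as np

# Illustrative sketch: V(r) = sum_i k * Q_i / |r - r_i|,
# i.e. the Coulomb kernel w(r, r') = k/|r - r'| applied to
# a set of point charges. Atomic units: k = 1.

def coulomb_potential(r, positions, charges, k=1.0):
    """Coulomb potential at point r due to point charges."""
    r = np.asarray(r, dtype=float)
    positions = np.asarray(positions, dtype=float)
    dists = np.linalg.norm(positions - r, axis=1)  # |r - r_i|
    return k * np.sum(np.asarray(charges) / dists)

# A single unit charge at distance 2.0 gives V = 1/2.
V = coulomb_potential([0.0, 0.0, 0.0], [[2.0, 0.0, 0.0]], [1.0])
```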
In the case of interest, the mathematical relationship between the nonlocal KEDF potential and the electron density can be expressed as

$$V_{T_{NL}}(\mathbf{r}) \propto \rho^{-1/6}(\mathbf{r}) \int w(|\mathbf{r} - \mathbf{r}'|)\, \rho^{5/6}(\mathbf{r}')\, d\mathbf{r}', \qquad (8)$$

in which $\propto$ means proportional to, $V_{T_{NL}}(\mathbf{r})$ represents the nonlocal KEDF potential in real space, $w(|\mathbf{r} - \mathbf{r}'|)$ is a kernel that relates the two observables, and $\rho(\mathbf{r}')$ is the density. The integral is challenging to evaluate directly, because a full integral over $\mathbf{r}'$ must be carried out for every point $\mathbf{r}$; this calls for the Fourier transform. The convolution property of the Fourier transform enables this relationship to be simplified into a single ratio in G-space,

$$\tilde{w}(G) \propto \frac{\widetilde{\rho^{1/6}\, V_{T_{NL}}}(G)}{\tilde{\rho}^{5/6}(G)}, \qquad (9)$$

where $\tilde{\rho}^{5/6}(G)$ is the Fourier transform of $\rho^{5/6}(\mathbf{r})$. The Fourier transform converts the position representation of the potential into a momentum (reciprocal-space) representation. Reciprocal space (G-space) represents the spatial frequency, a measure of how often the sinusoidal components of the wave repeat, of the particle according to de Broglie's hypothesis. This spatial frequency represents the particle's momentum.
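The convolution theorem invoked here can be checked on a small periodic grid. This sketch (names are illustrative, not from the paper's code) shows that a direct circular convolution and a pointwise product in Fourier space agree:

```python
import numpy as np

# Sketch of the convolution theorem: on a periodic grid, the integral
# (w * f)(r) = ∫ w(|r - r'|) f(r') dr' becomes a pointwise product in
# G-space, FT[w * f] = FT[w] . FT[f].

n = 64
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
f = np.exp(np.cos(x))                            # a smooth periodic "density"
w = np.exp(-np.minimum(x, 2 * np.pi - x) ** 2)   # periodic kernel

# Direct circular convolution, O(n^2)
direct = np.array([np.sum(w[(i - np.arange(n)) % n] * f) * dx
                   for i in range(n)])

# Same result via the FFT, O(n log n)
via_fft = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(f))) * dx
```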
METHOD AND CODE DEVELOPMENT
The Code Structure
The initial code designed a method to relate the kinetic energy potential and the electron density. For computational purposes, the code introduced the notion of a cubic simulation cell. Space is discretized into 50 equally spaced points in each cardinal direction, for a total of 50^3 = 125,000 grid points. Two aluminum atoms were included in the cell. Using KS-DFT, a data set comprising 100 distinct densities ρ(r) and potentials v(r) was generated by separate KS-DFT simulations carried out with Quantum ESPRESSO [11]. One hundred different pairs of atomic positions were generated in the cell, and each configuration yielded 125,000 electron density values ρ(r_i) and 125,000 kinetic energy potential values V_TNL(r_i) (one for each grid point r_i) obtained from the KS-DFT runs. In the code, three different solvers were used to relate densities and potentials. Stochastic Gradient Descent (SGD) reaches the local minimum of a cost function much more rapidly than the standard gradient-descent iterative method, because each gradient step is calculated on a randomly shuffled, smaller batch of data [12]. The Adam solver is an extension of SGD, as it implements Root Mean Square Propagation in the back-propagation of the network as well as an adaptive gradient [13]. The third solver, LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno), is a quasi-Newton method that calculates its result using an approximate Hessian matrix (second-order derivatives) [14].
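The paper does not list the exact library calls; assuming scikit-learn's MLPRegressor (an assumption on our part, with stand-in data), the three solvers can be set up and compared as follows:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative sketch (scikit-learn assumed): one network per solver,
# mapping density values rho(r_i) to potential values v(r_i).
# The data here are synthetic stand-ins, not the paper's KS-DFT data.
rng = np.random.default_rng(0)
rho = rng.random((1000, 1))             # stand-in density samples
v = 3.0 * rho[:, 0] ** (2.0 / 3.0)      # stand-in potential values

models = {
    name: MLPRegressor(solver=name, hidden_layer_sizes=(10,),
                       max_iter=2000, random_state=0).fit(rho, v)
    for name in ("adam", "lbfgs", "sgd")
}
scores = {name: m.score(rho, v) for name, m in models.items()}
```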
All results below are given in atomic units ($e^2 = \hbar = m_e = 1$).
The First Attempt: Dimensionality problems
The first version of the code utilized two different machine learning methods: kernel ridge regression and neural networks. The former uses a linear least-squares-fit technique in which different ways (kernels) of fitting the data were specified. The neural network yielded an approximation that was deemed insufficient compared to other approximations. This was due to a dimensionality problem: commonly available machine learning software requires data arranged as rank-1 tensors (i.e., vectors), while the natural representation of the electron density is a function of three-dimensional space. Collapsing the data into a 1D array may have caused a loss of the critical information that maps the physical space.
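The flattening step described above can be sketched in a few lines (sizes match the paper's 50 x 50 x 50 grid; the array contents are illustrative):

```python
import numpy as np

# Sketch of the dimensionality issue: a density on a 50 x 50 x 50 grid
# must be collapsed to a rank-1 tensor (a vector) before standard
# regressors accept it, so the explicit spatial layout is discarded.
grid = np.arange(50 * 50 * 50, dtype=float).reshape(50, 50, 50)

flat = grid.ravel()                   # shape (125000,) -- what the network sees
restored = flat.reshape(50, 50, 50)   # the 3D layout is only implicit
```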
Second Attempt: Nonlocal kernel
The kernel of the KEDF (Eqs. 8 and 9) was defined in the code with the following Python function:
import numpy as np

def Kernel_1(Rho, V_NL):
    # Fourier-space kernel, cf. Eq. (9):
    # w(G) = FT[rho^(1/6) * V_NL] / FT[rho^(5/6)]
    Rho_16 = Rho**(1.0/6.0)
    Rho_56 = Rho**(5.0/6.0)
    kernel = np.fft.fftn(Rho_16 * V_NL) / np.fft.fftn(Rho_56)
    return np.real(kernel)

Kernel = Kernel_1(density, potential_NL)

where np.fft.fftn is the discrete Fourier transform on the 3D grid. Once again, the data needed to be flattened from 3D to 1D. Contrary to the potential, the kernel is much less dependent on the nuclear positions (it is an effective interaction), and is thus expected to be more "spherical". To represent the data using a 1D array, all the data were combined into a new list. However, training on these data yielded poor results (Figure 1), even with a large training:testing ratio (80:20).
The Third Attempt: Final code
The most successful results stemmed from a simpler, more straightforward approach. Instead of generating a kernel, the densities were compiled into a single 1D array, 12,500,000 data points in length. This array consisted of data from all 100 files, each of which contained the 125,000 data points of one simulation cell. The same was done for the KEDF potentials. Because the computer could not process more than 10,000 data points at a time (the computational cost was too large), the data were divided into smaller pieces, each of which was tested in a different trial.
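The chunking step described above can be sketched as follows (a minimal illustration, not the paper's code; the placeholder array stands in for the compiled densities):

```python
import numpy as np

# Sketch: split the 12,500,000-point array into pieces of at most
# 10,000 points each, one piece per trial.
densities = np.zeros(12_500_000)     # placeholder for the compiled data
n_chunks = int(np.ceil(densities.size / 10_000))
chunks = np.array_split(densities, n_chunks)
```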
After splitting the data into training and testing sets with an 80:20 ratio, the machine learning procedure involves specifying parameters, plotting a mean-error graph, and fitting the model corresponding to the lowest point on the graph to the test data. The first parameter tested was the number of hidden layers, set at 8, 10, and 12. The three solvers (Adam, LBFGS, and SGD) acted as optimizers on datasets of different sizes. Each combination of a solver and its associated parameters produced a score, calculated as a correlation coefficient. A range of parameters was scanned for each solver, and the scores were plotted on mean-error graphs. The lowest point on the mean-error graph of each solver corresponded to the specific parameter values that were then manually selected for the machine learning of our data. The .fit function fit the model to the reference data; the reference and predicted outputs were then plotted for each solver (generating three graphs for the three solvers).
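The procedure above can be sketched with scikit-learn (an assumption on our part; the data and parameter values are illustrative stand-ins):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Sketch: 80:20 split, scan hidden-layer sizes, keep the model with
# the best test score. Synthetic stand-in data, not the paper's.
rng = np.random.default_rng(1)
rho = rng.random((2000, 1))
v = rho[:, 0] ** (5.0 / 3.0)

X_tr, X_te, y_tr, y_te = train_test_split(rho, v, test_size=0.2,
                                          random_state=0)
best = max(
    (MLPRegressor(solver="lbfgs", hidden_layer_sizes=(h,),
                  max_iter=2000, random_state=0).fit(X_tr, y_tr)
     for h in (8, 10, 12)),
    key=lambda m: m.score(X_te, y_te),
)
```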
In the graphs reported in Figure 2 and 3, the orange shape represents the reference data, and the blue represents the prediction the neural network generates. The top plot shows the Adam solver, the second shows LBFGS, and the third is SGD.
Trial 1: All the training and testing data were taken from the nonlocal potential of the first 10,000 points of the first file. Adam and LBFGS were the most consistent throughout. See Figure 2.
Trial 2: The data were taken using the total potential instead of the nonlocal one, so they include the contributions from TLDA and TvW. Because the local potentials depend only on the density at the same point, the data are much less noisy and easier to learn. See Figure 3.
Trial 3: Using the nonlocal potential, the testing data were taken from the first 100 points of the first 80 files. The training data were taken from the first 100 points of the last 20 files. In this trial, the regularization-strength parameter (alpha) was modified to prevent over-fitting. This is a common strategy, and alpha is known as a hyperparameter of the model. Every solver generated a scattered result, with no clear correlation between density and potential. See Figure 4.
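If scikit-learn's MLPRegressor is the underlying library (an assumption on our part), `alpha` there is the L2 penalty on the network weights, which is what counters over-fitting; the values scanned below are illustrative:

```python
from sklearn.neural_network import MLPRegressor

# Sketch: scan alpha (the L2 weight penalty in MLPRegressor).
# Larger values shrink large weights and reduce over-fitting.
models = [
    MLPRegressor(solver="adam", alpha=a, hidden_layer_sizes=(10,),
                 max_iter=500, random_state=0)
    for a in (1e-5, 1e-4, 1e-3, 1e-2)
]
```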
The code successfully developed a model based on neural networks that maps the density of an electron system to the corresponding potential. Of the three solvers tested, LBFGS yielded the most accurate approximation. Unlike with SGD and Adam, alterations to the alpha hyperparameter or to the number of hidden layers did not alter the model obtained via LBFGS optimization. This means that LBFGS is more robust than SGD or Adam, which are nonetheless more commonly used because they require less memory. The main issue with SGD and Adam was their tendency to overfit the neural network model (i.e., some neuron weights become too large, making the network too stiff), which prevented them from producing valid results.
Because the nonlocal potential is at least a two-point function of the electron density (the nonlocal potential lies in a higher dimension than the density), it was harder to correlate the nonlocal potential with the density. This is also demonstrated through Equation 8, which shows that multiple densities are needed to calculate VNL. When the nonlocal potential was plotted against the density, the graphs showed a distribution with a large variance (see Trial 1 and 2).
On the other hand, the relationship between the total kinetic potential and the density was one-to-one, giving a cleaner approximation and much less scatter in the data points (see Trial 2, Figure 3). Future methods of approximating nonlocal Kinetic Energy Density Functionals could lie in modifying the implementation of the Fourier transform of the kernel. The main conclusion we draw from the graphs is the positive correlation between electron density and total kinetic energy potential. The latter can be understood on the basis of a classical argument: the total energy of the system is the sum of the potential energy and the kinetic energy, and is conserved throughout the system. Where the potential energy is lower, the electron density and the kinetic energy are higher. The results show that this also holds in quantum many-electron systems: in regions of higher electron density, the kinetic energy is larger.
Future work will involve additional investigations with the currently developed code. Implementing the code on more powerful computers than our own laptops (such as the Rutgers high performance computing facility, Amarel) will enable the network to train much larger datasets. This will make the network even more predictive and applicable to systems outside the training set. Further, we will apply the resulting network to predict the electronic structure of metals and semiconductors and chemical reactions at their surfaces.
Acknowledgements: We are grateful to Professor Michele Pavanello and Dr. Wenhui Mi, Department of Chemistry, Rutgers University-Newark for their guidance of this project and teaching us about machine learning as it applies to Density Functional Theory.
[1] P. Hohenberg and W. Kohn, Inhomogeneous electron gas, Physical Review, vol. 136, pp. B864-B871 (1964).
[2] W. Kohn and L. J. Sham, Self-consistent equations including exchange and correlation effects, Physical Review, vol. 140, pp. A1133-A1138 (1965).
[3] W. Kohn, Nobel Lecture: Electronic structure of matter - wave functions and density functionals, Reviews of Modern Physics, vol. 71, pp. 1253-1266 (1999).
[4] B. Ricks and D. Ventura, Training a Quantum Neural Network, Proceedings of the 16th International Conference on Neural Information Processing Systems, pp. 1019-1026 (2003).
[5] H. Pratt, B. Williams, F. Coenen, and Y. Zheng, FCNN: Fourier Convolutional Neural Networks, Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2017), Lecture Notes in Computer Science, vol. 10534 (2017).
[6] V. V. Karasiev and S. B. Trickey, Frank discussion of the status of ground-state orbital-free DFT, in Advances in Quantum Chemistry, pp. 221-245 (Elsevier BV, 2015).
[7] W. Mi, A. Genova, and M. Pavanello, Nonlocal kinetic energy functionals by functional integration, Journal of Chemical Physics, vol. 148, p. 184107 (2018).
[8] N. H. March, The Thomas-Fermi approximation in quantum mechanics, Advances in Physics, vol. 6, no. 21, pp. 1-101 (1957).
[9] R. M. Dreizler and E. K. U. Gross, Density Functional Theory: An Approach to the Quantum Many-Body Problem (Springer, Berlin, 1990).
[10] D. H. Bailey and P. N. Swarztrauber, A Fast Method for the Numerical Evaluation of Continuous Fourier and Laplace Transforms, SIAM Journal on Scientific Computing, vol. 15, no. 5, pp. 1105-1110 (1994).
[11] P. Giannozzi et al., Advanced capabilities for materials modelling with Quantum ESPRESSO, Journal of Physics: Condensed Matter, vol. 29, p. 46 (2017).
[12] H. E. Robbins, A Stochastic Approximation Method (2007).
[13] D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, International Conference on Learning Representations, arXiv:1412.6980 (2015).
[14] D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming, vol. 45, pp. 503-528 (1989). https://doi.org/10.1007/BF01589116