Abstract
This paper presents a test of the generalized NFW (Navarro-Frenk-White) profile, which adds parameters to the original NFW profile to make it more flexible and to account for different density slopes in the halo. One of the most interesting added parameters is the inner slope, which describes how steeply the dark matter density changes approaching the center of a halo. We use NumPyro, a gradient-based Markov Chain Monte Carlo (MCMC) library, to fit the models to a database of galactic rotation curves provided by SPARC and determine the relative goodness of fit using Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) values. We then organize the posterior probability distributions in a 2-dimensional corner plot and analyze the resulting parameter values. We find that, relative to its complexity, the generalized NFW performs worse than the NFW profile: by BIC, NFW is preferred for 57% of galaxies and the generalized NFW for 43%. However, we find an interesting negative correlation between the inner slope values generated by the generalized NFW posteriors and their corresponding log masses, showing that the generalized NFW predicts a decrease in the inner dark matter density slope as the log mass of galaxies increases.
Introduction
Astronomers have long been able to measure the properties of galaxies. Size, distance from Earth, mass, and rotational velocity are all determining factors that can give a galaxy its identity. Unfortunately, the galaxy’s mass and rotational velocity never seemed to add up, as the observed galaxies looked to be rotating too fast in relation to their mass. Scientists have explained this phenomenon by postulating the presence and effect of a new type of unobservable mass-adding matter.1 This was the birth of dark matter as we know it, a theoretical substance that does not interact with light.
In the current understanding of the cosmos, the ΛCDM model, the standard mathematical framework of Big Bang cosmology, includes dark matter in the form of Cold Dark Matter (CDM). Modern-day astronomers and astrophysicists try to determine the distribution of dark matter across axisymmetric spiral galaxies to see how it affects their rotation. This can lead to discoveries about its properties and can help pinpoint what exactly it is. Astrophysicists compare two methods of research: astronomical observations of galactic rotation curves, and cosmological simulations, in which a system of particles following select laws of physics is evolved to predict the behavior and distribution of dark matter clusters in the universe.
Here we run into another problem: the density profiles of dark matter from the two methods don’t match. In what is known as the cusp-core or cuspy-halo problem, astronomers find a less dense, flat region in the centers of galaxies (a core), while cosmological simulations indicate a steep power-law distribution at the centers (a cusp)2. The differences are seen in rotation curve graphs, which show a steady increase in velocity throughout the galaxy for cores and a plateau beyond the galaxy center for cusps (Fig. 1).
![](https://nhsjs.com/wp-content/uploads/2024/12/Figure-1-1.jpg)
Solutions to the cusp-core problem are still debated. Proposals include new particles as CDM candidates, self-interacting or warm dark matter, and the effects of baryonic matter on the dark matter distribution.
This paper tests the generalized NFW profile, a new addition to the list of profiles, on an extensive data set provided by SPARC (Spitzer Photometry & Accurate Rotation Curves), which provides rotational velocity data on 175 different galaxies3.
SPARC
The Spitzer Photometry & Accurate Rotation Curves database aims to advance measurements of galactic rotation curves in order to further research on dark matter halos. Its data, collected through HI gas and optical observations, provide detailed models of the mass distribution of galaxies. The coherent movement of the HI disk allows dark matter properties to be determined within late-type spiral and irregular galaxies4. For instance, the data provided by SPARC has led to further development of Renzo’s rule, which relates features in a galaxy’s luminosity profile to features in its rotation curve. These studies are particularly interesting because they ground SPARC’s findings in observations from powerful telescopes, holding implications for future studies of dark matter.
Generalized NFW
The generalized NFW is a modification of the original NFW profile, which is widely used for its accuracy with galaxies that have greater masses in their center. However, the NFW profile is less accurate for galaxies with low central mass, so the generalized NFW incorporates a parameter representing the inner slope of the dark matter density profile. The inner slope describes how steeply the density of dark matter decreases as the radius increases past the galaxy center.
There are other competing proposals to explain the cusp-core problem, including popular models like the Einasto profile, generally used for spherically symmetric density distributions, and the Burkert profile, which provides a core-like central distribution intended to describe dark-matter dominated dwarf galaxies. In other cases, the problem is addressed by proposing new particles. An example candidate particle given through statistical comparison with SPARC’s database is called Fuzzy Dark Matter (FDM), characterized by its extremely low mass and non-negligible quantum pressure. The fact that FDM is self-interacting allows it to avoid overdensities and form a kind of core seen in many SPARC galaxies4.
Among the various profiles proposed, the generalized NFW model tries to predict flat cores simply by adding parameters to the original model. In a study comparing the generalized NFW model to another variant of NFW called coreNFW, researchers found that the generalized model works very well5. This paper focuses on the generalized NFW, which we compare with the original NFW to determine which is preferred for the SPARC galaxies, which have a much larger dynamic range of galaxy masses than those used in the previous study. To do this, we conduct parametric fitting with a Python-based Markov Chain Monte Carlo (MCMC) algorithm to match the models to each galaxy and determine its preferred model. This memoryless process uses random sampling to transition from one parameter value to another, assessing the quality of the resulting profile6. MCMC is based on Bayesian inference, which is the process of correcting a prior belief or estimate after receiving data. In this case, we update the parameters of the dark matter profiles according to the SPARC data. MCMC applies this to the profiles to create posteriors for each model by choosing parameter values and determining their fit to the rotation curve data. We use the gradient-based NumPyro library to conduct the MCMC procedure, in contrast to the libraries previously used to test the generalized NFW profile. The code for this process is shown in this GitHub repository.
Methodology
The general method for this study revolves around parametric fitting. This procedure is based on a fundamental idea of data analysis, which states that a model ideally describes (fits) the data with a few parameters. This follows the rule:
$$\text{data} = \text{model} + \text{error} \tag{1}$$
which can help determine a preferred model by finding the one that minimizes error when mapping to the data, while also maintaining simplicity. Parametric fitting gives the model variables, called free parameters, that can be adjusted on a case-by-case basis. Fitting is the procedure that finds the values of these free parameters that produce the minimal error. We apply this parametric fitting to the models to match them optimally to each galaxy and measure the remaining error.
Data Cleaning
Fitting the profiles to the raw SPARC data is not advisable as it contains the observed rotation curve of the whole galaxy, baryonic components included. We must first sift through the galaxies and separate the dark matter components. The main method for separating components of galactic rotation curves is derived from Poisson’s equation and sums the component velocities in quadrature. It is shown as:
$$V_{\mathrm{obs}}^2 = V_{\mathrm{bar}}^2 + V_{\mathrm{DM}}^2 \tag{2}$$
where $V_{\mathrm{obs}}$ is the observed rotational velocity, $V_{\mathrm{bar}}$ is the velocity of the baryonic components, and $V_{\mathrm{DM}}$ is the velocity of Dark Matter within the galaxy. Thankfully, this is a procedure that has been heavily experimented with, and SPARC has its data stored in separate arrays of $V_{\mathrm{obs}}$ and $V_{\mathrm{bar}}$, which can be manipulated simply in Python, where we graph the dark matter velocity as the quadratic difference of the observed velocity data shown below and the separated baryonic velocities given by SPARC (Fig. 2).
![](https://nhsjs.com/wp-content/uploads/2024/12/Figure-2-1.jpg)
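As a minimal sketch of the quadrature subtraction described above, the snippet below recovers a dark matter velocity from an observed and a baryonic velocity at each radius. The function name and the sample numbers are illustrative assumptions, not values from SPARC, and points where the baryonic term exceeds the observed one are returned as None rather than forced to zero.

```python
import math

def dark_matter_velocity(v_obs, v_bar):
    """Solve V_obs^2 = V_bar^2 + V_DM^2 for V_DM, point by point.

    A negative V_DM^2 (baryons alone exceed the observation) yields None.
    """
    v_dm = []
    for vo, vb in zip(v_obs, v_bar):
        v_dm_sq = vo**2 - vb**2
        v_dm.append(math.sqrt(v_dm_sq) if v_dm_sq >= 0 else None)
    return v_dm

# Illustrative velocities in km/s (hypothetical, not SPARC data).
v_obs = [50.0, 100.0, 130.0]
v_bar = [30.0, 60.0, 50.0]
v_dm = dark_matter_velocity(v_obs, v_bar)
```

Keeping unphysical points as None makes it easy to mask them before fitting instead of silently biasing the curve toward zero.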
NFW
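One of the most popular dark matter profiles, and one that we use as a control in this study, is the Navarro-Frenk-White (NFW) model. This is given by:
$$\rho(r) = \frac{\rho_0}{\frac{r}{R_s}\left(1 + \frac{r}{R_s}\right)^2} \tag{3}$$
where $\rho_0$ is the characteristic density of the halo and $R_s$ is its scale radius. Both $\rho_0$ and $R_s$ are fitting parameters. In this case, mass is represented by:
$$M(r) = \int_0^{r} 4\pi r'^2 \rho(r')\,dr' = 4\pi \rho_0 R_s^3 \left[\ln\left(1 + \frac{r}{R_s}\right) - \frac{r/R_s}{1 + r/R_s}\right] \tag{4}$$
which is an integration of the mass within the radius $r$, given as the observed distance from the center of the galaxy.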
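The NFW density and its enclosed mass can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, radii are taken in kpc, densities in solar masses per cubic kpc, and the circular velocity uses the standard relation V = sqrt(G M / r), which the text does not spell out.

```python
import math

# Gravitational constant in kpc (km/s)^2 / Msun (assumed unit system).
G = 4.30091e-6

def nfw_density(r, rho0, rs):
    """NFW density at radius r for scale density rho0 and scale radius rs."""
    x = r / rs
    return rho0 / (x * (1.0 + x) ** 2)

def nfw_enclosed_mass(r, rho0, rs):
    """Closed-form integral of 4*pi*r^2*rho(r) from 0 to r."""
    x = r / rs
    return 4.0 * math.pi * rho0 * rs**3 * (math.log(1.0 + x) - x / (1.0 + x))

def nfw_circular_velocity(r, rho0, rs):
    """Circular velocity in km/s from the enclosed mass."""
    return math.sqrt(G * nfw_enclosed_mass(r, rho0, rs) / r)
```

The closed-form mass avoids a numerical integral at every sampling step, which matters when the model is evaluated thousands of times per galaxy.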
Generalized NFW
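This model, also called DC14 or the Zhao profile, adds parameters to the original NFW to account for dependencies in the central cusp and outer regions. The central regions are determined by $\gamma$, and the outer region falls off to an arbitrary $\beta$. The NFW is then rewritten as:
$$\rho(r) = \frac{\rho_0}{\left(\frac{r}{R_s}\right)^{\gamma}\left[1 + \left(\frac{r}{R_s}\right)^{\alpha}\right]^{(\beta - \gamma)/\alpha}} \tag{5}$$
with the added $\alpha$, $\beta$, and $\gamma$ parameters.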
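To make the role of the added parameters concrete, the sketch below implements the generalized profile in its common three-slope form and measures its logarithmic slope numerically. The function names are hypothetical, and the parameter names alpha (transition sharpness), beta (outer slope), and gamma (inner slope) follow the usual convention, which is assumed here. With alpha = 1, beta = 3, gamma = 1 the expression reduces to the NFW profile.

```python
import math

def gnfw_density(r, rho0, rs, alpha=1.0, beta=3.0, gamma=1.0):
    """Generalized NFW density; the defaults reproduce plain NFW."""
    x = r / rs
    return rho0 / (x**gamma * (1.0 + x**alpha) ** ((beta - gamma) / alpha))

def log_slope(r, rho0, rs, alpha, beta, gamma, h=1e-4):
    """Central finite difference of d ln(rho) / d ln(r)."""
    up = gnfw_density(r * math.exp(h), rho0, rs, alpha, beta, gamma)
    down = gnfw_density(r * math.exp(-h), rho0, rs, alpha, beta, gamma)
    return (math.log(up) - math.log(down)) / (2.0 * h)

# Far inside the scale radius the slope tends to -gamma,
# far outside it tends to -beta.
inner = log_slope(1e-3, 1.0e7, 10.0, 1.0, 3.0, 0.5)
outer = log_slope(1e5, 1.0e7, 10.0, 1.0, 3.0, 0.5)
```

A gamma below 1 flattens the center toward a core, while gamma = 1 keeps the NFW cusp.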
Fitting
Parametric fitting with each profile follows three general steps. The first is to set a possible value for each parameter, usually done by iterating through a range of values. With the set values, each model is then mapped to a rotation curve plot with the observed galaxy rotation curves. The last step is to measure the error between each point and the profile’s prediction. Each step in the process is repeated for arbitrary values of each parameter until the error is minimized and the optimal parameters are found.
There are some complications with this method. There is no template for the values of each parameter, it can take many iterations to reach acceptable parameter values, and even then the values are not assured to be optimal. Therefore, we use a sampling algorithm, MCMC, to perform the fitting. Finally, we plot the posterior model against the rotation curve data given by SPARC using the Pyplot interface of Python’s Matplotlib library.
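The three steps above can be sketched as a brute-force grid search. This is an illustration of the manual procedure, not the method used in this study: the NFW velocity helper, the grid values, and the synthetic noise-free data are all assumptions chosen so the example is self-contained.

```python
import math

G = 4.30091e-6  # kpc (km/s)^2 / Msun (assumed unit system)

def nfw_velocity(r, rho0, rs):
    """Circular velocity (km/s) of an NFW halo at radius r (kpc)."""
    x = r / rs
    mass = 4.0 * math.pi * rho0 * rs**3 * (math.log(1.0 + x) - x / (1.0 + x))
    return math.sqrt(G * mass / r)

def chi_square(radii, v_data, v_err, rho0, rs):
    """Step 3: squared error between data and model, weighted by v_err."""
    return sum(((v - nfw_velocity(r, rho0, rs)) / e) ** 2
               for r, v, e in zip(radii, v_data, v_err))

def grid_fit(radii, v_data, v_err, rho0_grid, rs_grid):
    """Steps 1 and 2: try every (rho0, rs) pair and keep the best one."""
    best = None
    for rho0 in rho0_grid:
        for rs in rs_grid:
            err = chi_square(radii, v_data, v_err, rho0, rs)
            if best is None or err < best[0]:
                best = (err, rho0, rs)
    return best

# Synthetic "observations" generated from known parameters (hypothetical).
radii = [1.0, 2.0, 5.0, 10.0, 20.0]
v_data = [nfw_velocity(r, 1.0e7, 10.0) for r in radii]
v_err = [5.0] * len(radii)

best_err, best_rho0, best_rs = grid_fit(
    radii, v_data, v_err,
    rho0_grid=[5.0e6, 1.0e7, 2.0e7],
    rs_grid=[5.0, 10.0, 20.0],
)
```

The cost grows as the product of the grid sizes, so adding the three extra generalized NFW parameters quickly makes this approach impractical.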
MCMC
Markov Chain Monte Carlo (MCMC) makes parametric fitting simpler and, once the chains have converged, reliably produces near-optimal parameter values. Here we use NumPyro to generate the posteriors. NumPyro is a lightweight probabilistic programming library that provides a NumPy-style backend for Pyro, built on JAX7. The MCMC algorithm comes from the numpyro.infer module and combines two ideas: Monte Carlo sampling, carried out in the form of a Markov chain.
Monte Carlo:
This is a kind of iteration that simplifies the aforementioned fitting procedures by using repeated random sampling instead of iterating through a predefined set of parameter values. This allows us to start with a wide sample set without leaving out possible values.
Markov Chain:
A process in which each new sample depends only on the current one. The proposed parameter values are compared with the current ones to determine whether they describe the data better and how the posterior is moving in relation to the desired result. If the proposal is rejected, the algorithm steps back and keeps the previous values for another iteration.
Gradients:
Instead of using a form of numerical integration to conduct Monte Carlo, NumPyro’s infer module computes a potential function showing how unlikely it is for a set of parameters to produce the data, then repeatedly evaluates the gradient for that function8. The gradient is the derivative of the potential with respect to displacement, given by:
$$\nabla U(x) = \frac{dU}{dx} \tag{6}$$
which allows numpyro.infer, specifically its No-U-Turn Sampler (NUTS), to produce much more accurate posteriors in much less time.
The combined MCMC step-based procedure using NUTS allows us to start with a large range of possible parameter values and narrow it down as the discrepancy between model and observation decreases. This not only minimizes the fitting error but also reduces sampling error. For this procedure, we walk 1000 steps with MCMC.
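The accept-or-reject logic of a Markov chain sampler can be illustrated with a plain random-walk Metropolis sampler. This is deliberately not NUTS and not NumPyro: it is a simpler, gradient-free stand-in using only the standard library, applied to a toy problem (the mean of a few made-up measurements with a known error and a flat prior). All names and numbers are hypothetical.

```python
import math
import random

def log_posterior(mu, data, sigma):
    """Gaussian log-likelihood with a flat prior on mu (up to a constant)."""
    return -0.5 * sum(((x - mu) / sigma) ** 2 for x in data)

def metropolis(data, sigma, start, n_steps, step_size, seed=0):
    """Random-walk Metropolis: propose a move, then accept or stay put."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, data, sigma)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal, data, sigma)
        # Accept with probability min(1, p(proposal) / p(current)).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(current)  # a rejected proposal repeats the old value
    return chain

data = [9.8, 10.4, 10.1, 9.9, 10.3]
chain = metropolis(data, sigma=0.5, start=8.0, n_steps=5000, step_size=0.3)
kept = chain[500:]  # discard the burn-in while the chain finds the peak
posterior_mean = sum(kept) / len(kept)
```

Gradient-based samplers such as NUTS replace the blind random proposal with moves guided by the slope of the potential, which is why they typically need far fewer steps.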
Error Calculation
Error calculation within the MCMC algorithm is done using a log-likelihood ($\ell$) function, which transforms a product of densities into a sum9. This is given by:
$$\ell(\theta) = \ln \prod_{i=1}^{n} f(x_i \mid \theta) = \sum_{i=1}^{n} \ln f(x_i \mid \theta) \tag{7}$$
with $n$ being the number of observations. This shows the sum of the individual log-probabilities for a given parameter $\theta$
instead of multiplying the probabilities as in a plain likelihood function. In our case, MCMC favors parameter values with a higher log-likelihood of producing the data.
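For rotation curves, a common choice is to treat each velocity measurement as Gaussian around the model value. The sketch below assumes that form; the text does not state which density was used, so the function name and the Gaussian assumption are illustrative.

```python
import math

def gaussian_log_likelihood(v_obs, v_model, v_err):
    """Sum of ln N(v_obs | v_model, v_err) over all data points."""
    total = 0.0
    for vo, vm, err in zip(v_obs, v_model, v_err):
        total += -0.5 * ((vo - vm) / err) ** 2
        total -= math.log(err * math.sqrt(2.0 * math.pi))
    return total

# Hypothetical velocities in km/s.
v_obs = [100.0, 120.0]
v_model = [98.0, 123.0]
v_err = [5.0, 4.0]
log_like = gaussian_log_likelihood(v_obs, v_model, v_err)
```

Working with the sum of logarithms avoids the numerical underflow that a direct product of many small densities would cause.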
BIC & AIC:
We use the Bayesian Information Criterion (BIC) to approximate the marginal probability density of the data and determine the performance of each model. This is because a complex model may overfit a galaxy, meaning that it corresponds too closely to that galaxy's data and may fail to account for additional data or may make inaccurate predictions. BIC measures discrepancy but also penalizes extra parameters to account for overfitting. For each galaxy, the model that produces the smallest BIC is the preferred model and is taken to describe that galaxy's dark matter distribution most accurately. BIC is calculated as:
$$\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L}) \tag{8}$$
for $n$ observations, where $k$ is the number of parameters and $\ln(\hat{L})$ is the log likelihood.
AIC, or Akaike Information Criterion, is similar to BIC in that it gives a penalty for additional parameters. However, it does not take into account the sample size (number of observations), and instead of looking for the true model like BIC, it looks for the model that best approximates unseen data. It is given by:
$$\mathrm{AIC} = 2k - 2 \ln(\hat{L}) \tag{9}$$
where $k$ is again the number of parameters and $\ln(\hat{L})$ the log likelihood. It is compared the same way: the model with a lower AIC value makes better predictions.
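Both criteria are simple to compute once a log likelihood is available. The sketch below uses the standard definitions; the parameter counts and log-likelihood values in the example are hypothetical and only show how a better raw fit can still lose after the complexity penalty.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood

# Hypothetical example: the complex model fits slightly better (higher ln L)
# but carries three more parameters, on a curve with 20 data points.
bic_simple = bic(-20.0, n_params=2, n_obs=20)
bic_complex = bic(-18.0, n_params=5, n_obs=20)
aic_simple = aic(-20.0, n_params=2)
aic_complex = aic(-18.0, n_params=5)
```

Because ln(n) exceeds 2 once a curve has more than about 7 points, BIC penalizes each extra parameter more heavily than AIC for every galaxy in a sample with 14 or more points.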
Results
The outcome of this analysis can be split into three parts. The first is the fitted rotation curves, where each model was fitted to the specifications of each galaxy. Second, we determine the performance of the generalized NFW model with respect to its higher complexity compared to NFW by comparing the BIC and AIC values generated for each galaxy. Finally, we present corner plots to show the posterior probability distributions generated by the MCMC program. Below we show the fits of the posterior models for 3 different galaxies. We expect a similar fitting accuracy for the cuspy distributions and a more accurate prediction for the cored NGC3198 density profile (Fig. 3).
![](https://nhsjs.com/wp-content/uploads/2024/12/Figure-3-1-1024x527.jpg)
Rotation Curves
The fitting was done successfully on 72 galaxies for each of the two models. Each galaxy has 14 or more data points and was sampled 1000 times so that the $\hat{R}$ value (Gelman-Rubin statistic) converges to 1. This makes for 144 plots of fitted rotation curves. We show rotation curves that display both cusp and core-like distributions (Fig. 3).
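The Gelman-Rubin statistic compares the spread within each chain to the spread between chains, and approaches 1 when independent chains agree. The sketch below implements the classic (non-split) form with the standard library; it is an illustration, not the diagnostic routine of any particular package, and the toy chains are made up.

```python
import math
import statistics

def gelman_rubin(chains):
    """Classic R-hat for a list of equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

# Two chains exploring the same region versus two stuck far apart.
agreeing = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
disagreeing = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
```

Values well above 1 signal that the chains have not yet settled on the same distribution and that more sampling is needed.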
Model Performance
Here we compare the BIC and AIC values generated by the posteriors. Each galaxy generates both values for the NFW and generalized NFW profiles. Out of the 72 galaxies, NFW performed better for 41 galaxies (57%) in terms of BIC and 39 galaxies (54%) in terms of AIC, while generalized NFW performed better for 31 galaxies (43%) and 33 galaxies (46%) for BIC and AIC respectively. Only 58 galaxies showed a relevant difference in BIC (a difference greater than 2) between the models: NFW performed better for 32 of them, while generalized NFW performed better for 26. For AIC comparisons, 55 galaxies showed a relevant preference, with 29 preferring generalized NFW and 26 preferring the NFW profile. Additionally, of the galaxies where generalized NFW performed better, 16 showed a cored density profile.
![](https://nhsjs.com/wp-content/uploads/2024/12/Figure-5-1.jpg)
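The tally above can be reproduced with a small helper that labels each galaxy by the sign and size of its criterion difference. The threshold of 2 follows the text; the function name, labels, and BIC pairs below are hypothetical and are not values from this study.

```python
from collections import Counter

def preferred_model(ic_nfw, ic_gnfw, threshold=2.0):
    """Label a galaxy by which model has the clearly lower criterion."""
    delta = ic_gnfw - ic_nfw
    if delta > threshold:
        return "NFW"
    if delta < -threshold:
        return "gNFW"
    return "no clear preference"

# Hypothetical (BIC_NFW, BIC_gNFW) pairs for four galaxies.
bic_pairs = [(210.0, 215.5), (180.2, 176.9), (95.0, 96.1), (301.4, 304.0)]
tally = Counter(preferred_model(a, b) for a, b in bic_pairs)
```

Galaxies that fall inside the threshold are counted separately, so they do not inflate either model's share.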
Corner Plots
We generate corner plots using the variables provided by SPARC and their posterior probability density functions. The parameters shown in the plot are the luminosity observed at 3.6 micrometers ($\mu$m), the distance to the galaxy in megaparsecs (Mpc), the inclination angle of the galaxy in degrees, the log mass (in solar masses) within the virial radius of the galaxy, the mass-to-light ratio for the stellar disk, the mass-to-light ratio for the gas clouds, and the log concentration within the virial radius.
The main points of interest would be the $\gamma$ value showing the inner slope of the rotation curve, and the $\log$ Mass value of the dark matter halo alone, both generated by the MCMC procedure.
Discussion
The MCMC procedures perform well for nearly all galaxies and profiles. For 4 galaxies, however, the NFW fitting fails to generate a viable model. This is likely because each of these is a dwarf galaxy, for which the NFW and generalized NFW models are poorly suited.
![](https://nhsjs.com/wp-content/uploads/2024/12/Figure-5a.jpg)
As we can see, the farthest observed rotation curve point lies just over 5 kpc from the center, compared with radii of 20-120 kpc for normal-sized galaxies. For these 4 dwarf galaxies, the NFW fit fails, while the generalized NFW manages to fit with relative accuracy, though with a larger spread in the posterior.
Accounting for the considerable error given in the posterior, we tend to discount this fit as inaccurate. Additionally, we see that each galaxy where NFW fails is not only a dwarf but also forms a core with a somewhat constant slope at an increasing radius.
BIC & AIC implications
It may be unexpected that the generalized NFW performs worse than NFW for more galaxies than it performs better. However, we can attribute most of this to overfitting, as the generalized profile has many more degrees of freedom due to its additional parameters. This could cause it to make otherwise unnecessary corrections to the fit, as seen in the plot of NGC3521 (Fig. 3), where the BIC value of the NFW profile is nearly 3 lower than that of the generalized NFW. These small corrections overfit the generalized profile enough that it is considered significantly worse in performance by BIC standards.
It should not be taken that the generalized NFW is worse overall than the NFW profile. BIC and AIC values depend strongly on the complexity of the model, with a penalty for each added parameter10. While the generalized NFW is quite simple among the models that BIC and AIC can handle, it adds enough extra parameters compared to NFW to raise its values by a significant amount.
Nevertheless, the generalized NFW model performs much better when determining a core. 16 of the 25 galaxies where the BIC value favors generalized NFW are cores. Along with the cored dwarf galaxies, where NFW fails but generalized NFW does not, the BIC analysis shows that generalized NFW does a much better job of predicting Dark Matter cores than NFW.
Posterior Interpretation
At first glance, the probability distributions of the inner slope ($\gamma$) parameters vary a lot from galaxy to galaxy. While the predicted $\gamma$ value is expected to be around […] to […] (Fig. 4), the values generated by the MCMC posteriors range from under […] to nearly […]. After further analysis, we observe a moderate negative correlation between the inner slope values and their corresponding $\log$ Mass. The correlation coefficient for this relation is […] with a $p$-value of […], which indicates that this correlation has very strong statistical significance. As the Mass of the galaxy increases by factors of ten, we observe a general decrease in the $\gamma$ value with a few outliers. However, these outliers can likely be discounted due to the comparatively large error margins. In Fig. 7, we see that the error estimates to […] standard deviation are much larger for the outlying galaxies in the relation.
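A correlation coefficient of this kind can be computed directly from paired values. The sketch below implements the Pearson coefficient with the standard library; which coefficient the study used is not stated, so Pearson is an assumption, and the (log mass, inner slope) pairs are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log halo masses and posterior inner slopes.
log_mass = [10.0, 10.5, 11.0, 11.5, 12.0]
gamma = [1.2, 1.0, 0.9, 0.6, 0.5]
r = pearson_r(log_mass, gamma)
```

A negative coefficient means the inner slope tends to fall as the log mass rises; the coefficient alone does not account for the per-galaxy error bars discussed above.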
This places the Milky Way right at the center of the graph (around […] solar masses), so our galaxy’s dark matter halo should, according to the generalized NFW profile, have an inner slope between […] and […].
As we know that the generalized NFW did not perform well for a considerable number of galaxies, we can further conclude from this graph the trends in each model’s relative performance in relation to the proposed inner slope.
![](https://nhsjs.com/wp-content/uploads/2024/12/Figure-8.jpg)
From Fig. 8, a similar negative correlation is observed. Although it is not very significant, we can see that generalized NFW is preferred for more galaxies at a steeper inner slope: […]. We also believe that this relation is specific to the $\log$ Mass, and other galaxy properties will not affect the $\gamma$ value (Fig. 9).
![](https://nhsjs.com/wp-content/uploads/2024/12/Figure-9.jpg)
We perform the same comparison with luminosity and the posterior inner slope value. It is clear that there is a negligible correlation between the two, and similar to the relation with Mass, generalized NFW shows a higher bound of $\gamma$ values. For a further look into the influence of galaxy $\log$ Mass, we have normalized the number of galaxies that prefer each model and compared it with the mass of each galaxy, as we hypothesize that generalized NFW may perform better at specific $\log$ Mass values.
In Fig. 10, we see the normalized number of galaxies that prefer one model type over another, in relation to their Mass. We can conclude that more galaxies preferring generalized NFW are of low mass, while NFW is largely preferred at higher mass galaxies.
We repeat this comparison with luminosity to check for the significance of other factors in model preference. We see that aside from a small luminosity window, NFW generally performs better regardless of the luminosity of the galaxy. This indicates that, unlike galaxy mass, luminosity does not affect the preference of the galaxy towards generalized or regular NFW.
Conclusion
This research paper was directed towards testing the generalized NFW profile in a new way, using MCMC to generate fits to 72 SPARC galaxies and testing for model performance with respect to complexity. Additionally, we aimed to learn more about the inner slope parameter within the profile, comparing it to the mass of its respective galaxy. The study was conclusive in determining the performance of the generalized NFW model, coming up with some interesting results, such as its poor performance for its complexity, along with some that were expected, such as its increased proficiency in predicting dark matter cores compared to NFW. For further research studies, we recommend studying the preference of generalized NFW to predict cores over cusps, which could help understand how far it is from accurately predicting flat cores. Furthermore, we recommend using additional modes of analysis to determine the proficiency of the generalized NFW among other models, as the modes used in this study do not completely conclude the accuracy of the model, only its precision. Another limitation of this study was the inability to manipulate the rotation curve data. Due to the immutable nature of SPARC’s observed data, we were unable to freely manipulate the fitting procedures. Instead, we tried to find priors within the database that were similar to the ones we wanted to experiment with.
References
- V. Rubin, W. K. Ford Jr., N. Thonnard, The Rotational Properties of 21 SC Galaxies With a Large Range of Luminosities and Radii, From NGC 4605 (R=4kpc) to UGC 2885 (R=122kpc). The Astrophysical Journal. 238, 471-487 (1980).
- W. J. G. de Blok, The core-cusp problem. Advances in Astronomy. 2010 (2009).
- F. Lelli, S. S. McGaugh, J. M. Schombert, SPARC: mass models for 175 disk galaxies with Spitzer photometry and accurate rotation curves. The Astronomical Journal. 152, 157 (2016).
- P. Li, F. Lelli, S. S. McGaugh, J. M. Schombert, A comprehensive catalog of dark matter halo models for SPARC galaxies. The Astrophysical Journal Supplement Series. 247, 31 (2020).
- F. Allaert, G. Gentile, M. Baes, Testing baryon-induced core formation in ΛCDM: a comparison of the DC14 and coreNFW dark matter halo models on galaxy rotation curves. Astronomy and Astrophysics. 605 (2017).
- Columbia University Mailman School of Public Health. Markov Chain Monte Carlo. https://www.publichealth.columbia.edu/research/population-health-methods/markov-chain-monte-carlo (2023).
- D. Phan, N. Pradhan, M. Jankowiak, Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro. (2019).
- NumPyro Documentation. Bayesian Regression Using NumPyro. https://num.pyro.ai/en/latest/tutorials/bayesian_regression.html.
- M. Taboga, “Log-likelihood”, Lectures on probability theory and mathematical statistics. https://www.statlect.com/glossary/log-likelihood. (2021).
- A. Gelman, J. Hwang, A. Vehtari, Understanding predictive information criteria for bayesian models. (2013).