A Simulative Comparison of the Robustness of Centralized v. Decentralized Economic Systems Under Various Regulatory Conditions



The structure of an economic system may influence the level of wealth equality among the entities it encompasses. In this work, we compare the robustness of centralized and decentralized financial systems by simulating how individual wealth evolves in each economy. We define robustness as the ability of a financial system to produce wealth equality (convergence of wealth) across different regulatory environments. We analyze how the net worths of low income, medium income, and high income entities change over time, and we evaluate the extent of wealth democratization in both economies under low, medium, and high levels of government regulation. A highly robust system would withstand both extreme and moderate regulatory conditions and yield broadly similar net worths across the board.

The simulation is built on a Markov Decision Process framework, in which actions and interactions are ultimately random but are weighted by the socioeconomic attributes of the system’s entities. In a decentralized economy, financial interactions and operational decisions are controlled by individuals rather than by a central middleman. To simulate this difference, we vary the role of central firms, adjust transaction fees and interest rates, and modify the frequency and ease of certain interactions according to the roles and incentives of the individuals and central firms.

We find that although the decentralized financial structure gives lower income individuals more frequent, systematic opportunities to raise their net worths, the long-run trajectory of individual net worth is nearly identical in the centralized and decentralized environments.


The goal of this work is to simulate the essence of centralized and decentralized economic systems and evaluate their dynamics in response to various regulatory conditions.

An economy is a system in which people exchange objects of value (in this case, money) through various forms of transactions, such as investing, betting, and lending1. An economy can be viewed as a specific form of dynamical system, which describes the behavior of a set of agents over time2. In this work, we represent dynamical economic systems using a Markov Decision Process (MDP) framework to analyze the behavior of rational economic agents as sequences of states and actions.

We view wealth inequality as a negative phenomenon because it harms the prospect of societal equality and runs counter to utilitarian principles, which favor “heavily redistributive” systems3. Current economic systems also have discrimination embedded in them4. We consider a good economic system to be one that results in wealth equality: it should function fairly and equitably, rather than structurally inhibit certain groups of people. A reasonable approach to an egalitarian economic system is to give everyone equal opportunities, which is referred to as “leveling the playing field”5.

We broadly classify economic systems into two types: centralized and decentralized. A centralized system is characterized by central banks through which lending is conducted6. A decentralized system is characterized by peer-to-peer (P2P) lending, with firms maintaining, but not intervening in, the network7. Today, the technology most commonly associated with decentralized economic structures is the blockchain8. Our current economy, however, is centralized.

Both types of systems may be subject to regulations, which we categorize into low regulation (mirroring a laissez-faire, purely capitalistic economy in which there is no government intervention whatsoever in the form of taxes, antitrust laws, or subsidies), medium regulation (depicting most modern-day systems, which have reasonable taxes, antitrust laws, and government aid to low income individuals and firms), and high regulation (portraying socialist economies with extremely high tax rates, restrictive antitrust and anti-monopolization practices, and a large amount of subsidization).

Web3 technologies such as blockchain are classified as emerging technologies and are being rapidly adopted across different parts of the economy, to the point that a decentralized finance (DeFi) system has been created9. As this new, potentially revolutionary type of economic system develops rapidly, it is necessary to evaluate the potential benefits and harms of fully incorporating it into our current economy. In this study, we focus on the potential of decentralized blockchain technology as a means of addressing the pressing economic issue of wealth inequality. The objective of this study is to compare centralized and decentralized economic environments, particularly their ability to systematically drive wealth equality.

We aim to investigate the presence of a predefined set of desirable properties related to wealth redistribution among the agents in the system. Each pairing of environment (centralized or decentralized) and condition (low, medium, or high regulation) is evaluated on its ability to maintain wealth equality across regulatory settings, which we select as the core measure of robustness in our study. We then extrapolate insights from these simplified settings to contemporary markets, critically reviewing which economic structures provide the most expected monetary benefit to the people in the system.

The use of simulation to validate economic policy is common in both modern and historical contexts. For example, in 1960, the testing of government tax policy was approached through simulation10. We apply a similar approach, simulating the potential of blockchain technology compared to the standard centralized economic system. The simulation is constructed as an MDP using an agent-based approach, which allows for effective analysis of optimal policies11. Outside influences, such as the behavior of other agents, can affect the path taken by a given agent; in our simulation, interactions happen directly between agents, firms, and regulators, so the nature of each interaction is determined primarily by the net worths of the agents involved.

Several factors govern the financial interactions that drive capital flow. For example, in the context of firm policy, loan validity is often checked based on creditworthiness, which is calculated from consumers’ financial data12. We implement a simplified version of this firm policy in our simulation: firms compute an upper and a lower bound on the loan a consumer can request, based on that consumer’s net worth, so their lending policy depends on their available lending capital and on the borrower’s net worth. For interactions directly between individual agents, bets occur only between individuals with similar net worths, mirroring a realistic betting situation in which both parties have comparable monetary incentives.
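The loan-bound rule can be sketched as follows. This is a minimal illustration, not the simulation’s code. The text states only that bounds are derived from the consumer’s net worth, so the fractions `LOWER_FRAC` and `UPPER_FRAC` and the helper names are hypothetical. The amount is drawn uniformly because loan amounts are described as randomly selected within a calculated range.

```python
import numpy as np

# Hypothetical bounds: the text says loan limits depend on net worth,
# but these particular fractions are illustrative assumptions.
LOWER_FRAC = 0.05
UPPER_FRAC = 0.5


def loan_bounds(net_worth: float) -> tuple:
    """Return (lower, upper) loan limits as fixed fractions of net worth."""
    worth = max(net_worth, 0.0)  # agents without positive net worth get an empty range
    return LOWER_FRAC * worth, UPPER_FRAC * worth


def request_loan(net_worth: float, rng: np.random.Generator) -> float:
    """Draw a loan amount uniformly at random within the allowed range."""
    low, high = loan_bounds(net_worth)
    return float(rng.uniform(low, high))


rng = np.random.default_rng(seed=0)
print(loan_bounds(20_000))  # (1000.0, 10000.0)
amount = request_loan(20_000, rng)
```

Scaling both bounds with net worth reproduces the stated idea that wealthier agents qualify for larger loans.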

We also model market selection (the elimination of relatively inefficient firms) and the entry of new firms as competition-enhancing firm policies: firms can either combine through a merger, or one firm can split into two through a spinoff13.
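The merger and spinoff mechanics can be sketched as below. The sketch assumes the antitrust rule described with the hyperparameters: a merger is blocked when the two firms’ joint pre-merger net worth exceeds a merger threshold. The threshold value used here and the random split range for spinoffs are hypothetical. Market selection and firm entry are omitted.

```python
from typing import Optional, Tuple

import numpy as np


def try_merger(worth_a: float, worth_b: float, merger_threshold: float) -> Optional[float]:
    """Return the merged firm's net worth, or None if antitrust rules block the merger.

    A merger is blocked when the firms' joint pre-merger net worth
    exceeds the regulator's merger threshold.
    """
    joint = worth_a + worth_b
    if joint > merger_threshold:
        return None
    return joint


def spinoff(worth: float, rng: np.random.Generator) -> Tuple[float, float]:
    """Split one firm into two; the 30-70% split range is a hypothetical choice."""
    share = rng.uniform(0.3, 0.7)
    return worth * share, worth * (1.0 - share)


rng = np.random.default_rng(seed=1)
print(try_merger(3e7, 2e7, merger_threshold=1e8))  # 50000000.0
print(try_merger(6e7, 5e7, merger_threshold=1e8))  # None
child_a, child_b = spinoff(1e7, rng)
```

A lower `merger_threshold` blocks more mergers, which is how a stricter anti-monopoly stance would appear in this sketch.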

The regulating agent is the government (a simplified version of governing bodies such as the Federal Reserve and the U.S. government). It exercises oversight and control by taxing individuals and firms, enforcing antitrust laws, distributing economic subsidies to low income individuals and firms, and capping interest rates and transaction fees; the extent of each regulation depends on the regulatory environment. In the decentralized world, regulation is directed toward the middlemen in transactions14, which we partially implement: the government regulates both the blockchain network on which transactions take place and the lending agent carrying them out. These regulations are applied to varying extents to mirror a broader spectrum of economies. They represent an important facet of capital flow, as they affect the activities of central firms (through antitrust laws and taxation) and individuals (through taxation and subsidies), which can cause ripple effects in other transactions.

We hypothesize that the centralized simulation will yield a small number of agents with much higher net worth than the rest, while the majority of individual agents will have net worths below the mean. We assume higher income individuals have a larger risk appetite, meaning they stake more money in each bet. They would also likely qualify for larger loans, which temporarily increase their available funds and eventually cycle back into betting. For the decentralized portion of the simulation, we hypothesize that each agent’s net worth will be closer to the mean (rather than a few individuals being extremely wealthy). The role of firms in the decentralized environment may eventually become effectively obsolete, meaning firms lose money to taxation without gaining substantial new income from loans (as lending occurs only between agents). Capital could then flow primarily between individual agents and the government. Furthermore, since fees and rates are decided by the individuals participating in the system, we expect them to be lower.


Figure 1| Plot generated by Matplotlib showing the change in net worth of individual agents over time (three lines, one per initial subtype), for centralized v. decentralized environments under low v. medium v. high regulatory conditions
Figure 2| Plot generated by Matplotlib showing the change in net worth of firm agents over time, for centralized v. decentralized environments under low v. medium v. high regulatory conditions
Figure 3| Plot generated by Matplotlib showing the change in net worth of the regulator (government) agent over time, for centralized v. decentralized environments under low v. medium v. high regulatory conditions

Each of the three plots combines data from fifty iterations of the simulation, each covering 800 timesteps.

Figure 1 shows agent net worth over time in six subplots: three for the centralized system and three for the decentralized system, each under low, medium, and high regulation. Each subplot shows three subtypes of agents (low, medium, and high net worth individuals), classified by initial wealth. In the centralized, low-regulation subplot, the high net worth individuals grow their net worth at an exponential rate (the subtype’s collective net worth increases by a factor of 5 from a starting point of approximately 1e7). The middle net worth agents grow at a linear rate (by a factor of less than 2, from a starting point below 1e7), and the low income agents’ net worths remain roughly constant (between 10,000 and 20,000). In the centralized, medium-regulation setup, the high and middle net worth agents behave much as they did under low regulation (with marginally smaller growth rates). The low income individuals, however, now experience a slight linear increase in net worth, and the low and middle income net worths begin to converge. In the centralized, high-regulation setup, the high net worth individuals’ growth is limited to a very slight linear increase, while the middle net worth agents experience a linear decrease. The middle and low income net worths fully converge, and their divergence from the high income individuals is the smallest of the three regulatory environments. The low net worth agents initially experience exponential growth in net worth, but plateau from roughly the 150th to the 800th timestep.
The three decentralized subplots show practically identical behavior to their centralized counterparts, with one minor difference: lower net worth individuals spend more timesteps in the increasing phase of their net worth before they plateau.

Figure 2 shows firm net worth over time. All three centralized subplots show a generally decreasing pattern, with a steeper, linear decline under high regulation: firm net worth falls from 5e7 to 4.5e7 under low regulation, compared to a drop from 5e7 to below 1e7 under high regulation. In the decentralized low- and medium-regulation setups, firm net worth increases linearly (with a steeper slope under low regulation). The decentralized, high-regulation trend mirrors that of the centralized high-regulation setup, with a steep, linear, downward trajectory.

Figure 3 shows regulator net worth over time; each decentralized subplot mirrors its centralized counterpart. Under low regulation, regulator net worth stays constant at 1e9. Medium regulation induces very slight linear growth, which is slightly higher in the decentralized environment. High regulation produces steep linear growth (from 1e9 to 1.4e9), with a steeper slope in the decentralized environment (from 1e9 to slightly above 1.4e9). This suggests that a medium regulation environment is most effective at achieving regulatory stability in the economy.


The behavior of agent net worth in the centralized and decentralized systems is nearly identical under each of the three regulatory conditions. The decentralized environment has lower transaction fees, lower interest rates, a higher transaction volume, and more transactions involving low income agents in particular. Despite these differences, its overall economic behavior shows no major shift relative to the centralized environment. The outcomes converge because, while low income agents in the decentralized environment benefit from financial inclusion15 at the individual level, their gains are negligible in the bigger picture. These increases in net worth are meaningful relative to each agent’s own history, and even within the low wealth subtype as a whole. In the context of the overall economy, however, the impact is not noticeable because transactions involving underrepresented individuals have lower values. One implication is that decentralized economic frameworks can produce a mutually beneficial outcome, democratizing financial services without disrupting the broader economy. The decentralized economy facilitates peer-to-peer lending, which allows easier and cheaper transactions that do not require a bank account, creditworthiness checks, or excessive fees and interest rates. This financial inclusion could ultimately raise the standard of living of underserved individuals. Another implication, however, is that lower income individuals may fall into a zone where they face higher tax rates (due to their increased net worth) yet no longer qualify for government subsidies.
The apparently marginal net utility of decentralized economic systems may not be commensurate with the large-scale monetary, political, and social investments required to implement such systems. However, the impact of decentralization on wealth dynamics is multifaceted. As shown in the centralized, medium-regulation subplot of agent net worth, the low income subtype experienced a slightly steeper linear increase in net worth than in the decentralized system under the same regulatory setup. This may be because, in the decentralized system, the income gains of low net worth individuals are partly offset by the higher tax rates they consequently face. To further examine these discrepancies, we look at the slopes of regulator net worth (Figure 3) in the decentralized medium- and high-regulation setups. Both slopes are steeper than those of their centralized counterparts. This indicates a net increase in taxable net worth across the board, which may be ascribed to the higher frequency of transactions and the lower interest rates and fees in the decentralized environment.
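The slope comparison can be made explicit by fitting a straight line to each regulator net-worth trajectory. This is a sketch, not the procedure used to produce Figure 3. It assumes each input is a mean trajectory with one value per timestep, and it uses an ordinary least-squares fit via `np.polyfit`. The function names are hypothetical.

```python
import numpy as np


def trend_slope(series) -> float:
    """Least-squares slope of a net-worth series against timestep index."""
    values = np.asarray(series, dtype=float)
    timesteps = np.arange(values.size)
    slope, _intercept = np.polyfit(timesteps, values, deg=1)
    return float(slope)


def steeper_in_decentralized(centralized, decentralized) -> bool:
    """True if the decentralized trajectory has the larger fitted slope."""
    return trend_slope(decentralized) > trend_slope(centralized)
```

A linear fit matches the roughly linear regulator trends described for Figure 3. It would understate differences if a trajectory were strongly curved.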

There also appear to be no incentives or opportunities for high-wealth individuals to gain an advantage in the decentralized system. Rather, lower-wealth individuals are given more opportunities to reach a financial standing equal to that of the other agent subtypes. This may point to the benefit of progressive transaction fees (which are directly correlated with more frequent transactions), which are more realistically attained in a decentralized setting. In a centralized system, monopolies and firms have both the incentive and the capability to lobby successfully against such change.

The regulatory conditions have several effects. First, high regulation has a large negative impact on the net worth of initially high-net-worth individuals while simultaneously benefiting individuals with lower net worths. The net worths of the low and medium income agents eventually converge in the high regulation environment, consistent with that government’s underlying goal of bringing all subtypes to the same level. High regulation implies high taxes together with low transaction fees and interest rates (because predatory lending is strictly prevented). High-net-worth individuals also receive no bailouts or aid from the government, so their growth stagnates in high regulation environments. In low and medium regulation environments, by contrast, they experience roughly exponential monetary growth. Low regulation governments provide no fiscal assistance to financially disenfranchised individuals and do not penalize monopolies. Such conditions allow high income agents to thrive while the other groups stagnate. Higher income individuals therefore benefit most from low regulation, while only lower income individuals benefit from high regulation. Medium regulation behaves like a milder version of the low regulation model, but induces relatively more net worth convergence between low and medium income individuals. The high regulation environment produces the most wealth equality of the three conditions. Regulations shape wealth dynamics by increasing or decreasing the capital available to flow between individuals and firms, and by influencing the extent of both individual and firm monopolization. As the level of government intervention varies, so do the amounts of subsidization and taxation, which directly affect the net worth of each agent.
These characteristics are shared between both centralized and decentralized systems, which further supports the idea that there are no fundamental alterations between the two environments.

A limitation of this study is that our simulation is a simplified representation of economic systems, so details present in reality but omitted from the simulation may lead to somewhat different outcomes. For example, actual economies involve other agents, such as additional businesses, organizations, and foreign entities (which we collectively abstracted as “third parties”). Another limitation is that the basic structure of our simulated economies mirrors the American economy, while certain foreign economies may have organizational structures we have not accounted for. In addition, the values we use for each hyperparameter are reflective of, rather than identical to, those in the current economy. Many interactions and operations were also omitted or simplified, such as individuals investing in firms, government bonds, and purchases of business goods or services. We further assume that every actor in each economy is a rational agent whose ultimate incentive is to increase its net worth, whereas in reality non-rational actors would likely skew operations and decisions. Finally, a realistic large-scale implementation of a decentralized economic system (given the investments and agreements it would require) would likely be a hybrid that combines characteristics of centralized and decentralized systems. A purely decentralized or purely centralized system is unlikely in practice, but we implemented those pure systems in our simulation. Ultimately, we evaluated the metrics most relevant to the main goal of this study and abstracted the remaining details.

Further research could introduce a more realistic hybrid setup that integrates centralized and decentralized characteristics in one system. It would also be beneficial to analyze in more depth the opportunities for bringing lower income individuals to equal monetary standing with other groups. Another avenue for further research is to investigate why the results were nearly identical despite large differences in the inner workings of the two setups.


Our economic simulations are modeled as a Markov Decision Process (MDP), a framework for representing a stochastic environment defined by a set of states, agents, actions, and rewards16. We refer to the full set of states as the state space, and similarly for actions and rewards. Our simulation has three types of agents: individuals, firms, and regulators. We select these types because they best capture the most common entities in today’s economies: individuals make up the majority of participants in an economy, while firms represent banks and middlemen. We define only one regulator, the government. These agent types are the ones most involved in the interactions we define for both centralized and decentralized economies, and their incentives and actions can be altered distinctly between the two structures. The individual agents are further split into three subtypes (low, medium, and high net worth individuals), which allows for greater behavioral insight. For each type of agent, we generate a distinct state, action, and reward space. How an agent makes sequential decisions is governed by its policy π(a | s), a probability distribution over the possible actions a at a given time, conditioned on the state s17. Such a policy may be learned, or may react to a reward function that provides feedback on which actions and states are optimal18. Apart from the economic attributes of each agent, our simulations are generally stateless (a non-sequential decision setting19): decisions, and ultimately actions, are random, but are weighted by the socioeconomic background of the agent. Transitions between states and certain rewards are stochastic rather than optimized based on the previous state.
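The simulation is essentially stateless apart from each agent’s economic attributes, so the policy π(a | s) reduces to a weighted random draw. The sketch below assumes that the only state feature is the agent’s net worth subtype. The weights and subtype cutoffs are hypothetical; the text does not report the actual weighting.

```python
import numpy as np

ACTIONS = ("bet", "invest", "borrow")

# Hypothetical action weights per subtype (not values reported in the text).
ACTION_WEIGHTS = {
    "low": np.array([0.2, 0.2, 0.6]),
    "medium": np.array([0.3, 0.4, 0.3]),
    "high": np.array([0.5, 0.4, 0.1]),
}


def classify(net_worth: float, low_cut: float, high_cut: float) -> str:
    """Assign a subtype from net worth using two hypothetical cutoffs."""
    if net_worth < low_cut:
        return "low"
    if net_worth < high_cut:
        return "medium"
    return "high"


def sample_action(subtype: str, rng: np.random.Generator) -> str:
    """Sample a ~ pi(a | s), where the state s is reduced to the agent's subtype."""
    weights = ACTION_WEIGHTS[subtype]
    return str(rng.choice(ACTIONS, p=weights / weights.sum()))


rng = np.random.default_rng(seed=2)
state = classify(15_000, low_cut=50_000, high_cut=5_000_000)
action = sample_action(state, rng)
```

The weights are normalized inside `sample_action`, so they only need to express relative preferences.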

To keep the simulation informative but simple, we define the policy for individuals as maximizing personal wealth through peer-to-peer business in the form of pairwise betting. Imposing capitalist economic rationality on these simulations, we in turn define the policy for firms as setting fees to maximize profit. Government policy involves setting tax rates on individuals according to their tax bracket, in addition to preventing corporate monopolies by enforcing antitrust regulations that outlaw collusion.
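The pairwise betting interaction can be sketched as below. This is a hypothetical illustration. The similarity rule (a relative `tolerance`), the fair-coin outcome, and the stake sized as a fraction of the poorer agent’s net worth are all assumptions. The text states only that bets are a percentage of net worth placed between agents with similar net worths, and that higher income agents have a larger risk appetite, which `stake_frac` could encode per subtype.

```python
from typing import Optional

import numpy as np


def pick_opponent(worths: np.ndarray, i: int, rng: np.random.Generator,
                  tolerance: float = 0.5) -> Optional[int]:
    """Pick a random opponent whose net worth is within a relative tolerance of agent i's."""
    ref = abs(worths[i])
    candidates = np.flatnonzero(np.abs(worths - worths[i]) <= tolerance * ref)
    candidates = candidates[candidates != i]  # an agent cannot bet against itself
    if candidates.size == 0:
        return None
    return int(rng.choice(candidates))


def settle_bet(worths: np.ndarray, i: int, j: int, stake_frac: float,
               rng: np.random.Generator) -> None:
    """Zero-sum bet: the loser pays the winner a stake sized to the poorer agent."""
    stake = stake_frac * min(worths[i], worths[j])
    winner, loser = (i, j) if rng.random() < 0.5 else (j, i)  # fair coin (assumption)
    worths[winner] += stake
    worths[loser] -= stake


rng = np.random.default_rng(seed=3)
worths = np.array([10_000.0, 12_000.0, 1_000_000.0])
opponent = pick_opponent(worths, 0, rng)  # only agent 1 is close enough
total_before = worths.sum()
settle_bet(worths, 0, opponent, stake_frac=0.1, rng=rng)
```

Because each bet is zero-sum, betting alone redistributes wealth without creating it. Net growth in this sketch would have to come from loans, investment returns, or aid.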

Each individual agent is classified into one of three subtypes based on net worth: low income (underserved or underrepresented individuals), middle income, or high income. The shared incentive of every agent, at the most basic level, is to increase its net worth. The action space for individual agents includes three actions: betting a percentage of net worth with other agents, investing in a third party to earn a return on investment, and acquiring loans (from firms in the centralized setting, or from other individuals in the decentralized setting). Both kinds of loans require a transaction fee paid to a firm, either for intervening in the transaction (centralized) or for maintaining the blockchain network on which the transaction takes place (decentralized). The loan is then repaid with interest. Underrepresented agents additionally have the action of winning bets against other agents. The reward space for individual agents includes winning a bet, earning a return on investment from a third party, and receiving government aid based on the income bracket they fall into.

Figure 4| Capital Flow in Centralized v. Decentralized Environment

For firms, the high-level actions (and resulting rewards) they can take are setting fees and interest rates to maximize profits on individuals’ transactions, and carrying out mergers with other companies or spin-offs into multiple companies.

For the regulator (the government), the action space consists of taxing individual agents and firms (resulting in monetary reward), implementing antitrust regulations that limit mergers from taking place based on the extent to which they would eliminate competition, setting limits on the interest rates and transaction fees that firms charge, and also providing financial aid to underrepresented agents and smaller firms.

Simulations were developed in Python using the NumPy, Matplotlib, Seaborn, Pandas, and Glob libraries. Python was used to implement the simulation, generate agents, encode the logic of all interactions between agents, and save run data to CSV files. NumPy’s random number generation was used to select interacting agents, loan amounts within a calculated range, transaction fees, interest rates, and the firms that facilitate transactions. Values such as the merger threshold, tax rate, and individual net worth extremes were calculated with NumPy aggregation functions such as mean, maximum, and minimum. Seaborn, which builds on Matplotlib, was used for higher-quality visualization of each agent’s net worth and the variation across the set. Glob, a file-matching module, was used together with Pandas to merge the CSV files from all fifty runs into a single dataset.
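The consolidation step can be sketched with `glob` and pandas. The file pattern and the column names (`timestep`, `subtype`, `net_worth`) are assumptions about the saved run format, which the text does not specify. The second helper shows one way to average the fifty runs per timestep before plotting.

```python
import glob
import os

import pandas as pd


def load_runs(pattern: str) -> pd.DataFrame:
    """Concatenate per-run CSV files into one DataFrame, tagging rows with a run index."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no CSV files match {pattern!r}")
    frames = []
    for run_id, path in enumerate(paths):
        frame = pd.read_csv(path)
        frame["run"] = run_id                       # which simulation run the row came from
        frame["source"] = os.path.basename(path)    # original file name, for traceability
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


def mean_trajectories(runs: pd.DataFrame) -> pd.DataFrame:
    """Average net worth per timestep and subtype across all runs."""
    return runs.groupby(["timestep", "subtype"], as_index=False)["net_worth"].mean()
```

For example, a hypothetical call such as `load_runs("results/centralized_low/run_*.csv")` would produce one long table that Seaborn can plot directly.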

Table 1| Hyperparameters for Centralized Environment

Each hyperparameter value was chosen to reflect reasonable tax rates and the transaction fee and interest rate strategies of large firms. In a low regulation environment, firms may set arbitrary fees and interest rates, restricted only by free-market competition. This results in higher fees and interest rates, as predatory lending is not restricted by government intervention. In medium and high regulation environments, however, the government prevents predatory lending by capping the transaction fees and interest rates that firms set, resulting in lower values for those parameters.

In the decentralized environment, transaction fees and interest rates are the centralized values multiplied by 0.5. We impose this multiplier because decentralized environments are highly likely to have lower rates and fees, as those numbers are decided collectively by individuals on the blockchain. Central firms have an incentive to charge higher fees and interest rates because they sit solely on the lending side of the transaction (and are therefore the only parties with monetary gain). Individuals on a blockchain, by contrast, may act as both lenders and borrowers. They would therefore likely select rates and fees that balance both interests: enough for profit, while still keeping loans reasonably priced for borrowers. These considerations motivate the 0.5x multiplier.
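One way to express the fee and interest logic in code is shown below. The 0.5 multiplier comes from the text. The order of operations (regulatory cap first, multiplier second), the `Rates` container, and all numeric values are assumptions for illustration, since the actual hyperparameters are listed in Table 1.

```python
from dataclasses import dataclass
from typing import Optional

DECENTRALIZED_MULTIPLIER = 0.5  # stated in the text


@dataclass(frozen=True)
class Rates:
    fee: float       # transaction fee as a fraction of the principal
    interest: float  # interest rate charged over the loan term


def effective_rates(base: Rates, decentralized: bool, cap: Optional[Rates] = None) -> Rates:
    """Apply an optional regulatory cap, then the decentralized 0.5x multiplier."""
    fee, interest = base.fee, base.interest
    if cap is not None:  # medium/high regulation: government limits on fees and rates
        fee, interest = min(fee, cap.fee), min(interest, cap.interest)
    if decentralized:
        fee *= DECENTRALIZED_MULTIPLIER
        interest *= DECENTRALIZED_MULTIPLIER
    return Rates(fee, interest)


def loan_cash_flows(principal: float, rates: Rates) -> dict:
    """Split the borrower's cost into the fee (to the firm) and interest (to the lender).

    In the centralized setting the firm is also the lender, so it receives both.
    """
    fee_paid = principal * rates.fee
    interest_paid = principal * rates.interest
    return {"fee_to_firm": fee_paid, "interest_to_lender": interest_paid,
            "total_cost": fee_paid + interest_paid}


base = Rates(fee=0.04, interest=0.10)  # hypothetical low-regulation values
cap = Rates(fee=0.02, interest=0.06)   # hypothetical medium/high-regulation caps
decentralized_capped = effective_rates(base, decentralized=True, cap=cap)
```

Separating the fee from the interest makes explicit that, in the decentralized setting, only the fee reaches the firm maintaining the network.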

The amount of aid given is a percentage multiplied by the mean net worth across all agents (low, middle, and high net worth). Agents in the first quartile receive a higher amount of aid, and agents in the second quartile receive slightly less; no government aid is provided beyond that. Because the government is more actively involved in the medium and especially the high regulatory environments, the amount of aid given to underrepresented individuals is set higher there. The merger threshold is compared against the joint net worth of two firms before a merger; if that value exceeds the merger threshold, the merger does not take place. The threshold is lower in a high regulation setup, as the government pursues a policy that prohibits any extent of monopolization. The tax rates on agents and firms are higher in a high regulation setup, as the government exerts stricter control over citizens’ and firms’ incomes in an effort to equalize net worth. The return on investment parameter refers to the return agents receive from a third party in which they invest portions of their net worth. For simplicity, we treat this interaction as a black-box process, but assume the return on investment to be higher as the level of regulation increases, due to the government’s emphasis on social welfare.
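The aid rule can be sketched as follows. The sketch assumes that “first quartile” means agents at or below the 25th percentile of net worth and “second quartile” means agents between the 25th and 50th percentiles. The rate arguments stand in for the aid percentages in Table 1, whose values are not reproduced here, so the numbers in the example are hypothetical.

```python
import numpy as np


def government_aid(worths: np.ndarray, rate_q1: float, rate_q2: float) -> np.ndarray:
    """Aid per agent: a percentage of the mean net worth, larger for the first quartile.

    Agents at or below the 25th percentile receive rate_q1 * mean; agents between
    the 25th and 50th percentiles receive rate_q2 * mean; everyone else receives 0.
    """
    mean_worth = worths.mean()
    q1, q2 = np.percentile(worths, [25, 50])
    aid = np.zeros(worths.shape, dtype=float)
    aid[worths <= q1] = rate_q1 * mean_worth
    aid[(worths > q1) & (worths <= q2)] = rate_q2 * mean_worth
    return aid


worths = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]) * 10_000
aid = government_aid(worths, rate_q1=0.02, rate_q2=0.01)  # hypothetical rates
```

Tying the aid amount to the population mean means aid grows as the economy as a whole grows. Higher regulation levels would correspond to larger `rate_q1` and `rate_q2`.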

Table 2| Actions, Details, and Rewards


We simulate dynamical economic systems using a Markov Decision Process framework. Individual agents fall under either low income, medium income, or high income classifications, and they can bet, invest, or borrow money. Firm agents are banks that can lend money with transaction fees and interest, or are blockchain network providers that collect transaction fees. The regulator agent is the government, which can collect taxes and provide subsidies to low income individual agents.

In the decentralized setting, transactions happen more frequently than in the centralized setting, due to progressive interest rates and transaction fees and lower barriers to entry. Furthermore, individual agents take on the role of central firms by becoming “lending agents.”

While the decentralized system democratizes financial services by giving underrepresented agents more equal access to economic opportunities, the difference from the centralized results is not noticeable in the context of the broader economy. This can be viewed both positively and negatively. On one hand, underserved individuals can use the increase in financial opportunities to raise their incomes. On the other hand, the heavy investment required to switch to a fully decentralized system may not be justified by the large-scale results, which are practically the same as in the centralized setting. Further research could address the simplifications embedded in this simulation and introduce a wider range of agents and actions while implementing a realistic centralized-decentralized hybrid economic system.

  1. N. J. Smelser. A comparative view of exchange systems. Economic Development and Cultural Change. 7, 173-182 (1959).
  2. R. D. Beer. A dynamical systems perspective on agent-environment interaction. Artificial Intelligence. 72(1-2), 173-215 (1995).
  3. J. Bird-Pollan. Utilitarianism and wealth transfer taxation. Ark. L. Rev. 69, 695 (2016).
  4. Z. D. Bailey, J. M. Feldman, M. T. Bassett. How structural racism works: racist policies as a root cause of US racial health inequities. New England Journal of Medicine. 384, 768-773 (2021).
  5. J. E. Roemer, A. Trannoy. Equality of opportunity. Handbook of Income Distribution. Elsevier. 2, 217-300 (2015).
  6. K. Qin, L. Zhou, Y. Afonin, L. Lazzaretti, A. Gervais. CeFi vs. DeFi: comparing centralized to decentralized finance. arXiv preprint arXiv:2106.08157. (2021).
  7. V. K. Manda, S. Yamijala. Peer-to-peer lending using blockchain. International Journal Of Advance Research And Innovative Ideas In Education. 6, 61-66 (2019).
  8. J. Y. Lee. A decentralized token economy: How blockchain and cryptocurrency can revolutionize business. Business Horizons. 62, 773-784 (2019).
  9. V. Gramlich, T. Guggenberger, M. Principato, B. Schellinger, N. Urbach. A multivocal literature review of decentralized finance: Current knowledge and future research avenues. Electronic Markets. 33, 11 (2023).
  10. G. H. Orcutt. Simulation of economic systems. The American Economic Review. 50, 894-907 (1960).
  11. T. Zhang, W. J. Nuttall. Evaluating government’s policies on promoting smart metering diffusion in retail electricity markets via agent-based simulation. Journal of Product Innovation Management. 28, 169-186 (2011).
  12. B. Dushimimana, Y. Wambui, T. Lubega, P. E. McSharry. Use of machine learning techniques to create a credit score model for airtime loans. Journal of Risk and Financial Management. 13, 180 (2020).
  13. P. Aghion, M. Schankerman. On the welfare effects and political economy of competition-enhancing policies. The Economic Journal. 114, 800-824 (2004).
  14. H. Nabilou. How to regulate bitcoin? Decentralized regulation for a decentralized cryptocurrency. International Journal of Law and Information Technology. 27, 266-291 (2019).
  15. P. K. Ozili. Decentralized finance research and developments around the world. Journal of Banking and Financial Technology. 6, 117-133 (2022).
  16. J. Rust. Structural estimation of Markov decision processes. Handbook of Econometrics. 4, 3081-3143 (1994).
  17. G. Thomas. Markov decision processes. https://ai.stanford.edu/~gwthomas/notes/mdps.pdf (2007).
  18. E. Even-Dar, S. M. Kakade, Y. Mansour. Online Markov decision processes. Mathematics of Operations Research. 34, 726-736 (2009).
  19. S. Choudhury, J. K. Gupta, P. Morales, M. J. Kochenderfer. Scalable online planning for multi-agent MDPs. Journal of Artificial Intelligence Research. 73, 821-846 (2022).

