Machine Learning Stock Market Simulation and Trading Bot Implementation


Derek Chen1, Jean-Claude Franchitti2,*

1. Belmont High School, Belmont, MA 02478

2. Courant Institute of Mathematical Sciences, New York University, New York, NY 10003

* Corresponding author

Keywords: Machine Learning, Reinforcement Learning, Long Short-Term Memory, Trading Bots, Financial Modeling


In this paper and the associated research project, we seek a better understanding of meaningful stock market interactions by employing trading bots to simulate trading scenarios that leverage generated orders in a cloud environment. The stock market is a lucrative investment opportunity; however, due to the inherent volatility of macroeconomic systems, investments in the stock market are generally risky. The complexity and risk involved are daunting to many first-time traders who want to begin buying and selling in the market. This paper and the associated research project seek to help first-time traders gain experience with trading strategies in a simulated market environment.

Many models have been employed to help understand the stock market, especially with the recent progress in machine learning and deep learning. Long Short-Term Memory (LSTM) networks and Generative Adversarial Networks (GANs) are two categories of deep learning models used to simulate and predict the stock market. Most simulation models focus on long-term stock price predictions. In this paper, however, we experiment with an LSTM model that generates and models stock order time series in real time. Reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG), Q-Learning, and Proximal Policy Optimization (PPO2) are often used to predict and analyze data from the stock market. We experiment with trading bots using these models to determine which one generates the most revenue by evaluating the corresponding profit and loss indices. These models can help first-time traders better understand the market.


The stock market is profitable yet inherently risky and volatile, so unlocking its patterns is of key interest to economists, market analysts, investors, and others. However, analyzing stock market data poses several difficulties. The data is high-volume, multidimensional, and vulnerable to market fluctuations and noise, which obscures many patterns from rudimentary analysis.

High-throughput and high-performance computing on modern computer architectures facilitates the handling of large quantities of data and the elicitation of patterns within them, and machine learning algorithms have proved practical and effective for studying market data. Deep learning and neural network algorithms have shown promise for studying complex financial time series. Features can be extracted through Convolutional Neural Networks (CNNs), and modeling can be implemented through algorithms such as LSTM and GANs.

In this paper and the associated project, we experiment with a generative agent and a trading bot algorithm that submit orders in a simulated trading environment. First, we parse seed data from sample stock market orders and feed it into a generative agent, which generates more sample market data by extracting patterns from the financial time series in the original seed data. The generated sample market data is then fed into a machine learning trading bot. Next, we use a message broker (e.g., RabbitMQ [1]) to implement an order book and related order processing modules that manage submitted buy and sell orders. Within this environment, we deploy trading bots that can place orders and use an OpenAI Gym [2] interface to manage balances and validate orders. We then evaluate the accuracy and efficiency of different machine learning algorithms, including PPO2 and Twin Delayed Deep Deterministic Policy Gradient (TD3), to determine the most effective machine learning algorithm for trading bots within a given trading environment. All in all, we are able to use these simulated models to help us better understand the securities market.

Related work and background

Machine learning is often applied to both economics and global market trading to help predict trends and make prudent economic decisions [3]. One particular area of interest is the modeling of stock prices. Stock price models can predict the future of the stock market and be used to generate sample data. Several machine learning and deep learning algorithms have been employed in this manner. Recurrent neural networks have been used to predict stock returns [4]; however, they are not suited to deal with time gap variability [5]. Long Short-Term Memory (LSTM) networks transfer both cell and hidden states between cells. They are frequently used to predict stock prices because of their ability to handle complex time series and record past states [6], as well as to process long time lags between relevant signals [5]. Generative Adversarial Networks (GANs) have also been widely applied due to their ability to preserve certain statistical properties of the financial data [7], [8], [9]. Other models include Autoregressive Integrated Moving Average (ARIMA) models [10], [11]. Combining two different models, such as Convolutional Neural Networks (CNNs) for their feature extraction capability and LSTM for time series handling, has also proved fruitful [12].

Portfolio and securities management is also an area of interest and has prompted the application of many algorithms, including LSTM and CNNs [13]. Stock market trading involves reward and loss and is therefore suited to decision-making models such as reinforcement learning agents, which use these concepts to maximize reward over time. The many variants of reinforcement learning, including deep reinforcement learning, have been applied fruitfully to algorithmic trading and security portfolio management [14], [15].

Most deep learning models that have been applied to the stock market use closing price data or sentiment analysis [17] to model the long-term behavior of the market and to investigate structural patterns and instabilities. We are more interested in aiding first-time traders. Long-term closing price models are generally not applicable to traders who need to make decisions on short notice throughout the day. A delay of seconds between placing orders can be consequential, so we experiment with a generative agent and a limit order book that can simulate real-time transactions. Our approach may be of service to first-time traders, and it may also be used to study related scenarios such as fraud detection.

Unlike the existing literature, which focuses on evaluating closing prices, we evaluate stock trading in a semi-real-time continuous double auction system. In our environment, traders submit limit orders consisting of the desired quantity of stock and either the maximum bid price at which they are willing to buy or the minimum ask price at which they are willing to sell. The orders are then submitted to a limit order book, which maintains a record of active orders that have not been traded or deleted. When the requirements of an active buy order and an active sell order are both satisfied, that is, when the bid price is at least the ask price, the two orders are matched with each other. Sometimes the order quantities do not match exactly, in which case the two orders are partially transacted using a mainstream algorithm [9]. Because sell orders have a minimum ask price while buy orders have a maximum bid price, the lowest sell order and the highest buy order are the first to be transacted. We term the lowest sell order the best ask and the highest buy order the best bid. Now, we introduce the setup.
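The matching rule described above can be illustrated with a small sketch. This is not the implementation used in the project: the class and method names are hypothetical, the book holds a single security, ties are broken by arrival order, and trades execute at the price of the order that was resting first, all of which are assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Order:
    side: str     # "buy" or "sell"
    price: float  # maximum bid (buy) or minimum ask (sell)
    qty: int      # remaining unfilled quantity


class LimitOrderBook:
    """Toy book for one security with price-time priority (an assumption)."""

    def __init__(self):
        self._bids = []      # heap keyed on -price, so the highest bid is first
        self._asks = []      # heap keyed on +price, so the lowest ask is first
        self._seq = count()  # arrival counter, breaks ties between equal prices

    def best_bid(self):
        return self._bids[0][2].price if self._bids else None

    def best_ask(self):
        return self._asks[0][2].price if self._asks else None

    def submit(self, side, price, qty):
        """Add a limit order, then match while best bid >= best ask."""
        order = Order(side, price, qty)
        if side == "buy":
            heapq.heappush(self._bids, (-price, next(self._seq), order))
        else:
            heapq.heappush(self._asks, (price, next(self._seq), order))

        trades = []
        while self._bids and self._asks:
            (_, bid_seq, bid), (_, ask_seq, ask) = self._bids[0], self._asks[0]
            if bid.price < ask.price:
                break  # the spread is open, nothing left to match
            fill = min(bid.qty, ask.qty)  # partial fill when sizes differ
            # Assumption: trade at the price of the order that was resting first.
            trade_price = bid.price if bid_seq < ask_seq else ask.price
            trades.append((trade_price, fill))
            bid.qty -= fill
            ask.qty -= fill
            if bid.qty == 0:
                heapq.heappop(self._bids)
            if ask.qty == 0:
                heapq.heappop(self._asks)
        return trades


book = LimitOrderBook()
book.submit("sell", 10.0, 5)
book.submit("sell", 10.5, 5)
book.submit("buy", 9.5, 3)
trades = book.submit("buy", 10.2, 8)  # crosses the best ask of 10.0
```

In this example the incoming buy order for 8 shares fills 5 shares against the best ask, and its remaining 3 shares rest in the book as the new best bid.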

Solution setup and implementation

To set up the market environment simulation on the cloud, we divide the platform into three main parts. First, we implement an order book to keep track of and match incoming orders. Second, a generative agent generates market orders based on financial time series data. Finally, we use trading bots to place and delete buy and sell orders for specific securities to maximize profit. After a certain number of trades, we measure the efficacy of the trading bots through their profit and loss indices.

We run this simulation on the cloud for its scalability and parallel processing capacity. Our experiments are performed on a Standard NC6 Linux Virtual Machine on the Microsoft Azure cloud. The machine has one GPU, which allows for parallel processing of computation-heavy algorithms such as LSTM.

First, we set up a limit order book infrastructure, implemented through a message broker such as RabbitMQ, to connect the generative agent and the trading bot. This allows us to simulate the real-time dynamic order books found in actual trading scenarios. We use direct exchanges to filter the orders by ticker and to separately manage and direct incoming buy and sell orders. Trading bots can subscribe to only those securities that they are interested in. Message brokers such as RabbitMQ enable synchronous and dynamic message reception without data loss.
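As a rough illustration of how a direct exchange filters orders by ticker and side, the following in-memory sketch mimics exact-match routing by key. It is a stand-in rather than the RabbitMQ client API: the class `DirectExchange`, the `ticker.side` routing-key scheme, and the tickers `ABC` and `XYZ` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class DirectExchange:
    """In-memory stand-in for a direct exchange (exact routing-key match)."""

    def __init__(self):
        self._bindings = defaultdict(list)  # routing key -> bound queues

    def bind(self, queue, routing_key):
        self._bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        """Deliver to every queue bound with exactly this key; return the count."""
        queues = self._bindings.get(routing_key, [])
        for queue in queues:
            queue.append(message)
        return len(queues)


def routing_key(ticker, side):
    # Hypothetical key scheme: one key per (ticker, side) pair.
    return f"{ticker}.{side}"


orders = DirectExchange()
abc_buys, abc_sells, xyz_buys = [], [], []  # plain lists stand in for queues
orders.bind(abc_buys, routing_key("ABC", "buy"))
orders.bind(abc_sells, routing_key("ABC", "sell"))
orders.bind(xyz_buys, routing_key("XYZ", "buy"))

orders.publish(routing_key("ABC", "buy"), {"price": 10.0, "qty": 5})
orders.publish(routing_key("XYZ", "buy"), {"price": 20.0, "qty": 1})
orders.publish(routing_key("ABC", "sell"), {"price": 10.5, "qty": 2})
unrouted = orders.publish(routing_key("XYZ", "sell"), {"price": 21.0, "qty": 1})
```

A consumer bound only to the `ABC` keys never sees `XYZ` traffic, which is the property that lets a trading bot subscribe to just the securities it cares about. In this sketch a message published with a key that no queue is bound to is simply not delivered.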

Message brokers such as RabbitMQ also allow messages to be validated through a unique ID, so orders can be confirmed and are less susceptible to misuse. Internal and external IDs allow both the order book and the generative agent or trading bot to accept and synchronize orders. The order book manages the incoming AON (all-or-nothing) buy and sell orders from the generative agent and the trading bots. If a buy order can be matched with a sell order, the trade is made. The limit order book publishes the price levels of all the securities and also publishes any trades it makes. Figure 1 displays a diagram of the architecture and connections.

Figure 1 | Structure of the Market Simulation

We use several algorithms and processes to generate the data that simulates the market environment. Because we need a continuous generative agent for data, we implement a Long Short-Term Memory (LSTM) generative agent. The LSTM generative agent tensorizes and normalizes the data and then splits it into batches, allowing for parallelism in training or testing. We experiment with a two-layer LSTM implemented in PyTorch [16]. During training, the model generates orders to reconstruct the sample data time series. After generating an order, the model evaluates it against the succeeding real data value in the data set through the loss function, which is defined as the weighted sum of the option, interval, price, and size losses.
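Written out, the weighted loss described above takes the following form. The symbols are illustrative notation assumed here, not taken from the implementation: each w is a non-negative relative weight and each component loss compares one field of the generated order with the same field of the next real order.

```latex
\mathcal{L}
  = w_{\text{option}}\,\mathcal{L}_{\text{option}}
  + w_{\text{interval}}\,\mathcal{L}_{\text{interval}}
  + w_{\text{price}}\,\mathcal{L}_{\text{price}}
  + w_{\text{size}}\,\mathcal{L}_{\text{size}}
```

Under this form, raising one weight relative to the others makes the optimizer prioritize that field of the generated order.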

We input some seed orders to the generative agent from a replay agent to start. After the generative agent has received sufficient preliminary seed data, it starts generating orders and publishing them to the limit order book. We experimented with changing the relative weights of the losses as well as the learning rate and weight decay to obtain the optimal model.

For the trading bots, we experiment with PPO2 and TD3 models that are pretrained on a separate data set to run in the OpenAI Gym environment. This separate data set has orders for all the tickers to simulate the complete market, but the trading bot can be configured to focus on a specific set of tickers. Finally, we connect the trading bot, the generative agent, and the limit order book through a RabbitMQ interface and test all the different models to compare their performance.

The trading bot uses reinforcement learning to make the best trades. Here, we experimented with two different algorithms: Proximal Policy Optimization (PPO2) and Twin Delayed Deep Deterministic Policy Gradient (TD3). After each decision, the cost of the decision is evaluated and then reduced through gradient descent to find the optimal strategy.

To evaluate the trading bots, we use a profit and loss index. The trading bot starts off with a certain balance. Every time the trading bot is successfully matched for a trade, the security is added to or removed from the portfolio and the balance is updated, but the balance is never allowed to become negative. The profit and loss index is the sum of the balance and the value of all the securities in the portfolio, minus the starting balance, divided by the starting balance. The profit and loss index after every trade is recorded and graphed at the end.
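A minimal sketch of this bookkeeping follows. The function names and the ticker are hypothetical. Valuing holdings at a caller-supplied current price and rejecting sells that exceed current holdings are assumptions, since the text only specifies that the balance may not become negative.

```python
def apply_fill(balance, holdings, side, ticker, price, qty):
    """Return the updated (balance, holdings) after a matched trade.

    Raises ValueError if the trade would make the balance negative or,
    as an added assumption, sell more shares than are held.
    """
    holdings = dict(holdings)  # work on a copy, leave the caller's dict intact
    if side == "buy":
        cost = price * qty
        if cost > balance:
            raise ValueError("insufficient balance")
        balance -= cost
        holdings[ticker] = holdings.get(ticker, 0) + qty
    else:
        if holdings.get(ticker, 0) < qty:
            raise ValueError("insufficient holdings")
        balance += price * qty
        holdings[ticker] -= qty
        if holdings[ticker] == 0:
            del holdings[ticker]
    return balance, holdings


def pnl_index(balance, holdings, prices, starting_balance):
    """(balance + value of holdings - starting balance) / starting balance."""
    portfolio_value = sum(qty * prices[ticker] for ticker, qty in holdings.items())
    return (balance + portfolio_value - starting_balance) / starting_balance


start = 1000.0
balance, holdings = apply_fill(start, {}, "buy", "ABC", 50.0, 10)
index = pnl_index(balance, holdings, {"ABC": 52.0}, start)
```

With these numbers the bot holds 500 in cash and 10 shares worth 520, so the index is (500 + 520 - 1000) / 1000 = 0.02, that is, a 2% gain on the starting balance.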


First, we examine the results of the generative agent. Figure 2 shows the loss functions of the generative agent during training. The graphs in order from left to right, top to bottom, are combined loss, option loss, interval loss, price loss, and size loss.

Figure 2 | The loss functions of the LSTM generative agent. In the first graph, the green line represents the exponential moving average.
Figure 3 | Generated data for two chosen securities: the blue dots represent the prices of sell orders and the yellow dots represent the prices of buy orders.

Most of the loss components decrease rapidly after the first few predictions. The price loss contributes most of the combined loss and does not decrease as fast as the other losses.

We also examine the generated data displayed in Figure 3. The generated data closely resembles the seed market data and can be used to effectively simulate dynamic trading scenarios in conjunction with the trading bot.

Regarding the reinforcement learning trading bot, the top graphs of Figure 4 show the profit and loss indices of the trading bot during training on a sample data set for both the PPO2 and TD3 models. The bottom graphs of Figure 4 display the profit and loss indices of the corresponding trading bots in simulation with the LSTM generative agent.

Figure 4 | Profit and loss indices of PPO2 (left) and TD3 (right) models. The top graphs display the profit and loss indices of the model during training throughout the epochs. The bottom graphs display the profit and loss indices of the model in the market simulation in combination with the limit order book and generative agent.

Judging by the profit and loss indices of these trading bots, PPO2 performs better than TD3, with a maximum profit and loss index of 1.337e-2 for the PPO2 model compared to 4.972e-3 for the TD3 model. From the bottom graphs of Figure 4, which show the trading bots in simulation, we see that the trading bots prove both profitable and advantageous in simulated dynamic trading scenarios. The choppier nature of the training graphs can be attributed to the fact that the trading bots are still learning and thus make more mistakes that may result in losses.

One limitation of this proposed solution is that the strategy is built into the trading bot upon training. It is not able to adapt to changing market conditions as well as evolutionary computation models such as NEAT.

Conclusion and Future Work

In this paper, we develop a generative agent and a trading bot to simulate a trading scenario using multiple machine learning algorithms, including Long Short-Term Memory and reinforcement learning, with a RabbitMQ message broker infrastructure to facilitate order submission and transactions. Unlike earlier literature, which most often focuses on predicting stock market closing prices, our simulation is dynamic and simulates real-time market interactions. The generated orders matched the market data well and the trading bots were able to generate profits. We set up this simulation on the cloud for maximum future scalability and parallelism.

There are many possible avenues of expansion for this research project. First, we would like to investigate how genome-inspired algorithms such as NEAT (NeuroEvolution of Augmenting Topologies) compare to reinforcement learning algorithms. NEAT is a neural network approach modeled on genomics and evolutionary computation. Trading bots and strategies must constantly adapt to the ever-fluctuating marketplace, so we would like to investigate the uses of an evolutionary model such as NEAT in studying market data.

Another future direction is the implementation of online training. Currently, the algorithm only works with offline models pretrained on fixed data sets. We would like to construct the infrastructure to allow the trading bots to access real-time financial data and to engage in online training and trading. We would also like to incorporate slippage and other market frictions into our simulation.

Finally, an interesting future project is the integration of short-term stock market models based on trade data with long-term stock market models based on closing prices, in an attempt to capture the complexity of the stock market. A promising direction is the additional integration of sentiment analysis through the study of text corpora and messages, as in [17].


The author would like to acknowledge his mentor Professor Jean-Claude Franchitti for his thoughtful and invaluable instruction and guidance and Joanna Gilberti for her insightful feedback and advice. The authors would also like to thank Citigroup Inc. [18] for providing the data.


[1] RabbitMQ. (accessed September 11th, 2020).
[2] OpenAI Gym. (accessed September 11th, 2020).
[3] Krollner, Bjoern, Bruce J. Vanstone, and Gavin R. Finnie. “Financial time series forecasting with machine learning techniques: a survey.” ESANN. 2010.
[4] Rather, Akhter Mohiuddin, Arun Agarwal, and V. N. Sastry. “Recurrent neural network and a hybrid model for prediction of stock returns.” Expert Systems with Applications 42.6 (2015): 3234-3241.
[5] Hochreiter, Sepp, and Jürgen Schmidhuber. “LSTM can solve hard long time lag problems.” Advances in neural information processing systems. 1997.
[6] Hwang, Jungsik. “Modeling Financial Time Series using LSTM with Trainable Initial Hidden States.” arXiv preprint arXiv:2007.06848 (2020).
[7] Tovar, Wilfredo. “Deep Learning Based on Generative Adversarial and Convolutional Neural Networks for Financial Time Series Predictions.” arXiv preprint arXiv:2008.08041 (2020).
[8] Takahashi, Shuntaro, Yu Chen, and Kumiko Tanaka-Ishii. “Modeling Financial Time-Series with Generative Adversarial Networks.” Physica A: Statistical Mechanics and its Applications 527 (2019): 121261.
[9] Li, Junyi, et al. “Generating realistic stock market order streams.” 2019. URL https://openreview.net/forum.
[10] Doshi, Akash, et al. “Deep Stock Predictions.” arXiv preprint arXiv:2006.04992 (2020).
[11] Ariyo, Adebiyi A., Adewumi O. Adewumi, and Charles K. Ayo. “Stock price prediction using the ARIMA model.” 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation. IEEE, 2014.
[12] Tsantekidis, Avraam, et al. “Using deep learning for price prediction by exploiting stationary limit order book features.” Applied Soft Computing (2020): 106401.
[13] Sangadiev, Aiusha, et al. “DeepFolio: Convolutional Neural Networks for Portfolios with Limit Order Book Data.” arXiv preprint arXiv:2008.12152 (2020).
[14] Théate, Thibaut, and Damien Ernst. “An application of deep reinforcement learning to algorithmic trading.” arXiv preprint arXiv:2004.06627 (2020).
[15] Lee, Jinho, et al. “MAPS: Multi-agent Reinforcement Learning-based Portfolio Management System.” arXiv preprint arXiv:2007.05402 (2020).
[16] PyTorch. (accessed September 11th, 2020).
[17] Zhou, Yue, and Kerstin Voigt. “Stock Index Prediction with Multi-task Learning and Word Polarity Over Time.” arXiv preprint arXiv:2008.07605 (2020).
[18] Citigroup Inc. (accessed September 11th, 2020).