
## Indian stock market prediction using artificial neural networks on tick data

*Financial Innovation* **volume 5**, Article number: 16 (2019)

### Abstract

### Introduction

Nowadays, one of the most significant challenges in the stock market is predicting stock prices. Stock price data is a financial time series, which is especially difficult to predict due to its characteristics and dynamic nature.

### Case description

Support Vector Machines (SVM) and Artificial Neural Networks (ANN) are widely used for the prediction of stock prices and their movements. Every algorithm has its own way of learning patterns and then predicting. The Artificial Neural Network (ANN) is a popular method that also incorporates technical analysis for making predictions in financial markets.

### Discussion and evaluation

The most common techniques used in forecasting financial time series are the Support Vector Machine (SVM), Support Vector Regression (SVR) and Back Propagation Neural Network (BPNN). In this article, we use neural networks based on three different learning algorithms, i.e., Levenberg-Marquardt, Scaled Conjugate Gradient and Bayesian Regularization, for stock market prediction based on tick data as well as 15-min data of an Indian company, and compare their results.

### Conclusion

All three algorithms provide an accuracy of 99.9% using tick data. The accuracy over the 15-min dataset drops to 96.2%, 97.0% and 98.9% for LM, SCG and Bayesian Regularization respectively, which is significantly poorer than the results obtained using tick data.

### Introduction

A stock market is a platform for trading a company’s stocks and derivatives at an agreed price. Supply and demand of shares drive the stock market. In any country, the stock market is one of the most rapidly developing sectors, and nowadays many people are directly or indirectly involved in it. Therefore, it becomes essential to understand market trends, and with the development of the stock market, people have become interested in forecasting stock prices. But due to the dynamic nature of stock prices and their susceptibility to quick changes, prediction of the stock price becomes a challenging task. Stock markets are mostly non-parametric, non-linear, noisy and deterministic chaotic systems (Ahangar et al. 2010).

As technology advances, stock traders are moving toward Intelligent Trading Systems rather than fundamental analysis for predicting stock prices, which helps them make immediate investment decisions. One of the main aims of a trader is to predict the stock price so that he can sell before its value declines, or buy before the price rises. The efficient market hypothesis states that it is not possible to predict stock prices and that stocks follow a random walk. It seems very difficult to replace the professionalism of an experienced trader in predicting the stock price, but because of the availability of a remarkable amount of data and technological advancements, we can now formulate an appropriate algorithm for prediction whose results can increase the profits of traders or investment firms. Thus, the accuracy of an algorithm is directly proportional to the gains made by using it.

### Case description

There are three conventional approaches to stock price prediction: technical analysis, traditional time series forecasting, and machine learning methods. Earlier, classical regression methods such as linear regression and polynomial regression were used to predict stock trends. Traditional statistical models, which include exponential smoothing, moving average, and ARIMA, make their predictions linearly. Nowadays, Support Vector Machines (Cortes & Vapnik, 1995) (SVM) and Artificial Neural Networks (ANN) are widely used for the prediction of stock price movements. Every algorithm has its own way of learning patterns and then predicting. The Artificial Neural Network (ANN) is a popular and more recent method that also incorporates technical analysis for making predictions in financial markets. An ANN includes a set of threshold functions; these functions, connected to each other with adaptive weights, are trained on historical data and then used to make future predictions (Trippi & Turban, 1992; Walczak, 2001; Shadbolt & Taylor, 2002). Kuan and Liu (1995) investigated the out-of-sample forecasting ability of recurrent and feedforward neural networks based on empirical foreign exchange rate data. In 2017, Mehdi Khashei and Zahra Haji Rahimi evaluated the performance of series and parallel strategies to determine the more accurate one using ARIMA and MLP (Multilayer Perceptron) (Mehdi & Zahra, 2017).

Artificial neural networks have been used widely to solve many problems due to their versatile nature (Samek & Varachha, 2013). Yodele et al. (2012) presented a hybridized approach, i.e., a combination of the variables of fundamental and technical analysis of stock market indicators, to predict future stock prices and improve on existing methods. Kara and Boyacioglu (2011) discussed stock price index movement using two models based on the Artificial Neural Network (ANN) and the Support Vector Machine (SVM); comparing the performances of both models, they concluded that the average performance of the ANN model was significantly better than that of the SVM model. Qi and Zhang (2008) investigated the best modeling of trend time series using neural networks. They used four different approaches, i.e., raw data, raw data with a time index, de-trending and differencing, for modeling various trend patterns and concluded that neural networks give better results. Cigizoglu (2003) discussed the application of ANNs to forecasting, estimation and extrapolation of daily flow data belonging to rivers in the East Mediterranean region of Turkey, and found that ANNs provide a better fit to the data than conventional methods. An ANN can be considered a computational or mathematical model inspired by the functional and structural characteristics of biological neural networks. These networks are developed so that they can extract patterns from noisy data. An ANN first trains a system using a large sample of data, known as the training phase; it then introduces to the network data not included in the training phase, known as the validation or prediction phase. The sole motive of this procedure is to predict new outcomes (Bishop, 1995). This idea of learning from training and then predicting outcomes comes from the human brain, which can learn and respond. Thus the ANN has been used in many applications and has proven successful in executing complex functions in a variety of fields (Fausett, 1994).

The data used in this case study is tick data of Reliance Private Limited from 30 NOV 2017 to 11 JAN 2018 (excluding holidays). There are roughly 15,000 data points per day, and the dataset contains approximately 430,000 data points in total. The data was obtained from the Thomson Reuters Eikon database^{Footnote 1} (this database is a paid product of Thomson Reuters). Each tick refers to the change in the price of the stock from trade to trade. The stock price at the start of every 15 min was extracted from the tick data; this represents the secondary dataset, on which the same algorithms were run. In this study, we have made predictions on the tick data and the 15-min data using the same neural networks and compared their results.

### Discussion and evaluation

In this study, we have used variations of the ANN to predict the stock price, but the efficiency of forecasting by an ANN depends upon the learning algorithm used to train it. This paper compares three algorithms, i.e., Levenberg-Marquardt (LM), Scaled Conjugate Gradient and Bayesian Regularization. As shown in Fig. 1, neural networks with 20 hidden layers and a delay of 50 data points are used; thus, each prediction is made using the last 50 values.
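
As a minimal sketch of this setup (not the authors' code), the tick series can be turned into training samples in which each target value is predicted from the preceding 50 values; the function name `make_windows` is a hypothetical illustration:

```python
import numpy as np

# Illustrative sketch: building training samples where each prediction
# uses the previous 50 values (the "delay" described above).
def make_windows(series, delay=50):
    # Each row of X holds `delay` consecutive values; y holds the next value.
    X = np.array([series[i:i + delay] for i in range(len(series) - delay)])
    y = np.array(series[delay:])
    return X, y

series = np.arange(100.0)       # stand-in for a price series
X, y = make_windows(series)     # X.shape == (50, 50), y.shape == (50,)
```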

### Theory of Levenberg-Marquardt

The Levenberg-Marquardt algorithm was developed to approach second-order training speed while avoiding the computation of the Hessian matrix, and is used for solving non-linear least squares problems. When the performance function has the form of a sum of squares, the Hessian matrix can be approximated by

$$ \boldsymbol{H}={\boldsymbol{J}}^{\boldsymbol{T}}\boldsymbol{J} $$

(1)

Equation **(1)** is used to avoid the heavy computation of the Hessian matrix, which can instead be calculated using the Jacobian matrix.

The gradient, calculated in **(2)**, is the first-order derivative of the total error function and is used for updating the weights in **(4)**:

$$ \boldsymbol{g}={\boldsymbol{J}}^{\boldsymbol{T}}\boldsymbol{e} $$

(2)

where **J** is the Jacobian matrix and **e** is a vector of network errors. The Jacobian **J** contains all the first derivatives of the network errors with respect to the weights and biases. To ensure that the approximated Hessian matrix *J*^{T}*J* is invertible, the Levenberg-Marquardt algorithm introduces another approximation of the Hessian matrix:

$$ \boldsymbol{H}={\boldsymbol{J}}^{\boldsymbol{T}}\boldsymbol{J}+\boldsymbol{\mu} \boldsymbol{I} $$

(3)

where **μ** is a scalar and **I** is the identity matrix. By combining **(2)** and **(3)**, the update rule for the Levenberg-Marquardt algorithm is obtained as the Newton-like update:

$$ {\boldsymbol{x}}_{\boldsymbol{k}+\mathbf{1}}={\boldsymbol{x}}_{\boldsymbol{k}}-{\left[{\boldsymbol{J}}^{\boldsymbol{T}}\boldsymbol{J}+\boldsymbol{\mu} \boldsymbol{I}\right]}^{-\mathbf{1}}{\boldsymbol{J}}^{\boldsymbol{T}}\boldsymbol{e} $$

(4)

If the scalar **μ** is zero, the algorithm reduces to Newton’s method with the approximated Hessian matrix. If **μ** becomes large, the algorithm behaves like gradient descent with a small step size. Since Newton’s method is faster and more accurate near an error minimum, the primary objective is to shift toward Newton’s method as quickly as possible. Thus **μ** is decreased after each successful step (i.e., each reduction of the performance function) and increased only when a tentative step would increase the performance function, as shown in Fig. 2. Therefore, the performance function is reduced at every iteration.

One of the significant merits of the LM approach is that it behaves like a gradient search for large values of **μ** and like Newton’s method for small values of **μ**. The LM algorithm thus merges the best attributes of the steepest-descent algorithm and the Gauss-Newton technique while avoiding many of their limitations; in particular, it handles the problem of slow convergence efficiently (Hagan & Menhaj, 1994).
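The update rule in Eq. (4), together with the adaptation of μ, can be sketched as follows. This is an illustrative NumPy implementation for a toy linear least-squares problem, not the authors' code; `lm_step`, `lm_fit` and the μ update factors are hypothetical choices:

```python
import numpy as np

# One Levenberg-Marquardt iteration for fitting y ≈ a*x + b, using
# Eq. (4): x_{k+1} = x_k - [J^T J + mu*I]^{-1} J^T e
def lm_step(params, x, y, mu):
    a, b = params
    e = (a * x + b) - y                         # residual (error) vector
    J = np.column_stack([x, np.ones_like(x)])   # Jacobian of residuals w.r.t. (a, b)
    H = J.T @ J + mu * np.eye(2)                # damped Hessian approximation, Eq. (3)
    return params - np.linalg.solve(H, J.T @ e) # Newton-like update, Eq. (4)

def sse(params, x, y):
    return np.sum(((params[0] * x + params[1]) - y) ** 2)

def lm_fit(x, y, params=np.zeros(2), mu=1e-2, iters=50):
    for _ in range(iters):
        new = lm_step(params, x, y, mu)
        if sse(new, x, y) < sse(params, x, y):
            params, mu = new, mu * 0.5          # successful step: decrease mu
        else:
            mu *= 2.0                           # failed step: increase mu
    return params

x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 1.0                               # noiseless toy data
a, b = lm_fit(x, y)                             # converges near a=3, b=1
```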

### Theory of scaled conjugate gradient

In the backpropagation algorithm, the weights are adjusted in the steepest-descent direction (the negative of the gradient) because the performance function decreases most rapidly in this direction. However, rapid reduction of the performance function along this direction does not always imply the fastest convergence. In conjugate gradient algorithms, the search is done along conjugate directions, generally producing faster convergence than the steepest-descent direction. Most algorithms use a learning rate to determine the length of the weight-update step, but most conjugate gradient algorithms instead modify the step size in each iteration: the search is made along the conjugate gradient direction to determine the step size that reduces the performance function.

A key advantage of the scaled conjugate gradient algorithm is that, unlike other conjugate gradient algorithms, it performs no line search at each iteration. In a line search, the network responses of all training inputs are computed several times for every search, which is computationally expensive. To avoid these time-consuming line searches, Moller (1993) designed the scaled conjugate gradient (SCG) algorithm, a fully automated supervised algorithm. It includes no critical user-dependent parameters and is much faster than Levenberg-Marquardt backpropagation. The algorithm can be used on any dataset, provided the net input, weight and transfer functions have derivative functions. Derivatives of performance with respect to the weight and bias variables are calculated using backpropagation, and the line search at every iteration is avoided by using a step-size scaling mechanism in the manner of the LM algorithm (Hagan et al., 1996).

The training phase stops when any of the following conditions is met:

- The maximum number of repetitions (epochs) is reached.
- The maximum time is exceeded.
- The performance is reduced to the target value.
- The gradient of the performance falls below the minimum gradient.
- The validation performance has failed to decrease more than the maximum allowed number of times since it last decreased (when using validation).
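
As a hedged illustration (not the authors' code), these stopping conditions can be sketched as a generic training loop; `train_epoch` and all the threshold values are hypothetical placeholders:

```python
import time

# Illustrative training loop implementing the stopping conditions above.
# `train_epoch` is assumed to run one epoch and return
# (performance, gradient magnitude, validation performance).
def train(train_epoch, max_epochs=1000, max_seconds=60.0,
          goal=1e-6, min_grad=1e-10, max_fail=6):
    start = time.time()
    best_val, fails = float("inf"), 0
    for epoch in range(max_epochs):          # 1. maximum number of epochs
        perf, grad, val = train_epoch()
        if time.time() - start > max_seconds:
            return "time", epoch             # 2. maximum time exceeded
        if perf <= goal:
            return "goal", epoch             # 3. performance reduced to target
        if grad < min_grad:
            return "min_grad", epoch         # 4. gradient below minimum
        if val < best_val:
            best_val, fails = val, 0
        else:
            fails += 1
            if fails >= max_fail:
                return "early_stop", epoch   # 5. too many validation failures
    return "max_epochs", max_epochs

# Example: performance drops below the goal on the second epoch.
vals = iter([(1e-3, 1.0, 1.0), (1e-7, 1.0, 1.0)])
reason, epoch = train(lambda: next(vals))
```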

### Theory of Bayesian regularization

Bayesian regularized artificial neural networks (BRANNs) eliminate or reduce the need for lengthy cross-validation and hence perform more robustly than standard backpropagation. Bayesian regularization is a mathematical process that converts a nonlinear regression into a “well-posed” statistical problem in the manner of ridge regression. A key advantage of this algorithm is that it considers the probabilistic nature of the network weights in relation to the given data set. Since the probability of overfitting increases dramatically as more hidden-layer neurons are added to a neural network, standard training requires a validation set as a stopping point. This algorithm instead penalizes all unreasonably complex models by pushing excess linkage weights toward zero; the network trains and retains only the non-trivial weights, and as the network grows, some parameters converge to constants. The volatility and noise in stock markets also raise the probability of overtraining for basic backpropagation networks, but Bayesian networks are more parsimonious, tend to reduce the probability of overfitting, and eliminate the need for a validation step, thereby increasing the data available for training (Jonathon, 2013).

Bayesian regularization has the same usage criteria as the Scaled Conjugate Gradient backpropagation algorithm. It minimizes a linear combination of squared errors and weights, and modifies this linear combination so that the resulting network has good generalization qualities (Guresen et al., 2011; Hagan & Menhaj, 1999). The Bayesian regularization takes place within the Levenberg-Marquardt algorithm.
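
As an illustration of the kind of objective being minimized, a linear combination of squared errors and squared weights, here is a sketch using closed-form ridge regression. This is an assumption-laden simplification, not the authors' method: the Bayesian procedure additionally adapts the coefficients α and β during training, which this sketch keeps fixed:

```python
import numpy as np

# Illustrative sketch of the regularized objective
#   F = beta * sum(e^2) + alpha * sum(w^2)
# solved in closed form for a linear model (alpha, beta fixed here;
# Bayesian regularization adapts them during training).
def ridge_fit(X, y, alpha=1.0, beta=1.0):
    n_features = X.shape[1]
    # Minimizing beta*||X w - y||^2 + alpha*||w||^2 gives the normal equations:
    A = beta * X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, beta * X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true                      # noiseless toy data
w = ridge_fit(X, y, alpha=1e-6)     # tiny penalty: w ≈ w_true
```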

### Results

### Performance plots

The performance plots help us identify the number of iterations (epochs) at which the mean squared error becomes lowest or stops changing. The number of iterations does not represent time: Scaled Conjugate Gradient gives the best validation in 103 (54) iterations and Levenberg-Marquardt in 10 (13) iterations on the tick dataset (15-min dataset), yet the time taken by Scaled Conjugate Gradient is less than that of Levenberg-Marquardt on both datasets. From Fig. 3, we see that Bayesian Regularization gives the lowest mean squared error, followed by Levenberg-Marquardt and then Scaled Conjugate Gradient, when overall performance over the whole dataset is considered. But when only the performance on the test dataset is compared, Scaled Conjugate Gradient gives the best performance.

For all three algorithms, the same dataset is used. Training is done on 60% of the dataset, 15% is used for validation, and the remaining 25% is used for testing (since 25% of the dataset is used for testing, the value of K in K-fold validation is 4). Since the Bayesian Regularized Artificial Neural Network uses both the training and validation parts for training, it uses a total of 75% of the dataset for training. The testing dataset is chosen at random from the dataset.
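
The 60/15/25 split described above can be sketched as follows (an illustrative sketch, not the authors' code), with the test portion drawn at random via a shuffled index:

```python
import numpy as np

# Illustrative 60/15/25 train/validation/test split over a shuffled index,
# so the test set is effectively chosen at random.
def split_dataset(n_samples, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.60 * n_samples)
    n_val = int(0.15 * n_samples)
    return (idx[:n_train],                      # 60% training
            idx[n_train:n_train + n_val],       # 15% validation
            idx[n_train + n_val:])              # 25% testing

# ~430,000 data points, as in the tick dataset described above.
train_idx, val_idx, test_idx = split_dataset(430_000)
```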

### Regression plots

The network performance is validated through regression plots, which display the network output with respect to the targets for the training, validation, testing, and overall datasets. The Bayesian Regularization uses the whole validation dataset for training as well. From Fig. 4, we can see that the fit is very good for all tick datasets, as the R values in each case are 0.99 or above, but the accuracy drops when predicting over the 15-min dataset. Only Bayesian Regularization gives an R value of almost 0.99 on the 15-min dataset; the accuracy of SCG and LM drops to 0.97 and 0.96 respectively. Here also Bayesian Regularization outperforms LM and SCG on the complete dataset, but when only the regression plots on the test dataset are compared, Scaled Conjugate Gradient gives the best results. These plots show that prediction over the tick dataset gives better results than prediction over the 15-min dataset.

In Table 1, prediction accuracy using general validation and K-fold validation is compared using the mean square error (MSE) metric; there is no significant change in accuracy for any of the algorithms. A similar comparison is made in Table 2 using the 15-min data. In Tables 3 and 4, general validation is used to compare results using the MSE and mean absolute percentage error (MAPE) metrics. From Tables 1, 2, 3 and 4, we can say that prediction using tick data gives better accuracy than using 15-min data.
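
The two metrics used in these tables can be sketched as follows (an illustrative implementation of the standard definitions, not the authors' code):

```python
import numpy as np

# Mean square error (MSE) and mean absolute percentage error (MAPE),
# the two metrics compared in Tables 1-4.
def mse(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean((actual - predicted) ** 2)

def mape(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100.0

# Toy price series for illustration.
actual = [100.0, 102.0, 101.0, 103.0]
predicted = [101.0, 101.0, 102.0, 103.0]
```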

### Error histograms

In Fig. 5, red bars represent the testing data, green bars represent validation data, and blue bars represent training data. The error range (maximum negative error to maximum positive error) is divided into 20 bins, and the plots are drawn. Outliers, i.e., data points where the fit is notably worse than for the majority of the data, can be identified in the histogram. In this case, we can see that all three methods give better results on tick data than on the 15-min data: the error on tick data falls mostly in the smallest bin around zero error, whereas the error on 15-min data is distributed over several bins. From the error histograms, it is visible that Bayesian Regularization outperforms both Scaled Conjugate Gradient and Levenberg-Marquardt in accuracy over both datasets.

### Conclusion and future work

This study compares the performance of three neural network learning algorithms, i.e., Levenberg-Marquardt, Scaled Conjugate Gradient and Bayesian Regularization, by predicting over a tick-by-tick dataset and a 15-min dataset. The study shows that prediction using tick-by-tick data for the stock market gives much better results than prediction using the 15-min dataset. The first algorithm is based on Levenberg-Marquardt optimization, which uses an approximation to the Hessian matrix to approach second-order training speed. It gives excellent results but takes a few hours to train. The second algorithm, Scaled Conjugate Gradient (SCG), based on conjugate directions, uses a step-size scaling mechanism to avoid a time-consuming line search per learning iteration, which makes it much faster than second-order algorithms like Levenberg-Marquardt. Training using SCG takes a few minutes, a significant improvement over Levenberg-Marquardt, but the error in tick data prediction also increases. The third algorithm, Bayesian Regularization, takes a few days to train over a large dataset but gives better results than both Levenberg-Marquardt and SCG. All three algorithms provide an accuracy of 99.9% using tick data. The picture over the 15-min test dataset changes completely: SCG takes the least time and gives the best results compared to Levenberg-Marquardt and Bayesian Regularization, but the results obtained on the 15-min dataset are significantly poorer than those obtained using tick data.

In this case study, data from the past 30 business days was used. A more extensive dataset could be used to bring in seasonal and annual factors that affect stock price movement. Predicting on minute-by-minute data could reduce the dataset size by about 70% and may give comparable results while allowing the use of historical data over a more significant period. Recurrent Neural Networks, e.g., LSTM (Long Short-Term Memory), may provide better predictions than the neural networks used in this study. Since statements and opinions of renowned personalities are known to affect stock prices, sentiment analysis could help in gaining an extra edge in stock price prediction.

### Abbreviations

- ANN: Artificial Neural Network
- ARIMA: Autoregressive Integrated Moving Average
- BPNN: Back Propagation Neural Network
- BR: Bayesian Regularization
- BRANN: Bayesian Regularized Artificial Neural Network
- JAN: January
- LM: Levenberg-Marquardt
- LSTM: Long Short-Term Memory
- MAPE: Mean Absolute Percentage Error
- MLP: Multilayer Perceptron
- MSE: Mean Square Error
- NOV: November
- QSAR: Quantitative Structure-Activity Relationship
- SCG: Scaled Conjugate Gradient
- SVM: Support Vector Machine
- SVR: Support Vector Regression

### References

Ahangar RG, Yahyazadehfar M, Pournaghshband H (2010) The comparison of methods artificial neural network with linear regression using specific variables for prediction stock price in Tehran stock exchange. Int J Comp Sci Informat Sec 7(2):38–46

Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford, UK

Cigizoglu HK (2003) Estimation, forecasting and extrapolation of river flows by artificial neural networks. Hydrol Sci J 48(3):349–361

Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20:273–297

Fausett L (1994) Fundamentals of neural networks. Prentice Hall, New York, NY, USA

Guresen E, Kayakutlu G, Daim TU (2011) Using artificial neural network models in stock market index prediction. Expert Syst Appl 38:10389–10397

Hagan MT, Demuth HB, Beale MH (1996) Neural network design. PWS Publishing, Boston, MA

Hagan MT, Menhaj M (1999) Training feedforward networks with the Marquardt algorithm. IEEE Trans Neural Netw 5(6):989–993

Hagan MT, Menhaj MB (1994) Training feedforward networks with the Marquardt algorithm. IEEE Trans Neural Netw 5(6):989–993

Jonathon TL (2013) A Bayesian regularized artificial neural network for stock market forecasting. Expert Syst Appl 40:5501–5506

Kuan CM, Liu T (1995) Forecasting exchange rates using feedforward and recurrent neural networks. J Appl Econ 10(4):347–364

Mehdi K, Zahra H (2017) Performance evaluation of series and parallel strategies for financial time series forecasting. Financial Innovation 3:24

Moller MF (1993) A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw 6:525–533

Qi M, Zhang GP (2008) Trend time series modeling and forecasting with neural networks. IEEE Trans Neural Netw 19(5):808–816

Samek D, Varachha P (2013) Time series prediction using artificial neural networks. International Journal of Mathematical Models and Methods in Applied Sciences 7(1):30–46

Shadbolt J, Taylor JG (eds) (2002) Neural networks and the financial markets: predicting, combining and portfolio optimization. Springer-Verlag, London

Trippi RR, Turban E (eds) (1992) Neural networks in finance and investing: using artificial intelligence to improve real world performance. McGraw-Hill, New York

Walczak S (2001) An empirical analysis of data requirements for financial forecasting with neural networks. J Manage Inf Syst 17(4):203–222

Kara Y, Boyacioglu MA, Baykan OK (2011) Predicting direction of stock price index movement using artificial neural networks and support vector machines: the sample of the Istanbul Stock Exchange. Expert Syst Appl 38:5311–5319

Yodele et al (2012) Stock price prediction using neural network with hybridized market indicators. Journal of Emerging Trends in Computing and Information Sciences 3(1):1–9

### Acknowledgments

The authors are thankful to the Department of Mathematics, IIT Delhi for providing us the resources to perform this case study.

### Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

### Availability of data and materials

The data used is included with the submission of the manuscript.

### Author information

### Affiliations

Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India

Dharmaraja Selvamuthu, Vineet Kumar & Abhishek Mishra

### Contributions

We have no conflicts of interest to disclose. All the authors contributed equally to this work. All authors read and approved the final manuscript.

### Corresponding author

Correspondence to Dharmaraja Selvamuthu.

### Ethics declarations

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

### Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


### About this article

### Cite this article

Selvamuthu, D., Kumar, V. & Mishra, A. Indian stock market prediction using artificial neural networks on tick data. *Financ Innov* **5**, 16 (2019). https://doi.org/10.1186/s40854-019-0131-7


### Keywords

- Neural Networks
- Indian Stock Market Prediction
- Levenberg-Marquardt
- Scale Conjugate Gradient
- Bayesian Regularization
- Tick by tick data

## Short-term stock market price trend prediction using a comprehensive deep learning system

*Journal of Big Data* **volume 7**, Article number: 66 (2020)

### Abstract

In the era of big data, deep learning for predicting stock market prices and trends has become even more popular than before. We collected 2 years of data from the Chinese stock market and propose a comprehensive customization of feature engineering and a deep learning-based model for predicting the price trend of stock markets. The proposed solution is comprehensive as it includes pre-processing of the stock market dataset, utilization of multiple feature engineering techniques, and a customized deep learning based system for stock market price trend prediction. We conducted comprehensive evaluations on frequently used machine learning models and conclude that our proposed solution outperforms them due to the comprehensive feature engineering that we built. The system achieves overall high accuracy for stock market trend prediction. With the detailed design and evaluation of prediction term lengths, feature engineering, and data pre-processing methods, this work contributes to the stock analysis research community in both the financial and technical domains.

### Introduction

The stock market is one of the major fields to which investors are dedicated; thus stock market price trend prediction is always a hot topic for researchers from both the financial and technical domains. In this research, our objective is to build a state-of-the-art prediction model for price trend prediction, focusing on short-term price trends.

As concluded by Fama in [26], financial time series prediction is known to be a notoriously difficult task due to the generally accepted, semi-strong form of market efficiency and the high level of noise. Back in 2003, Wang et al. in [44] had already applied artificial neural networks to stock market price prediction and focused on volume as a specific feature of the stock market. One of their key findings was that volume was not effective in improving the forecasting performance on the datasets they used, namely S&P 500 and DJI. Ince and Trafalis in [15] targeted short-term forecasting and applied a support vector machine (SVM) model to stock price prediction. Their main contribution is a comparison between the multi-layer perceptron (MLP) and SVM; they found that in most scenarios SVM outperformed MLP, though the result was also affected by different trading strategies. In the meantime, researchers from financial domains were applying conventional statistical methods and signal processing techniques to analyzing stock market data.

Optimization techniques such as principal component analysis (PCA) have also been applied in short-term stock price prediction [22]. Over the years, researchers have not only focused on stock price-related analysis but have also tried to analyze stock market transactions such as volume burst risks, which broadens the stock market analysis research domain and indicates that this research domain still has high potential [39]. As artificial intelligence techniques have evolved in recent years, many proposed solutions have attempted to combine machine learning and deep learning techniques based on previous approaches, and then proposed new metrics that serve as training features, such as Liu and Wang [23]. This type of previous work belongs to the feature engineering domain and can be considered the inspiration for the feature extension ideas in our research. Liu et al. in [24] proposed a convolutional neural network (CNN) as well as a long short-term memory (LSTM) neural network based model to analyze different quantitative strategies in stock markets. The CNN serves the stock selection strategy and automatically extracts features based on quantitative data; it is followed by an LSTM that preserves the time-series features to improve profits.

The latest work also proposes a similar hybrid neural network architecture, integrating a convolutional neural network with a bidirectional long short-term memory to predict the stock market index [4]. While researchers have frequently proposed different neural network solution architectures, this raised further discussion about whether the high cost of training such models is worth the results.

There are three key contributions of our work: (1) a new dataset extracted and cleansed, (2) a comprehensive feature engineering procedure, and (3) a customized long short-term memory (LSTM) based deep learning model.

We built the dataset ourselves from the data source, an open-sourced data API called Tushare [43]. The novelty of our proposed solution is that we propose feature engineering along with a fine-tuned system instead of just an LSTM model. We observed gaps in previous works and propose a solution architecture with a comprehensive feature engineering procedure before training the prediction model. The success of the feature extension method in combination with recursive feature elimination algorithms opens doors for many other machine learning algorithms to achieve high accuracy scores for short-term price trend prediction, proving the effectiveness of our proposed feature extension as feature engineering. We further introduce our customized LSTM model, which improves the prediction scores on all the evaluation metrics. The proposed solution outperformed the machine learning and deep learning-based models of similar previous works.

The remainder of this paper is organized as follows. “Survey of related works” section describes the survey of related works. “The dataset” section provides details on the data that we extracted from the public data sources and the dataset prepared. “Methods” section presents the research problems, methods, and design of the proposed solution. Detailed technical design with algorithms and how the model is implemented are also included in this section. “Results” section presents comprehensive results and an evaluation of our proposed model, comparing it with the models used in most of the related works. “Discussion” section provides a discussion and comparison of the results. “Conclusion” section presents the conclusion. This research paper has been built based on Shen [36].

### Survey of related works

In this section, we discuss related works. We reviewed the related work in two domains: the technical domain and the financial domain.

Kim and Han in [19] built a model as a combination of artificial neural networks (ANN) and genetic algorithms (GAs) with discretization of features for predicting the stock price index. The data used in their study include the technical indicators as well as the direction of change in the daily Korea stock price index (KOSPI). They used data covering 2928 trading days, ranging from January 1989 to December 1998, and give their selected features and formulas. They also applied optimization of feature discretization, a technique similar to dimensionality reduction. The strength of their work is that they introduced GA to optimize the ANN, but the work also has limitations. First, the number of input features and processing elements in the hidden layer is fixed at 12 and not adjustable. Another limitation is in the learning process of the ANN: the authors only focused on two factors in optimization. Nevertheless, they believed that GA has great potential for feature discretization optimization. Our initialized feature pool refers to their selected features. Qiu and Song in [34] also presented a solution to predict the direction of the Japanese stock market based on an optimized artificial neural network model. In this work, the authors combined genetic algorithms with an artificial neural network based model, naming it a hybrid GA-ANN model.

Piramuthu in [33] conducted a thorough evaluation of different feature selection methods for data mining applications. He used four datasets (credit approval data, loan defaults data, web traffic data, and Tam and Kiang's data) and compared how different feature selection methods optimized decision tree performance. The feature selection methods he compared included probabilistic distance measures (the Bhattacharyya measure, the Matusita measure, the divergence measure, the Mahalanobis distance measure, and the Patrick-Fisher measure) and inter-class distance measures (the Minkowski distance measure, the city block distance measure, the Euclidean distance measure, the Chebychev distance measure, and the nonlinear (Parzen and hyper-spherical kernel) distance measure). The strength of this paper is that the author evaluated both probabilistic distance-based and several inter-class feature selection methods. Besides, the author performed the evaluation on different datasets, which reinforces the strength of the paper. However, the evaluation algorithm was a decision tree only; we cannot conclude whether the feature selection methods would perform the same on a larger dataset or a more complex model.

Hassan and Nath in [9] applied the Hidden Markov Model (HMM) to forecasting the stock prices of four different airlines. They reduced the model to four states: the opening price, closing price, the highest price, and the lowest price. The strong point of this paper is that the approach does not need expert knowledge to build a prediction model. However, this work is limited to the airline industry and evaluated on a very small dataset, so it may not yield a prediction model with generality. The approach could be exploited as a comparison baseline in stock market prediction works. The authors selected a maximum of 2 years as the date range of the training and testing dataset, which provided us a date range reference for our evaluation part.

Lei in [21] exploited a Wavelet Neural Network (WNN) to predict stock price trends. The author also applied Rough Set (RS) for attribute reduction as an optimization: Rough Set was utilized to reduce the stock price trend feature dimensions and to determine the structure of the Wavelet Neural Network. The dataset of this work consists of five well-known stock market indices: (1) the SSE Composite Index (China), (2) the CSI 300 Index (China), (3) the All Ordinaries Index (Australia), (4) the Nikkei 225 Index (Japan), and (5) the Dow Jones Index (USA). Evaluation of the model was based on different stock market indices, and the results were convincing with generality. Using Rough Set to optimize the feature dimension before processing reduces the computational complexity. However, the author only stressed parameter adjustment in the discussion part and did not specify the weaknesses of the model itself. Meanwhile, we also found that the evaluations were performed on indices; the same model may not perform equally well if applied to a specific stock.

Lee in [20] used a support vector machine (SVM) along with a hybrid feature selection method to predict stock trends. The dataset in this research is a sub-dataset of the NASDAQ Index in the Taiwan Economic Journal Database (TEJD) in 2008. The feature selection part used a hybrid method in which supported sequential forward search (SSFS) played the role of the wrapper. Another advantage of this work is that the authors designed a detailed procedure of parameter adjustment with performance under different parameter values. The clear structure of the feature selection model is also heuristic for the primary stage of model structuring. One limitation was that the performance of SVM was compared to a back-propagation neural network (BPNN) only and not to other machine learning algorithms.

Sirignano and Cont leveraged a deep learning solution trained on a universal feature set of financial markets in [40]. The dataset used included buy and sell records of all transactions, and cancellations of orders, for approximately 1000 NASDAQ stocks through the order book of the stock exchange. The network consists of three layers with LSTM units and a final feed-forward layer with rectified linear units (ReLUs), optimized with the stochastic gradient descent (SGD) algorithm. Their universal model was able to generalize to stocks other than the ones in the training data. Though they mentioned the advantages of a universal model, the training cost was still expensive. Meanwhile, due to the inexplicit programming of the deep learning algorithm, it is unclear whether useless features contaminated the data fed into the model. The authors noted that performing feature selection before training the model would have been an effective way to reduce the computational complexity.

Ni et al. in [30] predicted stock price trends by exploiting SVM and performed fractal feature selection for optimization. The dataset they used is the Shanghai Stock Exchange Composite Index (SSECI), with 19 technical indicators as features. Before processing the data, they optimized the input data by performing feature selection. When finding the best parameter combination, they used a grid search method combined with k-fold cross-validation. Besides, their evaluation of different feature selection methods is comprehensive. As the authors mentioned in their conclusion, they only considered the technical indicators and not macro and micro factors in the financial domain. The source of the datasets the authors used is similar to ours, which makes their evaluation results useful to our research.

McNally et al. in [27] leveraged RNN and LSTM to predict the price of Bitcoin, optimized by using the Boruta algorithm for the feature engineering part, which works similarly to a random forest classifier. Besides feature selection, they also used Bayesian optimization to select LSTM parameters. The Bitcoin dataset ranged from the 19th of August 2013 to the 19th of July 2016. They used multiple optimization methods to improve the performance of deep learning methods. The primary problem of their work is overfitting. The research problem of predicting the Bitcoin price trend has some similarities with stock market price prediction; hidden features and noise embedded in the price data are threats to this work. The authors treated the research question as a time sequence problem. The best part of this paper is the feature engineering and optimization part; we could replicate the methods they exploited in our data pre-processing.

Weng et al. in [45] focused on short-term stock price prediction using ensembles of four well-known machine learning models. The research uses five datasets, obtained from three open-sourced APIs and an R package named TTR. The machine learning models they used are (1) a neural network regression ensemble (NNRE), (2) a Random Forest with unpruned regression trees as base learners (RFR), (3) AdaBoost with unpruned regression trees as base learners (BRT), and (4) a support vector regression ensemble (SVRE). This is a thorough study of ensemble methods for short-term stock price prediction: with background knowledge, the authors selected eight technical indicators and then performed a thoughtful evaluation of the five datasets. The primary contribution of this paper is that they developed a platform for investors using R, which does not need users to input their own data but calls APIs to fetch the data from online sources directly. From the research perspective, they only evaluated the prediction of the price 1 to 10 days ahead and did not evaluate terms longer than two trading weeks or shorter than 1 day. The primary limitation of their research is that they only analyzed 20 U.S.-based stocks; the model might not generalize to other stock markets or may need further revalidation to see if it suffers from overfitting problems.

Kara et al. in [17] also exploited ANN and SVM in predicting the movement of the stock price index. The dataset they used covers the period from January 2, 1997, to December 31, 2007, of the Istanbul Stock Exchange. The primary strength of this work is its detailed record of parameter adjustment procedures. The weaknesses are that neither the technical indicators nor the model structure has novelty, and the authors did not explain how their model performed better than other models in previous works; more validation on other datasets would help. They explained how ANN and SVM work with stock market features and also recorded the parameter adjustment. The implementation part of our research could benefit from this previous work.

Jeon et al. in [16] performed research on a millisecond interval-based big dataset by using pattern graph tracking to complete stock price prediction tasks. The dataset they used is a millisecond interval-based big dataset of historical stock data from KOSCOM, from August 2014 to October 2014, of 10G–15G capacity. The authors applied Euclidean distance and Dynamic Time Warping (DTW) for pattern recognition. For feature selection, they used stepwise regression. The authors completed the prediction task with an ANN, using Hadoop and RHive for big data processing. The “Results” section is based on the results processed by a combination of SAX and Jaro–Winkler distance. Before processing the data, they generated aggregated data at 5-min intervals from the discrete data. The primary strength of this work is the explicit structure of the whole implementation procedure. However, they exploited a relatively old model, and the overall time span of the training dataset is extremely short. Since it is difficult to access millisecond interval-based data in real life, the model is not as practical as a daily based data model.

Huang et al. in [12] applied a fuzzy-GA model to complete the stock selection task. They used the stocks of the 200 largest companies by market capitalization listed on the Taiwan Stock Exchange as the investment universe. Besides, the yearly financial statement data and the stock returns were taken from the Taiwan Economic Journal (TEJ) database at www.tej.com.tw/ for the period from 1995 to 2009. They constructed the fuzzy membership function with model parameters optimized by GA and extracted features for optimizing stock scoring. The authors proposed an optimized model for the selection and scoring of stocks. Unlike a prediction model, the authors focused more on stock ranking, selection, and performance evaluation. Their structure is more practical among investors. But in the model validation part, they did not compare the model with existing algorithms, only with the statistics of the benchmark, which makes it challenging to identify whether GA would outperform other algorithms.

Fischer and Krauss in [5] applied long short-term memory (LSTM) to financial market prediction. The dataset they used is the S&P 500 index constituents from Thomson Reuters. They obtained all month-end constituent lists for the S&P 500 from December 1989 to September 2015, then consolidated the lists into a binary matrix to eliminate survivor bias. The authors also used RMSprop as an optimizer, which is a mini-batch version of rprop. The primary strength of this work is that the authors used the latest deep learning technique to perform predictions. However, they relied solely on the LSTM technique and lacked background knowledge in the financial domain. Moreover, although the LSTM outperformed the standard DNN and logistic regression algorithms, the authors did not mention the effort required to train an LSTM with long-time dependencies.

Tsai and Hsiao in [42] proposed a solution combining different feature selection methods for the prediction of stocks. They used the Taiwan Economic Journal (TEJ) database as their data source; the data used in their analysis was from 2000 to 2007. In their work, they used a sliding window method combined with a multilayer perceptron (MLP) based artificial neural network with back propagation as their prediction model. They also applied principal component analysis (PCA) for dimensionality reduction, and genetic algorithms (GA) and classification and regression trees (CART) to select important features. They did not rely on technical indices only; instead, they also included both fundamental and macroeconomic indices in their analysis. The authors also reported a comparison of feature selection methods. The validation part was done by combining the model performance statistics with statistical analysis.

Pimenta et al. in [32] leveraged an automated investing method by using multi-objective genetic programming and applied it in the stock market. The dataset was obtained from the Brazilian stock exchange market (BOVESPA), and the primary techniques they exploited were a combination of multi-objective optimization, genetic programming, and technical trading rules. For optimization, they leveraged genetic programming (GP) to optimize decision rules. The novelty of this paper is in the evaluation part: when performing validation, they included a historical period that was a critical moment of Brazilian politics and economics, which reinforced the generalization strength of their proposed model. When selecting the sub-dataset for evaluation, they also set criteria to ensure more asset liquidity. However, the baseline of the comparison was too basic and fundamental, and the authors did not perform any comparison with other existing models.

Huang and Tsai in [13] combined filter-based feature selection with a hybrid self-organizing feature map (SOFM) support vector regression (SVR) model to forecast the Taiwan index futures (FITX) trend. They divided the training samples into clusters to marginally improve the training efficiency. The authors proposed a comprehensive model that combined two novel machine learning techniques in stock market analysis. Besides, feature selection was applied before the data processing to improve the prediction accuracy and reduce the computational complexity of processing daily stock index data. Though they optimized the feature selection part and split the sample data into small clusters, it was already strenuous to train this model on daily stock index data. It would be difficult for this model to predict trading activities in shorter time intervals, since the data volume would increase drastically. Moreover, the evaluation is not strong enough, since they set a single SVR model as the baseline and did not compare the performance with other previous works, which makes it difficult for future researchers to identify why the SOFM-SVR model outperforms other algorithms.

Thakur and Kumar in [41] also developed a hybrid financial trading support system by exploiting multi-category classifiers and random forest (RF). They conducted their research on stock indices from NASDAQ, DOW JONES, S&P 500, NIFTY 50, and NIFTY BANK. The authors proposed a hybrid model combining random forest (RF) algorithms with a weighted multicategory generalized eigenvalue support vector machine (WMGEPSVM) to generate “Buy/Hold/Sell” signals. Before processing the data, they used Random Forest (RF) for feature pruning. The authors proposed a practical model designed for real-life investment activities, which could generate three basic signals for investors to refer to. They also performed a thorough comparison of related algorithms. However, they did not mention the time and computational complexity of their work. Meanwhile, a notable issue of their work was the lack of a financial domain knowledge background: investors regard index data as one attribute among many, but cannot take a signal from indices and act on a specific stock directly.

Hsu in [11] assembled feature selection with a back propagation neural network (BNN) combined with genetic programming to predict the stock/futures price. The dataset in this research was obtained from the Taiwan Stock Exchange Corporation (TWSE). The authors introduced the background knowledge in detail. However, a weakness of their work is the lack of a dataset description. The model is a combination of models proposed by other previous works. Though we did not see novelty in this work, we can still conclude that the genetic programming (GP) algorithm is accepted in the stock market research domain. To reinforce the validation strengths, it would be good to consider adding GP models into the evaluation if the model is predicting a specific price.

Hafezi et al. in [7] built a bat-neural network multi-agent system (BN-NMAS) to predict stock prices. The dataset was obtained from the Deutsche Bundesbank. They also applied the Bat Algorithm (BA) for optimizing neural network weights. The authors illustrated their overall structure and the logic of the system design in clear flowcharts. However, since very few previous works have been performed on DAX data, it is difficult to determine whether the proposed model retains its generality when migrated to other datasets. The system design and feature selection logic are fascinating and worth referring to. Their findings on optimization algorithms are also valuable for the stock market price prediction research domain; it is worth trying the Bat Algorithm (BA) when constructing neural network models.

Long et al. in [25] conducted a deep learning approach to predict the stock price movement. The dataset they used is the Chinese stock market index CSI 300. For predicting the stock price movement, they constructed a multi-filter neural network (MFNN) with stochastic gradient descent (SGD) and a back propagation optimizer for learning the network parameters. The strength of this paper is that the authors exploited a novel hybrid model constructed from different kinds of neural networks, which provides inspiration for constructing hybrid neural network structures.

Atsalakis and Valavanis in [1] proposed a neuro-fuzzy system, composed of a controller named the Adaptive Neuro-Fuzzy Inference System (ANFIS), to achieve short-term stock price trend prediction. The noticeable strength of this work is the evaluation part: not only did they compare their proposed system with popular data models, but they also compared it with investment strategies. The weakness we found in their proposed solution is that the architecture lacks an optimization part, which might limit the model performance. Since our proposed solution also focuses on short-term stock price trend prediction, this work is heuristic for our system design. Meanwhile, by comparing with the popular trading strategies of investors, their work inspired us to compare the strategies used by investors with the techniques used by researchers.

Nekoeiqachkanloo et al. in [29] proposed a system with two different approaches for stock investment. The strengths of their proposed solution are obvious. First, it is a comprehensive system that consists of data pre-processing and two different algorithms to suggest the best investment portions. Second, the system is also embedded with a forecasting component, which retains the features of the time series. Last but not least, their input features are a mix of fundamental features and technical indices that aim to fill the gap between the financial domain and the technical domain. However, their work has a weakness in the evaluation part: instead of evaluating the proposed system on a large dataset, they chose 25 well-known stocks, and there is a high possibility that well-known stocks share some common hidden features.

As another related recent work, Idrees et al. [14] published a time series-based prediction approach for the volatility of the stock market. ARIMA is not a new approach in the time series prediction research domain; their work focuses more on the feature engineering side. Before feeding the features into ARIMA models, they designed three steps for feature engineering: analyze the time series, identify whether the time series is stationary, and estimate the parameters by plotting ACF and PACF charts. The only weakness of their proposed solution is that the authors did not perform any customization on the existing ARIMA model, which might limit how much the system performance can be improved.

One of the main weaknesses found in the related works is the limited data pre-processing mechanisms built and used. Technical works mostly tend to focus on building prediction models: when selecting features, they list all the features mentioned in previous works, go through a feature selection algorithm, and then select the best-voted features. Related works in the investment domain have shown more interest in behavior analysis, such as how herding behaviors affect stock performance, or how the percentage of a firm's common stock held by inside directors affects the performance of a certain stock. These behaviors often need a pre-processing procedure of standard technical indices and investment experience to recognize.

In the related works, a thorough statistical analysis is often performed on a special dataset to derive new features rather than to perform feature selection. Some data, such as the fluctuation percentage of a certain index, have been proven to be effective for stock performance. We believe that extracting new features from the data and then combining them with existing common technical indices will significantly benefit existing and well-tested prediction models.

### The dataset

This section details the data that was extracted from the public data sources, and the final dataset that was prepared. Stock market-related data are diverse, so we first compared the related works from the survey of financial research works in stock market data analysis to specify the data collection directions. After collecting the data, we defined a data structure for the dataset. Below, we describe the dataset in detail, including the data structure and the data tables in each category of data, with the segment definitions.

### Description of our dataset

In this section, we describe the dataset in detail. The dataset consists of 3558 stocks from the Chinese stock market. Besides the daily price data and daily fundamental data of each stock ID, we also collected the suspending and resuming history, the top 10 shareholders, etc. We chose 2 years as the time span of this dataset for two reasons: (1) most investors perform stock market price trend analysis using the data within the latest 2 years, and (2) using more recent data benefits the analysis result. We collected the data through the open-sourced API, namely Tushare [43]; meanwhile, we also leveraged a web-scraping technique to collect data from Sina Finance web pages and the SWS Research website.

### Data structure

Figure 1 illustrates all the data tables in the dataset. We collected four categories of data in this dataset: (1) basic data, (2) trading data, (3) finance data, and (4) other reference data. All the data tables can be linked to each other by a common field called “Stock ID”, a unique stock identifier registered in the Chinese stock market. Table 1 shows an overview of the dataset.

Table 1 lists the field information of each data table as well as the category each data table belongs to.

### Methods

In this section, we present the proposed methods and the design of the proposed solution. Moreover, we also introduce the architecture design as well as algorithmic and implementation details.

### Problem statement

We analyzed the best possible approach for predicting short-term price trends from different aspects: feature engineering, financial domain knowledge, and prediction algorithm. Then we addressed three research questions in each aspect, respectively: How can feature engineering benefit model prediction accuracy? How do findings from the financial domain benefit prediction model design? And what is the best algorithm for predicting short-term price trends?

The first research question is about feature engineering. We would like to know how the feature selection method benefits the performance of prediction models. From the abundance of previous works, we can conclude that stock price data is embedded with a high level of noise and that there are correlations between features, which makes price prediction notoriously difficult. That is also the primary reason why most previous works introduced the feature engineering part as an optimization module.

The second research question is evaluating the effectiveness of the findings we extracted from the financial domain. Different from previous works, besides the common evaluation of data models such as training costs and scores, our evaluation emphasizes the effectiveness of the newly added features extracted from the financial domain. We introduce some features from the financial domain; however, we only obtained specific findings from previous works, and the related raw data needs to be processed into usable features. After extracting related features from the financial domain, we combine them with other common technical indices to vote out the features with a higher impact. Numerous features are said to be effective in the financial domain, and it would be impossible for us to cover all of them. Thus, how to appropriately convert the findings from the financial domain into a data processing module of our system design is a hidden research question that we attempt to answer.

The third research question is which algorithm we should use to model our data. In previous works, researchers have put effort into exact price prediction. We decompose the problem into predicting the trend first and then the exact number; this paper focuses on the first step. Hence, the objective has been converted into a binary classification problem, while also finding an effective way to eliminate the negative effect brought by the high level of noise. Our approach is to decompose the complex problem into sub-problems with fewer dependencies, resolve them one by one, and then compile the resolutions into an ensemble model as an aiding system for investing behavior reference.

In previous works, researchers have used a variety of models for predicting stock price trends, and most of the best-performing models are based on machine learning techniques. In this work, we compare our approach with the best-performing machine learning models in the evaluation part and find the solution for this research question.

### Proposed solution

The high-level architecture of our proposed solution can be separated into three parts. First is the feature selection part, which guarantees that the selected features are highly effective. Second, we look into the data and perform dimensionality reduction. The last part, which is the main contribution of our work, is building a prediction model for target stocks. Figure 2 depicts the high-level architecture of the proposed solution.

There are ways to classify different categories of stocks. Some investors prefer long-term investments, while others show more interest in short-term investments. It is common to see stock-related reports showing average performance while the stock price is increasing drastically; this is one of the phenomena indicating that stock price prediction has no fixed rules, so finding effective features before training a model on the data is necessary.

In this research, we focus on short-term price trend prediction. Initially, we only have the raw data with no labels, so the very first step is to label the data. We mark the price trend by comparing the current closing price with the closing price of *n* trading days ago, where *n* ranges from 1 to 10 since our research focuses on the short term. If the price trend goes up, we label it as 1; otherwise, we label it as 0. To be more specific, we use the indices of the (*n* − 1)th day to predict the price trend of the *n*th day.
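The labeling rule above can be sketched as follows (a minimal illustration with a hypothetical closing-price series; the `label_trend` helper is ours, not from the paper):

```python
import pandas as pd

def label_trend(close: pd.Series, n: int) -> pd.Series:
    # 1 if today's close is higher than the close n trading days ago, else 0;
    # the first n days have no reference price and fall out as 0 here
    return (close > close.shift(n)).astype(int)

# toy closing-price series for five trading days
close = pd.Series([10.0, 10.5, 10.2, 10.8, 10.6])
print(label_trend(close, n=1).tolist())  # [0, 1, 0, 1, 0]
```

In practice, the first *n* rows would be dropped rather than kept as 0, since they have no valid comparison point.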

According to the previous works, some researchers who applied both financial domain knowledge and technical methods on stock data were using rules to filter the high-quality stocks. We referred to their works and exploited their rules to contribute to our feature extension design.

However, to ensure the best performance of the prediction model, we look into the data first. There are a large number of features in the raw data; if we took all of them into consideration, it would not only drastically increase the computational complexity but would also cause side effects if we wanted to perform unsupervised learning in further research. So, we leverage recursive feature elimination (RFE) to ensure that all the selected features are effective.
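The RFE step can be sketched with scikit-learn on synthetic data (a hedged illustration only; the paper does not specify the underlying estimator, so the logistic regression here is our assumption):

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # 10 candidate features, mostly noise
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # trend label driven by features 0 and 3

# recursively drop the weakest feature until only the desired count remains
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
selector.fit(X, y)
print(selector.support_)  # boolean mask over the 10 features
```

With this construction, RFE keeps the two informative features (0 and 3) and discards the noise columns.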

We found that most previous works in the technical domain analyzed all stocks, while in the financial domain, researchers prefer to analyze specific investment scenarios. To fill the gap between the two domains, we decided to apply a feature extension based on the findings gathered from the financial domain before starting the RFE procedure.

Since we plan to model the data as time series, the more features there are, the more complex the training procedure will be. So, we will leverage dimensionality reduction using randomized PCA at the beginning of our proposed solution architecture.

### Detailed technical design elaboration

This section elaborates the detailed technical design as a comprehensive solution based on utilizing, combining, and customizing several existing data preprocessing, feature engineering, and deep learning techniques. Figure 3 provides the detailed technical design from data processing to prediction, including data exploration. We split the content by main procedures, and each procedure contains algorithmic steps. Algorithmic details are elaborated in the next section; the content of this section focuses on illustrating the data workflow.

Based on the literature review, we select the most commonly used technical indices and then feed them into the feature extension procedure to get the expanded feature set. We select the most effective *i* features from the expanded feature set. Then we feed the data with the *i* selected features into the PCA algorithm to reduce the dimension to *j* features. After we get the best combination of *i* and *j*, we process the data into the finalized feature set and feed it into the LSTM [10] model to get the price trend prediction result.

The novelty of our proposed solution is that we will not only apply the technical method on raw data but also carry out the feature extensions that are used among stock market investors. Details on feature extension are given in the next subsection. Experiences gained from applying and optimizing deep learning based solutions in [37, 38] were taken into account while designing and customizing feature engineering and deep learning solution in this work.

#### Applying feature extension

The first main procedure in Fig. 3 is the feature extension. In this block, the input data is the set of most commonly used technical indices concluded from related works. The three feature extension methods are max–min scaling, polarizing, and calculating fluctuation percentage. Not all the technical indices are applicable to all three feature extension methods; this procedure only applies the meaningful extension methods to each technical index, chosen by looking at how the index is calculated. The technical indices and the corresponding feature extension methods are illustrated in Table 2.

After the feature extension procedure, the expanded features are combined with the most commonly used technical indices, i.e., the input data with the extension output, and fed into the RFE block as input for the next step.

#### Applying recursive feature elimination

After the feature extension above, we explore the most effective *i* features by using the Recursive Feature Elimination (RFE) algorithm [6]. We estimate all the features by two attributes, coefficient and feature importance. We remove one feature from the pool at each step, retaining all the relevant features. The output of the RFE block then becomes the input of the next step, which is PCA.
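A minimal sketch of this step using scikit-learn's `RFE` follows. The synthetic data and the choice of *i* = 3 are ours; the linear-kernel SVR estimator matches the one named in the "Results" section:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

# Synthetic stand-in for the expanded feature matrix (illustrative only):
# 200 samples, 10 features, of which only the first 3 drive the target.
rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2] + 0.01 * rng.randn(200)

# Eliminate one feature per step (step=1), keeping the i most effective
# features (here i = 3), with a linear-kernel SVR as the ranking estimator.
selector = RFE(SVR(kernel="linear"), n_features_to_select=3, step=1)
selector.fit(X, y)
print(np.where(selector.support_)[0])  # indices of the retained features
```

`selector.support_` marks the surviving features, which can then be passed on to the PCA block.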

#### Applying principal component analysis (PCA)

The very first step before leveraging PCA is feature pre-processing. Some of the features after RFE are percentages, while others are very large numbers; i.e., the outputs of RFE are in different units, which would affect the principal component extraction result. Thus, before feeding the data into the PCA algorithm [8], feature pre-processing is necessary. We also illustrate the effectiveness of, and compare, the pre-processing methods in the "Results" section.

After performing feature pre-processing, the next step is to feed the processed data with the selected *i* features into the PCA algorithm to reduce the feature matrix scale to *j* features. This step retains as many effective features as possible while reducing the computational complexity of training the model. This research work also evaluates the best combination of *i* and *j*, i.e., the one with relatively better prediction accuracy that also cuts computational consumption. The result can be found in the "Results" section as well. After the PCA step, the system gets a reshaped matrix with *j* columns.
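This reduction step can be sketched with scikit-learn's randomized PCA solver (the matrix here is synthetic, and *i* = 30, *j* = 20 are taken from the implementation numbers reported later in the "Results" section):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
X = rng.randn(500, 30)   # stand-in for 500 samples of the i = 30 selected features
X[:, :5] *= 10           # a few directions carry most of the variance

# Randomized solver, as in the paper's optimization step; reduce to j = 20.
pca = PCA(n_components=20, svd_solver="randomized", random_state=42)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)  # (500, 20)
print(pca.explained_variance_ratio_.sum())
```

The summed `explained_variance_ratio_` corresponds to the accumulated contribution rate discussed in the algorithm elaboration, where 85% is used as the suitability threshold.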

#### Fitting long short-term memory (LSTM) model

PCA reduces the dimensions of the input data, but data pre-processing is still mandatory before feeding the data into the LSTM layer. The reason for adding a pre-processing step before the LSTM model is that the input matrix formed by principal components has no time steps, while the number of time steps is one of the most important parameters when training an LSTM. Hence, we have to reshape the matrix into corresponding time steps for both the training and testing datasets.
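A minimal sketch of this reshaping, assuming a plain NumPy matrix of principal components (the function name is ours):

```python
import numpy as np

def to_time_steps(components, n_time_steps):
    """Reshape a (samples, j) principal-component matrix into overlapping
    windows of shape (samples - n_time_steps + 1, n_time_steps, j), the
    3-D input an LSTM layer expects."""
    windows = [components[i:i + n_time_steps]
               for i in range(len(components) - n_time_steps + 1)]
    return np.stack(windows)

X = np.arange(12).reshape(6, 2)      # 6 trading days, j = 2 components
X3d = to_time_steps(X, n_time_steps=3)
print(X3d.shape)  # (4, 3, 2)
```

Each window pairs with the trend label of the day that follows it, giving the supervised samples for the LSTM.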

After performing the data pre-processing part, the last step is to feed the training data into the LSTM and evaluate the performance using the testing data. As a variant of the RNN, an LSTM network is still a deep neural network even with a single LSTM layer, since it processes sequential data and memorizes its hidden states through time. An LSTM layer is composed of one or more LSTM units, and an LSTM unit consists of cells and gates to perform classification and prediction based on time series data.

The LSTM structure is formed by two layers. The input dimension is determined by *j* after the PCA algorithm. The first layer is the input LSTM layer, and the second layer is the output layer. The final output will be 0 or 1, indicating whether the predicted stock price trend is going down or going up, as a supporting suggestion for investors making their next investment decision.
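The two-layer structure described above can be sketched in Keras as follows. The LSTM width of 32 units, the Adam optimizer, and the concrete values of *N_TIME_STEPS* and *j* are our assumptions for illustration; the paper's exact hyperparameters are given in its *ModelCompile ()* description:

```python
from tensorflow import keras

N_TIME_STEPS, J = 10, 20  # assumed term length and PCA output dimension

# Input LSTM layer over the j principal components per time step, plus a
# sigmoid output unit producing the 0/1 up-or-down trend label.
model = keras.Sequential([
    keras.layers.Input(shape=(N_TIME_STEPS, J)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["binary_accuracy"])
model.summary()
```

A prediction above 0.5 is read as an upward trend, matching the binary ground truth defined earlier.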

#### Design discussion

Feature extension is one of the novelties of our proposed price trend prediction system. In the feature extension procedure, we combine technical indices with heuristic processing methods learned from investors, which fills the gap between the financial and technical research areas.

Since we propose a price trend prediction system, feature engineering is extremely important to the final prediction result. Not only is the feature extension method helpful for ensuring we do not miss potentially correlated features, but the feature selection method is also necessary for pooling the effective features. The more irrelevant features are fed into the model, the more noise is introduced. Each main procedure is carefully considered in its contribution to the overall system design.

Besides the feature engineering part, we also leverage LSTM, the state-of-the-art deep learning method for time-series prediction, which ensures the prediction model can capture both complex hidden patterns and time-series-related patterns.

It is known that the training cost of deep learning models is expensive in both time and hardware; another advantage of our system design is the optimization procedure, PCA. It retains the principal components of the features while reducing the scale of the feature matrix, thus helping the system save the training cost of processing the large time-series feature matrix.

#### Algorithm elaboration

This section provides comprehensive details on the algorithms we built while utilizing and customizing different existing techniques, including the terminology, parameters, and optimizers. In the legend on the right side of Fig. 3, we note the algorithm steps as octagons; all of them can be found in this "Algorithm elaboration" section.

Before diving deep into the algorithm steps, here is a brief introduction to the data pre-processing: since we use supervised learning algorithms, we also need to program the ground truth. The ground truth of this research is programmed by comparing the closing price of the current trading date with the closing price of the earlier trading date the users want to compare with; a price increase is labeled as 1, otherwise the ground truth is labeled as 0. Because this research work is not only focused on predicting the price trend of a specific period of time but on the short term in general, the ground truth processing is done over a range of trading days. Since the algorithms do not change with the prediction term length, we can regard the term length as a parameter.

The algorithmic details are elaborated below. The first algorithm is the hybrid feature engineering part for preparing high-quality training and testing data; it corresponds to the Feature extension, RFE, and PCA blocks in Fig. 3. The second algorithm is the LSTM procedure block, including time-series data pre-processing, NN construction, training, and testing.

#### Algorithm 1: Short-term stock market price trend prediction—applying feature engineering using FE + RFE + PCA

The function FE corresponds to the feature extension block. For the feature extension procedure, we apply three different processing methods to translate the findings from the financial domain into a technical module in our system design. Since not all the indices are applicable for expansion, we only choose the proper method(s) for certain features to perform the feature extension (FE), according to Table 2.

The normalize method preserves the relative frequencies of the terms and transforms the technical indices into the range [0, 1]. Polarizing is a well-known method often used by real-world investors: sometimes they only consider whether a technical index value is above or below zero, so we program some of the features using the polarize method in preparation for RFE. Max–min (or min–max) scaling [35] is a transformation method often used as an alternative to zero-mean, unit-variance scaling. Another well-known method we use is fluctuation percentage, which transforms the fluctuation percentage of the technical indices into the range [− 1, 1].
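The four extension methods can be sketched as follows. These are our readings of the brief descriptions above, not the paper's exact formulas, and the MACD values are hypothetical:

```python
import numpy as np

def normalize(x):               # relative frequencies of the terms
    return x / x.sum()

def polarize(x):                # above/below zero, as investors read an index
    return (x > 0).astype(int)

def max_min_scale(x):           # max-min scaling into [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def fluctuation_percentage(x):  # period-over-period change, roughly in [-1, 1]
    return np.diff(x) / x[:-1]

macd = np.array([-0.4, 0.2, 0.5, -0.1])  # hypothetical MACD values
print(polarize(macd))                    # [0 1 1 0]
print(max_min_scale(macd))               # 0 at the minimum, 1 at the maximum
```

Each original index contributes only the extended columns marked as applicable for it in Table 2.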

The function RFE () in the first algorithm refers to recursive feature elimination. Before we perform the training data scale reduction, we will have to make sure that the features we selected are effective. Ineffective features will not only drag down the classification precision but also add more computational complexity. For the feature selection part, we choose recursive feature elimination (RFE). As [45] explained, the process of recursive feature elimination can be split into the ranking algorithm, resampling, and external validation.

The ranking algorithm fits the model to the features and ranks them by their importance to the model. We set the parameter to retain *i* features; each iteration of feature selection retains the *S_i* top-ranked features, then refits the model and assesses the performance again to begin another iteration. The ranking algorithm eventually determines the top *S_i* features.

The RFE algorithm is known to suffer from over-fitting. To eliminate the over-fitting issue, we run the RFE algorithm multiple times on randomly selected stocks as the training set and ensure all the features we select are high-weighted. This procedure is called data resampling. Resampling can be built as an optimization step forming an outer layer of the RFE algorithm.

The last part of our hybrid feature engineering algorithm serves optimization purposes. For the training data matrix scale reduction, we apply randomized principal component analysis (PCA) [31] before finalizing the features of the classification model.

Financial ratios of a listed company are used to present its growth ability, earning ability, solvency, etc. Each financial ratio consists of a set of technical indices, and each technical index (or feature) we add appends another column of data to the data matrix, resulting in low training efficiency and redundancy. If non-relevant or less relevant features are included in the training data, the precision of classification also decreases.

The accumulated contribution rate (ACR) of the first \(j\) principal components, \(ACR = \sum_{i=1}^{j} \lambda_{i} / \sum_{i=1}^{n} \lambda_{i}\), where \(\lambda_{i}\) are the eigenvalues of the covariance matrix, represents the explanatory power of the principal components extracted by the PCA method for the original data. If the ACR is below 85%, the PCA method would be unsuitable due to excessive loss of original information. Because the covariance matrix is sensitive to the orders of magnitude of the data, a data standardization procedure should be performed before the PCA. The commonly used standardization methods are mean-standardization and normal-standardization, noted as given below:

Mean-standardization: \(X_{ij}^{*} = X_{ij} /\overline{{X_{j} }}\), where \(\overline{{X_{j} }}\) represents the mean value.

Normal-standardization: \(X_{ij}^{*} = (X_{ij} - \overline{{X_{j} }} )/s_{j}\), where \(\overline{{X_{j} }}\) represents the mean value and \(s_{j}\) is the standard deviation.
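Both standardizations apply column-wise and can be sketched directly from the formulas (the sample matrix is ours):

```python
import numpy as np

def mean_standardize(X):
    """X*_ij = X_ij / mean(X_j): divide each column by its mean."""
    return X / X.mean(axis=0)

def normal_standardize(X):
    """X*_ij = (X_ij - mean(X_j)) / s_j: zero mean, unit variance per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Two columns in wildly different units collapse onto the same scale.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
print(mean_standardize(X))    # both columns become [0.5, 1.0, 1.5]
print(normal_standardize(X))  # both columns become [-1, 0, 1]
```

After either transform, the covariance matrix is no longer dominated by the columns with the largest magnitudes.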

The array *fe_array* is defined according to Table 2: the row number maps to the features, and columns 0, 1, 2, and 3 denote the extension methods of normalize, polarize, max–min scale, and fluctuation percentage, respectively. We then fill in the values of the array by the rule that 0 stands for no need to expand and 1 for features that need the corresponding extension method applied. The final algorithm of data preprocessing using RFE and PCA is illustrated as Algorithm 1.
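A small sketch of how *fe_array* drives the extension step; the 0/1 values and index names below are hypothetical placeholders, not Table 2's actual assignments:

```python
import numpy as np

# Rows are technical indices, columns are the four extension methods
# (normalize, polarize, max-min scale, fluctuation percentage).
fe_array = np.array([
    [0, 1, 1, 0],  # hypothetical row, e.g. MACD: polarize and max-min scale
    [1, 0, 1, 1],  # hypothetical row, e.g. RSI: all but polarize
])
methods = ["normalize", "polarize", "max-min", "fluctuation %"]
for row in fe_array:
    applied = [m for m, flag in zip(methods, row) if flag]
    print(applied)  # which extension methods to apply for this index
```

Only the flagged methods are applied per index, which is how 20 original features expand into the 54 reported in the "Results" section.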

#### Algorithm 2: Price trend prediction model using LSTM

After the principal component extraction, we get the scale-reduced matrix, which means the *i* most effective features have been converted into *j* principal components for training the prediction model. We utilized an LSTM model and added a conversion procedure for our stock price dataset. The detailed algorithm design is illustrated in Alg 2. The function *TimeSeriesConversion* () converts the principal component matrix into time series by shifting the input data frame according to the number of time steps [3], i.e., the term length in this research. The processed dataset consists of the input sequence and the forecast sequence. In this research, the parameter *LAG* is 1, because the model detects the pattern of feature fluctuations on a daily basis, while *N_TIME_STEPS* varies from 1 trading day to 10 trading days. The functions *DataPartition (), FitModel (), EvaluateModel ()* are regular steps without customization. The NN structure design, optimizer decision, and other parameters are illustrated in the function *ModelCompile ()*.

### Results

Some procedures impact the efficiency but do not affect the accuracy or precision, and vice versa, while other procedures affect both efficiency and the prediction result. To fully evaluate our algorithm design, we structure the evaluation part by main procedures and evaluate how each procedure affects the algorithm performance. First, we evaluated our solution on a machine with a 2.2 GHz i7 processor and 16 GB of RAM. Furthermore, we also evaluated our solution on an Amazon EC2 instance with a 3.1 GHz processor, 16 vCPUs, and 64 GB of RAM.

In the implementation part, we expanded 20 features into 54 and retained the 30 most effective ones. In this section, we discuss the evaluation of feature selection. The dataset was divided into two subsets, i.e., training and testing datasets. The test procedure included two parts: one testing dataset for feature selection and another for model testing. We note the feature selection dataset and model testing dataset as DS_test_f and DS_test_m, respectively.

We randomly selected two-thirds of the stock data by stock ID for RFE training and note the dataset as DS_train_f; all the data consist of the full technical indices and expanded features throughout 2018. The estimator of the RFE algorithm is an SVR with a linear kernel. We rank the 54 features by voting, obtain 30 effective features, and then process them using the PCA algorithm to reduce them to 20 principal components. The rest of the stock data forms the testing dataset DS_test_f, used to validate the effectiveness of the principal components we extracted from the selected features. We reformed all the data from 2018 as the training dataset of the model, noted as DS_train_m. The model testing dataset DS_test_m consists of the first 3 months of data in 2019, which has no overlap with the datasets utilized in the previous steps. This approach prevents hidden problems caused by overfitting.

### Term length

To build an efficient prediction model, instead of modeling the data as time series, we determined to use 1-day-ahead indices data to predict the price trend of the next day. We tested the RFE algorithm on a range of short terms from 1 day to 2 weeks (ten trading days) to evaluate how the commonly used technical indices correlate with price trends. For evaluating the prediction term length, we fully expanded the features as in Table 2 and fed them to RFE. During the test, we found that different term lengths have different levels of sensitivity to the same indices set.

We take the closing price of the first trading date and compare it with the closing price of the *n*th trading date. Since we are predicting the price trend, we do not consider term lengths whose cross-validation score is below 0.5. After the test, as we can see from Fig. 4, three term lengths are most sensitive to the indices we selected from the related works. They are *n* = {2, 5, 10}, which indicates that price trend predictions for every other day, 1 week, and 2 weeks using this indices set are likely to be more reliable.

These curves have different patterns. For the length of 2 weeks, the cross-validation score increases with the number of features selected. If the prediction term length is 1 week, the cross-validation score decreases once more than 8 features are selected. For every-other-day price trend prediction, the best cross-validation score is achieved by selecting 48 features, although the score merely fluctuates with the number of features selected. Biweekly prediction requires 29 features to achieve the best score. In Table 3, we list the top 15 effective features for these three period lengths. In the next step, we evaluate the RFE result for these three term lengths, as shown in Fig. 4.

We compare the output feature set of RFE with the all-original feature set as a baseline; the all-original feature set consists of *n* features, and we choose the *n* most effective features from the RFE output to evaluate the result using a linear SVR. We used two different approaches to evaluate feature effectiveness. The first method is to combine all the data into one large matrix and evaluate it by running the RFE algorithm once. The other method is to run RFE on each individual stock and determine the most effective features by voting.

### Feature extension and RFE

From the result of the previous subsection, we can see that when predicting the price trend for every other day or biweekly, the best result is achieved by selecting a large number of features. Within the selected features, some features produced by the extension methods rank better than the original features, which shows that the feature extension method is useful for optimizing the model. The feature extension affects both precision and efficiency; in this part, we only discuss the precision aspect and leave the efficiency part to the next step, since PCA is the most effective method for training efficiency optimization in our design. We evaluate how feature extension affects RFE and use the test result to measure the improvement gained by involving feature extension.

We further test the effectiveness of feature extension, i.e., whether polarizing, max–min scaling, and calculating the fluctuation percentage work better than the original technical indices. The best case to leverage for this test is the weekly prediction, since it has the fewest effective features selected. From the result of the last section, we know the best cross-validation score appears when 8 features are selected. The test consists of two steps: the first step tests the feature set formed by original features only, in this case only SLOWK, SLOWD, and RSI_5; the next step tests the feature set of all 8 features selected in the previous subsection. We leveraged the test by defining the simplest DNN model with three layers.

The normalized confusion matrices of testing the two feature sets are illustrated in Fig. 5. The left one is the confusion matrix of the feature set with expanded features, and the right one is the test result using original features only. The precisions of true positives and true negatives improved by 7% and 10%, respectively, which shows that our feature extension method design is reasonably effective.

### Feature reduction using principal component analysis

PCA affects the algorithm performance in both prediction accuracy and training efficiency. Since this part should be evaluated together with the NN model, we defined the same simple three-layer DNN model used in the previous step to perform the evaluation. This part introduces the evaluation method and the results of the optimization part of the model from the perspectives of computational efficiency and accuracy impact.

In this section, we choose bi-weekly prediction for a use case analysis, since it has a smoothly increasing cross-validation score curve and, unlike every-other-day prediction, it has already excluded more than 20 ineffective features. In the first step, we select all 29 effective features and train the NN model without performing PCA; this creates a baseline of accuracy and training time for comparison. To evaluate accuracy and efficiency, we vary the number of principal components over 5, 10, 15, 20, and 25. Table 4 records how the number of features affects the model training efficiency, and the stacked bar chart in Fig. 6 illustrates how PCA affects training efficiency. Table 6 shows the accuracy and efficiency analysis of different feature pre-processing procedures. The times shown in Tables 4 and 6 are based on experiments conducted on a standard user machine to show the viability of our solution with limited or average resource availability.

We also list the confusion matrix of each test in Fig. 7. The stacked bar chart shows that the overall time spent on training the model decreases with the number of selected features, and the PCA method is significantly effective in optimizing training dataset preparation. For the time spent on the training stage itself, PCA is not as effective as in the data preparation stage. It is possible that the optimization effect of PCA is not drastic because of the simple structure of the NN model.

Table 5 indicates that the overall prediction accuracy is not drastically affected by reducing the dimension. However, accuracy alone cannot fully confirm that PCA has no side effects on model prediction, so we looked into the confusion matrices of the test results.

From Fig. 7 we can conclude that PCA does not have a severe negative impact on prediction precision. The true positive and false positive rates are barely affected, while the false negative and true negative rates are influenced by 2% to 4%. Besides evaluating how the number of selected features affects the training efficiency and model performance, we also leveraged a test on how data pre-processing procedures affect the training procedure and prediction result. Normalization and max–min scaling are the most common pre-processing procedures performed before PCA, since the measurement units of the features vary, and they are said to increase training efficiency afterward.

We leveraged another test on adding pre-processing procedures before extracting 20 principal components from the original dataset and compared the time elapsed in the training stage and the prediction precision. However, the test results lead to different conclusions. From Table 6 we can conclude that feature pre-processing does not have a significant impact on training efficiency, but it does influence the model prediction accuracy. Moreover, the first confusion matrix in Fig. 8 indicates that without any feature pre-processing procedure, the false negative and true negative rates are severely affected, while the true positive and false positive rates are not. If normalization is performed before PCA, both the true positive rate and true negative rate decrease by approximately 10%. This test also proved that the best feature pre-processing method for our feature set is the max–min scale.

### Discussion

In this section, we discuss and compare the results of our proposed model, other approaches, and the most related works.

### Comparison with related works

From the previous works, we found that the most commonly exploited models for short-term stock market price trend prediction are the support vector machine (SVM), multilayer perceptron artificial neural network (MLP), Naive Bayes classifier (NB), random forest classifier (RAF), and logistic regression classifier (LR). The test case for comparison is also bi-weekly price trend prediction; to evaluate the best result of all models, we keep all 29 features selected by the RFE algorithm. For the MLP evaluation, to test whether the number of hidden layers affects the metric scores, we noted the layer number as *n* and tested *n* = {1, 3, 5} with 150 training epochs for all tests. We found only slight differences in model performance, which indicates that the number of MLP layers hardly affects the metric scores.
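A comparison of this kind can be sketched with scikit-learn as follows. The synthetic 29-feature dataset and all hyperparameters are our illustrative assumptions, not the paper's setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the 29-feature bi-weekly dataset (illustrative only).
X, y = make_classification(n_samples=600, n_features=29, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "SVM": SVC(),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=150),
    "NB":  GaussianNB(),
    "RAF": RandomForestClassifier(random_state=0),
    "LR":  LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))
```

Holding the feature set fixed across all models, as here, is what makes the comparison of metric scores fair.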

From the confusion matrices in Fig. 9, we can see that all the machine learning models perform well when trained with the full feature set we selected by RFE. From the perspective of training time, the NB model was the most efficient to train. The LR algorithm costs less training time than other algorithms while achieving a prediction result similar to more costly models such as SVM and MLP. The RAF algorithm achieved a relatively high true positive rate but performed poorly in predicting negative labels. Our proposed LSTM model achieves a binary accuracy of 93.25%, which is significantly high accuracy for predicting the bi-weekly price trend. We also pre-processed the data through PCA to obtain five principal components, then trained for 150 epochs. The learning curve of our proposed solution, based on feature engineering and the LSTM model, is illustrated in Fig. 10. The confusion matrix is the figure on the right in Fig. 11, and the detailed metric scores can be found in Table 9.

The detailed evaluation results are recorded in Table 7. We also initiate a discussion of the evaluation result in the next section.

Because the result structure of our proposed solution is different from most of the related works, it would be difficult to make a naïve comparison with previous works. For example, it is hard to find the exact accuracy of price trend prediction in most of the related works, since the authors prefer to show the gain rate of simulated investment. The gain rate is a processed number based on simulated investment tests; sometimes one correct investment decision with a large trading volume can achieve a high gain rate regardless of the price trend prediction accuracy. Besides, a unique and heuristic innovation of our proposed solution is that we transform the problem of directly predicting an exact price into two sequential problems: first predicting the price trend with an accurate binary classification model, which constructs a solid foundation for predicting the exact price change in future work. Beyond the different result structure, the datasets that previous works researched are also different from ours. Some of the previous works involve news data to perform sentiment analysis and exploit the SE part as another system component to support their prediction model.

The latest related work we can compare with is Zubair et al. [47], whose authors use multiple R-squared for model accuracy measurement. Multiple R-squared, also called the coefficient of determination, shows the strength of the predictor variables in explaining the variation of stock returns [28]. They used three datasets (KSE 100 Index, Lucky Cement Stock, Engro Fertilizer Limited) to evaluate their proposed multiple regression model and achieved 95%, 89%, and 97%, respectively. Except for the KSE 100 Index, the dataset choices in this related work are individual stocks; thus, we choose the evaluation result on the first dataset of their proposed model.

We list the leading stock price trend prediction model performances in Table 8; judging by the comparable metrics, the metric scores of our proposed solution are generally better than those of other related works. Instead of concluding arbitrarily that our proposed model outperformed the others, we first look into the dataset column of Table 8. Examining the dataset used by each work: Khaidem and Dey [18] only trained and tested their proposed solution on three individual stocks, which makes it difficult to prove the generalization of their model. Ayo [2] leveraged an analysis of stock data from the New York Stock Exchange (NYSE), but the weakness is that they only performed analysis on the closing price, which is a feature embedded with high noise. Zubair et al. [47] trained their proposed model on both individual stocks and an index price, but as we mentioned in the previous section, an index price only consists of a limited number of features and stock IDs, which further affects the model training quality. For our proposed solution, we collected sufficient data from the Chinese stock market and applied the FE + RFE algorithm to the original indices to get more effective features; the comprehensive evaluation results over 3558 stock IDs reasonably demonstrate the generalization and effectiveness of our proposed solution in the Chinese stock market. However, Khaidem and Dey [18] and Ayo [2] chose to analyze the stock market in the United States, Zubair et al. [47] performed their analysis on Pakistani stock market prices, and we obtained our dataset from the Chinese stock market; the policies of different countries might impact the model performance, which needs further research to validate.

### Proposed model evaluation—PCA effectiveness

Besides comparing the performance across popular machine learning models, we also evaluated how the PCA algorithm optimizes the training procedure of the proposed LSTM model. We recorded the comparison of confusion matrices between training the model with 29 features and with five principal components in Fig. 11. Model training using the full 29 features takes 28.5 s per epoch on average, while it takes only 18 s per epoch on average when training on the feature set of five principal components; PCA thus improved the training efficiency of the LSTM model by 36.8%. The detailed metric data are listed in Table 9. We discuss complexity analysis in the next section.
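The reported efficiency gain follows directly from the measured per-epoch times:

```latex
\frac{28.5\,\text{s} - 18\,\text{s}}{28.5\,\text{s}} = \frac{10.5}{28.5} \approx 36.8\%
```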

### Complexity analysis of proposed solution

This section analyzes the complexity of our proposed solution. Long Short-Term Memory differs from other NNs: it is a variant of the standard RNN, which also has time steps with a memory and gate architecture. In previous work [46], the authors analyzed the complexity of the RNN architecture. They introduced a method that regards an RNN as a directed acyclic graph and proposed the concept of recurrent depth, which helps analyze the intricacy of an RNN.

The recurrent depth is a positive rational number, and we denote it as \(d_{rc}\).

## Amazon.com, Inc.

### About Amazon.com, Inc.

Amazon.com, Inc. offers a range of products and services through its websites. The Company's products include merchandise and content that it purchases for resale from vendors and those offered by third-party sellers. It also manufactures and sells electronic devices, including Kindle, Fire tablet, Fire TV, Echo, Ring, and other devices, and it develops and produces media content. It operates through three segments: North America, International and Amazon Web Services (AWS). AWS offers a set of technology services, including compute, storage, database, analytics, machine learning, Internet of Things, cloud and serverless computing. In addition, it provides services, such as advertising to sellers, vendors, publishers, authors, and others, through programs such as sponsored ads, display, and video advertising. It also offers Amazon Prime, a membership program that includes free shipping and access to streaming of various movies and television (TV) episodes, including Amazon Original content.

Industry

Retail (Catalog & Mail Order)

Executive Leadership

Jeffrey P. Bezos

Chairman of the Board

Andrew R. Jassy

President, Chief Executive Officer, Director

Brian T. Olsavsky

Chief Financial Officer, Senior Vice President

David H. Clark

Chief Executive Officer, Worldwide Consumer

David A. Zapolsky

Senior Vice President, General Counsel, Secretary

### Key Stats

1.66 mean rating - 56 analysts


| Metric | Value |
| --- | --- |
| Price To Earnings (TTM) | 66.28 |
| Price To Sales (TTM) | 3.75 |
| Price To Book (MRQ) | 14.14 |
| Price To Cash Flow (TTM) | 29.46 |
| Total Debt To Equity (MRQ) | 55.46 |
| LT Debt To Equity (MRQ) | 54.63 |
| Return on Investment (TTM) | 11.96 |
| Return on Equity (TTM) | 7.90 |

## Stock Price Prediction Based on Information Entropy and Artificial Neural Network

**Abstract:** Stock market is one of the most important components of the financial system. It directs money from investors to support the activity and development of the associated company. Therefore, understanding and modeling the stock price dynamics become critically important, in terms of financial system stability, investment strategy, and market risk control. To better model the temporal dynamics of stock price, we propose a combined machine learning framework with information theory and Artificial Neural Network (ANN). This method creatively uses information entropy to inform non-linear causality as well as stock relevance and uses it to facilitate the ANN time series modeling. Our analysis with Google, Amazon, Facebook, and Apple stock prices demonstrates the feasibility of this machine learning framework.
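
The abstract does not reproduce the exact entropy formulation, but a minimal sketch of the general idea (histogram-based Shannon entropy and mutual information between two return series, with all data and bin choices assumed here for illustration) could look like:

```python
import numpy as np

def shannon_entropy(x, bins=10):
    """Shannon entropy (in bits) of a series, estimated via histogram binning."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins=10):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    p_nz = p[p > 0]
    h_xy = -np.sum(p_nz * np.log2(p_nz))
    return shannon_entropy(x, bins) + shannon_entropy(y, bins) - h_xy

# Toy daily returns for two hypothetical stocks; y is partly driven by x,
# so their mutual information should exceed that of independent noise.
rng = np.random.default_rng(0)
x = rng.normal(0, 0.02, 2000)
y = 0.7 * x + rng.normal(0, 0.01, 2000)
print(mutual_information(x, y))  # noticeably above zero for related series
```

A score like this can rank which stocks are most informative about a target stock before feeding them into an ANN, which is the spirit of the relevance measure the abstract describes.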

**Published in: **2019 5th International Conference on Information Management (ICIM)


**Date of Conference: ** 24-27 March 2019

**Date Added to IEEE Xplore: ** 16 May 2019

**ISBN Information:**

**Electronic ISBN:** 978-1-7281-3430-7

**USB ISBN:** 978-1-7281-3429-1

**Print on Demand(PoD) ISBN:** 978-1-7281-3431-4

**Amazon Stock Price Prediction Forecast: How Much Will AMZN Be Worth In 2021 And Beyond?**

**On the hunt for an AMZN prediction for 2021-2025? Or have you been struggling to answer the question ‘is Amazon a good investment?’ Read on. Find out the top Amazon stock price prediction forecasts for 2021 and beyond, and discover how much AMZN could be worth in 2021-2025.**

With total revenue of $280.522 billion in 2019, Amazon is the ninth biggest company in the world. This e-commerce giant has revolutionized the way we shop and made its founder, Jeff Bezos, the richest person in the world. We probably all buy products from Amazon — but should you consider buying shares in Amazon (AMZN) itself?

Whether you’re looking to invest in a stock for the first time or want to diversify an existing portfolio, Amazon has famously brought its investors an impressive return on investment (ROI) in the past. According to analysts, people who invested just $1,000 in Amazon 10 years ago would have $17,727 today, making it an attractive prospect for many would-be investors.
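
As a quick check of the arithmetic behind that figure, growing $1,000 into $17,727 over 10 years implies a compound annual growth rate (CAGR) of roughly 33% per year (both dollar figures are the article's, not verified here):

```python
# CAGR implied by the article's figure: $1,000 -> $17,727 over 10 years.
start, end, years = 1_000, 17_727, 10
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # 33.3%
```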

If you want to buy Amazon shares quickly and easily, with **0%** Commission, check out eToro Exchange!

If you’re wondering ‘is Amazon a good investment for 2021’, read on. We’ve put together a selection of Amazon stock price predictions for 2021 and beyond to help you decide whether it’s a smart addition to your portfolio.

**Short-Term Amazon Stock Price Prediction For 2021: What Do The Analysts Say?**

### AMZN price prediction for 2021

What are stock experts forecasting for Amazon in 2021?

With 2021 just around the corner, let’s look ahead to see what financial analysts believe could happen to Amazon’s stock price next year. At the time of writing, Amazon’s stock price is $3,201.65. The platform WalletInvestor believes that 2021 will see Amazon hit new heights after its unprecedented boom in Q3 and Q4 of 2020:

After rocketing above $2,500 for the first time in June 2020, Amazon is predicted to open 2021 at a value of $3098.310. From there, WalletInvestor expects the price to continue rising, exceeding its previous all-time high (ATH) of $3531.45 to close the year at $3629.370.

WalletInvestor doesn’t predict that Amazon will fall lower than a minimum price of $3098.310 for the entirety of 2021, which should be welcome news to anyone who’s hoping to answer the question ‘will the stock price of Amazon go up?’

According to its Amazon stock outlook, the stock’s price will rise to $3,800 within 12 months.

This prediction may seem bullish, but if anything, it could be too conservative. According to stock price forecasts from the analysts at CNN Money, we could see the price of Amazon soar to over $4,000 before the end of 2021. They have predicted a potential high of $4,500, which would represent a staggering growth rate of +40.5%.

At a minimum, they don’t expect the stock price to fall below $3,048. Although this is lower than its current price of $3,201.65, it’s not an alarming prospect for investors. This is especially true because 2020 has been a breakthrough year for the retail site. With the boosted revenue from the worldwide coronavirus pandemic sending the stock price to its ATH, it wouldn’t necessarily be surprising if Amazon traded a little lower in 2021.

That said, CNN Money expects Amazon to clock in at an impressive average stock price of $3,800. We can see this mapped on the graph below:

The highest analysts’ target for Amazon share price prediction is $4,500 and the lowest one is $3,048. 42 analysts think Amazon stock is a buy in 2021.

Based on this forecast, CNN Money reports that 42 polled investment analysts recommend buying stock in Amazon.com Inc. (out of a possible 49 analysts). Their rating has stayed the same since November 2020, when Amazon underwent another bullish run in the lead-up to Christmas.

Our final short-term prediction for the stock price of Amazon in 2021 comes from Gov Capital, whose forecast is similar to CNN Money’s. Gov Capital believes that the AMZN stock price could reach an average of $4235.997909 in one year’s time. If this projection is correct, then people who invest $100 today could expect their investment to be worth $132.131 by December 2021. (Gov Capital uses an in-house deep learning algorithm to make its predictions, taking into account factors such as market cycles, similar stocks, and volume changes.)

Gov Capital’s average prices are bullish enough, but for investors wondering ‘should I add AMZN to my portfolio’, its maximum predictions are even more exciting. Its algorithm has given a maximum price of over $4,966 — edging close to $5,000. If this proves to be correct, then we could be seeing an increase of almost $2,000 in just one year. This would be a growth rate of over 66%.

Have you considered investing in **Amazon** stock?

**Long-Term AMZN Price Predictions: 2022-2025**

**What are stock experts forecasting for Amazon stock price in 2022 and beyond?**

Now that we’ve taken a look at the short-term price predictions for the stock price of Amazon, let’s look ahead to 2022-2025. If you’re a current investor who’s wondering whether you should buy, sell, or hold, it’s important to explore the long-term prospects of your stock before making your decision. This will potentially help you avoid losing out on any subsequent price changes that occur.

For our first long-term AMZN price prediction, we’ve looked to Long Forecast. This platform has listed the following price predictions for 2022, ranging from a minimum potential value of $3,550 in July 2022 to a maximum potential value of $4,765 in November 2022.

| Month | Min | Max | Close | Mo,% | Total,% |
| --- | --- | --- | --- | --- | --- |
| Jan 2022 | $3741 | $4219 | $3980 | 2.6% | 23.6% |
| Feb 2022 | $3761 | $4241 | $4001 | 0.5% | 24.3% |
| Mar 2022 | $3882 | $4378 | $4130 | 3.2% | 28.3% |
| Apr 2022 | $3689 | $4159 | $3924 | -5.0% | 21.9% |
| May 2022 | $3560 | $4014 | $3787 | -3.5% | 17.6% |
| Jun 2022 | $3737 | $4215 | $3976 | 5.0% | 23.5% |
| Jul 2022 | $3550 | $4004 | $3777 | -5.0% | 17.3% |
| Aug 2022 | $3728 | $4204 | $3966 | 5.0% | 23.2% |
| Sep 2022 | $3914 | $4414 | $4164 | 5.0% | 29.3% |
| Oct 2022 | $4024 | $4538 | $4281 | 2.8% | 32.9% |
| Nov 2022 | $4225 | $4765 | $4495 | 5.0% | 39.6% |
| Dec 2022 | $4014 | $4526 | $4270 | -5.0% | 32.6% |
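
The Total,% column appears to compound each month’s Mo,% change from a fixed starting price. A quick sanity check, using the table’s own rounded figures (so a small drift of around 0.1 percentage points is expected):

```python
# Monthly change (Mo,%) and cumulative change (Total,%) from the table above.
mo = [2.6, 0.5, 3.2, -5.0, -3.5, 5.0, -5.0, 5.0, 5.0, 2.8, 5.0, -5.0]
total = [23.6, 24.3, 28.3, 21.9, 17.6, 23.5, 17.3, 23.2, 29.3, 32.9, 39.6, 32.6]

# Start from January's cumulative figure and compound each later month.
running = 1 + total[0] / 100
for m, t in zip(mo[1:], total[1:]):
    running *= 1 + m / 100
    print(f"computed {(running - 1) * 100:5.1f}%  table {t}%")
```

Each computed value lands within about 0.1 percentage points of the table’s Total,%, consistent with the column being a compounded cumulative change.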

Although even its maximum 2022 prices aren’t as high as some of the values we saw from Gov Capital and CNN Money, this is still good news for both existing and potential investors. Its minimum prices are consistently above the current AMZN stock price, so even though this is two years in the future, it suggests we are very unlikely to see a significant crash. (Considering the strength of the Amazon empire, this would certainly be surprising).

So, how do the other long-term AMZN price predictions compare? According to a report published by the investment bank Needham & Company, the stock price of Amazon is likely to soar to $5,000 in the long term, potentially within the next three to five years. The report gave Amazon a strong buy rating and cited various factors that could facilitate this growth, including the effects of the coronavirus pandemic, growing demand for Amazon Web Services (AWS), and its streaming offerings such as Prime and Twitch.

Needham’s long-term bullish scenario corresponds to the predictions from WalletInvestor. Returning to this platform to explore its forecasts for Amazon in 2022-2025, it’s clear it also believes that $5,000 is an achievable target. By June 2024, WalletInvestor expects AMZN to be trading at $5,048.120 — and it predicts that it will remain above $5,000 for the duration of 2025.

This table shows the opening, closing, minimum, and maximum prices for Amazon in the months of 2025. By the end of the year, WalletInvestor predicts that its stock price will potentially be as high as $5,781.510. Given Amazon’s growth rate between 2021 and 2025, this figure suggests it could possibly hit $6,000 by 2026. If this is the case, it means the stock price of Amazon will double in just six years!

**Read More: What Stocks To Buy In 2021?**

**How Has The Stock Price Of Amazon Changed Over Time? **

Amazon’s growth rate over the last few years has been incredibly strong. But how has its stock price changed since Amazon entered the market back in 1998?

Jeff Bezos founded Amazon in Washington in 1994. Originally an online marketplace for books, the company has since expanded to become one of the United States’ ‘Big Five’ IT businesses, alongside Google, Apple, Microsoft, and Facebook. When it started trading in 1998, its average stock price was $15.6647, opening the year at just $4.9583 and closing at $53.550. This translated to a percentage change of 966.56%, a figure that remains its most impressive growth rate so far.

Between 1998 and 2008, however, Amazon’s stock price was fairly bearish. (It even slipped back down to $5.970 in 2001). But from this point on, it’s gone from strength to strength. 2009 proved a breakout year, with the price of AMZN rising from $87.2811 to $134.5200 — and once it surpassed $100, it never looked back.

Every year has proved more profitable than the last, with both Amazon’s revenue and its stock price consistently rising to unseen heights. Take a look at the table below to see how the stock price of Amazon has changed from 2010 to 2020:

From 2016 to 2018, Amazon’s average stock price rose from $699.5231 to $1641.7262. This was an increase of almost $1,000 in just two years — a growth rate that many of the predictions included in this article believe we could see again soon.

Ready to invest in **Amazon** stock?

**What Factors Affect The Stock Price Of Amazon? **

If you haven’t yet invested in AMZN and are wondering ‘is Amazon stock a good buy right now?’, we recommend taking some time to understand the different factors that can affect its price. This will help you to predict the right time to buy. If you’re an existing investor, it can help you decide whether to hold or sell.

Below, we’ve outlined some of the most important factors that can determine whether the stock price of Amazon will go up or down.

**The State Of The E-Commerce Market**

The e-commerce market is one of the fastest-growing sectors in the world. According to Statista, worldwide e-commerce sales are expected to hit $6.54 trillion by 2022. A growing percentage of our internet activity is taken up by online shopping and Amazon, as the world’s biggest brand, is a key company driving this trend.

The coronavirus pandemic has been paramount to the recent e-commerce boom. In Q3 of 2020, Amazon announced that its revenue was $96.1 billion, and it expects to earn between $112 billion and $121 billion in Q4. This is because global lockdowns have pushed unprecedented numbers of people to buy everything from groceries to Christmas gifts online. As Amazon sells almost everything, it has been able to cash in on this market boom, as the table above shows. Its growth rate over 2020 has been sensational, opening the year at just over $1,898 and closing it at a projected $3,201.65.

Analysts believe that our reliance on online shopping is unlikely to dwindle once the coronavirus pandemic is over. They believe the convenience of companies such as Amazon will cause people to continue ordering their products on the internet. This will almost certainly drive up Amazon’s revenue, and consequently its stock price.

By monitoring e-commerce trends, you can find the best times to buy or sell. (For example, Amazon’s stock price almost always rises towards the end of the year, as Christmas shoppers flock to its site).

**Demand For Additional Amazon Services**

As we all know, Amazon doesn’t just sell products. Its one-day delivery subscription service, Amazon Prime, first launched in the United States in 2005. Two years later, in 2007, it launched in the United Kingdom, Germany, and Japan. In both of these years, the stock price of Amazon rose significantly. Between the end of 2005 and 2009, it increased from $47.1500 to $134.5200 as the demand for the service grew.

Amazon Web Services, which was launched back in 2002, had a similar effect on the company’s stock price. At the start of 2002, the stock was valued at $16.4817. By the end of 2003, this had shot up to $52.6200, showing what an impact new releases can have.

**Investing In New Verticals**

Just as adding new services to the Amazon offering can boost its stock price, so too could its willingness to invest in new verticals. Analysts have identified two new markets that Amazon may enter during the next few years: the sale of marijuana products (once fully legalized) and self-driving cars.

It’s predicted that Amazon will add marijuana products to its Whole Foods stores and start using self-driving vehicles for deliveries. The increased business that could stem from these decisions is likely to drive up its stock price, especially as both moves would be well covered in global media.

**eToro – Top Stock Broker: Buy Amazon Stock With 0% Commission**

**eToro have proven themselves trustworthy within the stock market over many years – we recommend you try them out.**

*Your capital is at risk. Other fees may apply*

**Key Points**

- With total revenue of $280.522 billion in 2019, Amazon is the ninth biggest company in the world
- WalletInvestor believes we’ll see Amazon exceed its previous all-time high (ATH) of $3531.45 to close 2021 at $3629.370
- According to stock price forecasts from the analysts at CNN Money, we could see the price of Amazon soar to over $4,000 before the end of 2021
- 42 polled investment analysts recommend buying stock in Amazon.com Inc
- Gov Capital believes that the AMZN stock price could reach an average of $4235.997909 in one year’s time
- LongForecast believes the price of Amazon will range from a minimum potential value of $3,550 in July 2022 to a maximum potential value of $4,765 in November 2022
- Needham & Company believes the stock price of Amazon is likely to soar to $5,000 in the long-term
- By the end of 2025, WalletInvestor predicts Amazon’s stock price will potentially be as high as $5,781.510
- Key factors that affect the stock price of Amazon include the state of the e-commerce market, the demand for additional Amazon services, and its willingness to invest in new markets

If you’ve been looking for Amazon stock price prediction in 2021 and beyond, we hope you’ve found this article helpful. Stock investing is notoriously risky and it’s vital to have a robust risk management strategy in place if you’re considering adding AMZN to your portfolio.

If you liked our article **How Much Will Amazon Stock Be Worth In 2021 And Beyond**, please share with your fellow traders.

**FAQs**

**Is Amazon a good buy?**

According to the short and long-term predictions, Amazon is potentially a good buy. Most analysts agree that it’s not a stock to buy in order to make a quick profit. Investing in AMZN can be a great long-term position as the price is universally expected to increase in the future. Selling too early could cause serious investor regret.

**Is Amazon stock overvalued? **

No, according to most analyses, Amazon stock is not overvalued. Its current price is $3,201.65, with many analysts claiming that it’s trading at a discount.

**Will Amazon pay a dividend? **

Whether you’re an existing or prospective investor, it’s worth noting that Amazon has never yet paid a dividend.

**Is now a good time to invest in Amazon? **

The general consensus is that now is a good time to invest in Amazon. The growth predicted for the next few years, partly as a result of the coronavirus pandemic, means that investors could potentially benefit from soaring stock prices in the future.