Machine Learning In Stock Trading: An Easy Introduction

It's holiday season. The wifey is out, and what better way to spend the time than reviewing some basic applications of machine learning in the field of finance. This is a post I have been wanting to write for a long time. All of the code has been written in R and is easily reproducible. I will not share it here as I don't know how best to do that. Nonetheless, the most important packages for achieving this level of black magic are:

  1.  caret
  2.  forecast
  3.  kernlab
  4.  neuralnet
  5.  xgboost
  6.  tseries (sub to PewDiePie)

I will not be reviewing any of the statistical concepts applied but will merely focus on their application and see if we can find any viable/useful results. I feel I should not have to say this, as everyone here is a mature adult, BUT this is not financial advice and you should always do your own due diligence when investing any of your money, not take advice from a random stranger on the internet. And bear in mind that just because a strategy has worked in the past does not mean it will work in the future.

 


 

We will be taking a closer look at the share price of the German football behemoth from the Ruhrpott, the eternal second behind Bayern Munich: Borussia Dortmund. As can be deduced, the share price is highly related to the club's sporting performance, so we shouldn't be expecting too many conclusive results. The stock has also been known to attract some rather peculiar stories, such as someone bombing the team bus in the hope of hurting members of the staff and financially benefiting from it. The time series can be seen below. As can be seen, the share price has profited from some healthy growth, reaching an all-time high this year.

 

 

But how will we proceed? Here is a brief overview:

First we will be looking at some feature selection methods such as:

  1. Filter Methods
  2. Wrapper Methods
  3. Embedded Methods

Second we will consider multiple machine learning methods such as:

  1. Extreme Gradient Boosting (XGB)
  2. Support Vector Machines (SVM)
  3. Artificial Neural Networks (ANN)

Third, we will consider all results and compare them.

After this brief introduction we will finally get our hands dirty. First we will start by simply looking at some feature selection methods. At this point you might be thinking to yourself: “AWWW HELL NAH. JUST SHOW ME HOW TO DO THE THANG.” Let me tell you that feature selection is one of the most important factors when applying machine learning, so I will briefly run through it. Feature selection consists of simply choosing the input predictors. This has multiple advantages, such as easier model interpretation, faster learning time, reduced dimensionality and reduced over-fitting. The principal techniques for feature selection are filter, wrapper and embedded methods.

 


 

Filter methods consist of selecting input predictors based on certain statistical criteria before using them in a learning algorithm.

So first we can try to regress the returns of the share price on themselves using linear regression. Then we can trim down the input predictors using the p-values. We lagged the returns up to lag 9 (for no particular reason). Only lag 4 is statistically significant at the 5% level. Nonetheless we will also be looking at lag 3, as it is still statistically significant at the 10% level. We have an adjusted R-squared value of 0.006, which is a rather poor result. Hence we will be eliminating all the input predictors except the aforementioned ones.
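Since the original code is not shared, here is a minimal sketch of this filter step in R. The `ret` vector is simulated as a stand-in for the actual BVB return series (its standard deviation is an assumption for illustration); the idea is simply to regress the return on its first 9 lags and read off the p-values:

```r
set.seed(42)
ret <- rnorm(1000, mean = 0, sd = 0.027)  # simulated stand-in for daily returns

# embed() lines up x[t] with x[t-1], ..., x[t-9] as columns
lagged <- embed(ret, 10)
colnames(lagged) <- c("ret", paste0("lag", 1:9))
df <- as.data.frame(lagged)

fit <- lm(ret ~ ., data = df)
summary(fit)             # adjusted R-squared and significance stars
coef(summary(fit))[, 4]  # the p-value column: keep only the significant lags
```

On the real series this is where lag 3 and lag 4 would survive the cut; on pure noise, of course, any "significant" lag is a fluke.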

 

 

Re-running the linear regression using only lag 3 and lag 4, we get the summary as can be seen below. As we can see, re-running the test with fewer input predictors yields quite different results. The estimates for the single predictors are now different; furthermore, both of the estimates have become more statistically significant. Additionally, the adjusted R-squared has increased, even though it is still painfully low.

 

 

So we can advance to further ways of selecting our ideal predictor variables by trying to find our input predictors with wrapper methods. The main advantage of wrappers compared to filter methods is that they consider the interaction with the output target feature. The downsides of wrapper methods are obviously the increased computational power needed and the risk of over-fitting. Some of the most common wrapper methods are:

  • Forward selection
  • Backward elimination
  • Recursive Feature elimination

I personally use backward elimination, where we start with all the features and remove the least significant feature at each iteration, provided this improves the performance of the model. We repeat this until no improvement is observed on removal of a feature.
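A minimal sketch of this loop in base R, again on simulated returns: `step()` performs backward elimination by dropping the term whose removal most improves the AIC and stopping when no removal helps. That is an AIC-based approximation of the p-value procedure described above, not caret's cross-validated version:

```r
set.seed(1)
df <- as.data.frame(embed(rnorm(500, sd = 0.027), 10))
colnames(df) <- c("ret", paste0("lag", 1:9))

full    <- lm(ret ~ ., data = df)                       # start with all 9 lags
reduced <- step(full, direction = "backward", trace = 0) # drop until no gain
formula(reduced)                                         # only surviving lags remain
```

On pure noise most lags get dropped; on the real series you would expect the surviving set to resemble the significant lags from the filter step.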

 

So using the caret package we can run this rather simply in R. From the results we can see that the ideal model is composed of four variables, which minimizes the mean absolute error (MAE). The lags chosen by the recursive feature elimination (rfe) are 4, 5, 1, 7. The result from this slightly more advanced feature selection is obviously already very different from the results achieved by our simple selection method.

Last but not least we will be looking at embedded methods. These consist of selecting input predictors while using them in learning algorithms, simultaneously maximizing model performance. Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection. Some of the most popular examples are LASSO and ridge regression, which have inbuilt penalization functions to reduce overfitting. I will be using the LASSO regression, which performs L1 regularization: it adds a penalty equal to the absolute value of the magnitude of the coefficients.

Using the Lasso we get a similar result to what we had when we just used the simple linear regression model.
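The mechanics of that L1 penalty can be shown in a few lines: for a standardized predictor, the LASSO shrinks the least-squares coefficient toward zero by a soft-threshold rule, and coefficients smaller than the penalty are dropped entirely. This is a didactic sketch of the penalty itself, not the full LASSO fit used in the post:

```r
# Soft-threshold operator: shrink |beta| by lambda, zeroing small coefficients
soft_threshold <- function(beta, lambda) {
  sign(beta) * pmax(abs(beta) - lambda, 0)
}

soft_threshold( 0.30, 0.10)  # shrunk toward zero
soft_threshold(-0.30, 0.10)  # shrinkage is symmetric in sign
soft_threshold( 0.05, 0.10)  # below the penalty: eliminated, i.e. selected out
```

This zeroing-out is exactly why the LASSO performs feature selection as a by-product of fitting, which ridge (L2) regression does not.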

Now that we are done with the feature selection, we can advance to the more juicy stuff.

Going forward we will be using the 4 input predictors we obtained using the recursive feature selection. So this means that we will be using lags 1, 4, 5, 7. Furthermore we will try to run our models via pre-processing our data by using principal component analysis.

So the first machine learning tool we will be using is Extreme Gradient Boosting (XGB), which is a very common algorithm (it seems to be the favourite of the Kaggle nerds; joking, please don't boo me, nerds). This algorithm is great for supervised learning tasks such as regression, classification, and ranking. XGB has the following characteristics.

  1. Tree boosting algorithm: it predicts the output target feature from weighted, sequentially built decision trees
  2. Algorithm optimization: it finds the locally optimal weight coefficients of the sequentially built decision trees. For regression, a gradient descent algorithm is used to locally minimize the regularized sum of squared errors function, among others.

Running the XGB in R, this is the output we get. We notice that the number of rounds which minimized our RMSE was 50 and the max tree depth is 1. This can furthermore be observed when looking at the bottom-right window, with eta 0.3 and subsample 1. What we could also try (and which I actually did) is to see whether feature extraction via PCA could improve our results.
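To make the two components above concrete, here is a toy gradient-boosting loop with depth-1 trees (stumps) on squared error, echoing the max tree depth of 1 and eta of 0.3 that the tuning settled on. It is a didactic sketch on simulated data, not the actual xgboost implementation (which adds regularization, subsampling and much more):

```r
boost_stumps <- function(x, y, nrounds = 50, eta = 0.3) {
  pred <- rep(mean(y), length(y))          # start from the mean prediction
  for (i in seq_len(nrounds)) {
    resid <- y - pred                      # negative gradient of squared error
    best <- list(sse = Inf)
    for (s in unique(x)) {                 # fit the best single-split stump
      left  <- resid[x <= s]
      right <- resid[x >  s]
      if (!length(left) || !length(right)) next
      sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
      if (sse < best$sse)
        best <- list(sse = sse, split = s, l = mean(left), r = mean(right))
    }
    step_vals <- ifelse(x <= best$split, best$l, best$r)
    pred <- pred + eta * step_vals         # eta is the learning rate
  }
  pred
}

set.seed(3)
x   <- runif(200)
y   <- sin(2 * pi * x) + rnorm(200, sd = 0.1)
fit <- boost_stumps(x, y)
mean((y - fit)^2)                          # in-sample MSE after 50 rounds
```

Each round fits a stump to the current residuals and adds a shrunken version of it, which is exactly the "sequentially built decision trees" idea from point 1.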

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.

So let us quickly check whether our input predictors are in any shape or form correlated. As we can see, none of the input predictors are very correlated with each other. The most prominent correlation we can observe is at lag 5, which has a positive correlation of 0.08. In this environment using PCA does not make a lot of sense, but keep in mind it is a viable tool in a highly correlated environment, such as when looking at interest rate products. Using the PCA pre-processing we finally arrive at an RMSE of 0.02720604, which is actually slightly better than the 0.02723055 RMSE achieved by selecting features. Obviously in real life you would still opt for having as few input predictors as possible. Nonetheless, as the results are better and my computer is not suffering too much under the additional computational requirements, we will move forward using PCA.
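A sketch of the PCA pre-processing step with base R's `prcomp`, using simulated stand-ins for the four selected lags; by construction the resulting components are mutually uncorrelated, which is the whole point of the transformation:

```r
set.seed(11)
df   <- as.data.frame(embed(rnorm(500, sd = 0.027), 8))
lags <- df[, c(2, 5, 6, 8)]               # columns corresponding to lags 1, 4, 5, 7
colnames(lags) <- paste0("lag", c(1, 4, 5, 7))

pca <- prcomp(lags, center = TRUE, scale. = TRUE)
summary(pca)            # proportion of variance explained per component
scores <- pca$x         # the principal-component scores: feed these to the model
round(cor(scores), 10)  # identity matrix: the components are uncorrelated
```

With near-uncorrelated lags like these, each component carries roughly equal variance, which is why PCA buys so little here compared to, say, a yield-curve dataset.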

So moving forward, we will now visualize how our residuals behave and try to make sense of our results. Below we can see how our model behaves with respect to the actual time series of the returns of the share price. The black line represents the actual returns, whereas the red line is the estimate from our XGB. You might now be thinking to yourself: “Well, this is quite underwhelming…”.


Nonetheless, before you rage-quit, keep in mind that we want the model to give us directional predictions, not an exact estimate of how much the share price will move. This is exactly what it does; we do not expect the model to reproduce the exact moves of the share, as that would simply mean the model is overfit to the time series.

Taking a closer look at the results achieved by the PCA, we can see that they are very similar. Nonetheless the results achieved by pre-processing are much more prominent (simply take my word for this).

We will now use the model to give us a signal and change our position in the stock accordingly. We will only consider a long or no position, as shorting stocks is a whole different story.

Running all of this we get the following table. The first column, “xgbmret”, is simply the returns generated by the model. The second column, “xgbmretc”, is the returns generated by the model adjusted for commissions. The commissions were calculated at 10 bps per trade, which actually cuts off a fair share of the annualized returns. The last column, “rbr”, is simply the returns generated by a long position. As we can see, the model can clearly outperform a simple buy-and-hold position, as it is able to generate higher annualized returns at much lower risk. Nonetheless the picture changes a little when considering commissions, which place a heavy toll on the performance. When considering any sort of commission, the annualized returns drop by a total of 7 (!) percentage points. Algebraically it also makes sense that the standard deviation increases. So even though our returns have come down drastically, the Sharpe ratio is still much better than that of a simple buy-and-hold position.
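The bookkeeping behind that table can be sketched as follows. The signal, position and commission logic are assumptions that mirror the description (long or flat, 10 bps charged whenever the position changes), with simulated data standing in for the actual model forecasts; the column names follow the post's table:

```r
set.seed(5)
ret  <- rnorm(250, mean = 0.0005, sd = 0.02)  # simulated daily stock returns
pred <- rnorm(250)                            # simulated model forecasts

signal  <- as.numeric(pred > 0)               # 1 = long, 0 = flat
pos     <- c(0, head(signal, -1))             # trade on yesterday's signal
xgbmret <- pos * ret                          # strategy returns, no costs

trades   <- abs(diff(c(0, pos)))              # 1 whenever the position flips
comm     <- 0.0010                            # 10 bps per trade
xgbmretc <- xgbmret - trades * comm           # strategy returns net of commissions

annualize <- function(r) prod(1 + r)^(252 / length(r)) - 1
c(gross = annualize(xgbmret), net = annualize(xgbmretc), rbr = annualize(ret))
```

Note the one-day lag on the position: trading on today's signal with today's return would be look-ahead bias, one of the easiest ways to make a backtest look magical.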

Furthermore we can check the equity curve to see how the time series evolved over time. This is useful information, as it helps us infer whether any of the performance was just luck at a certain point in time. By looking at the graph we can see that the performance was quite consistent over time. Additionally it allows us to see one of the big advantages of the model: the protection against drawdowns it provides.

Ok now that we have tested for this model, let us try some other models. I will be proceeding in a similar way but only present you with the results and spare you all the tedious stuff in between.

So next we will be looking at maximum margin methods. These methods consist of supervised, boundary-based learning algorithms that predict the output target feature by separating the output target and input predictor features into optimal hyper-planes. The most common such method is the support vector machine. Support vector machines are usually used for classification tasks, but here we use their regression variant. Here again we will be using the PCA pre-processed time series, again because the RMSE is lower. I will spare you most of the previously discussed details. Nonetheless, I would like to guide your attention toward the graph below. This graph was already discussed when having a more detailed look at the XGB. What we can see now is that the SVM is more volatile when it comes to its projections. So even though it might be more accurate, it might cost us more money with respect to the commissions we will have to pay.
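A hedged sketch of an SVM regression fit with kernlab (one of the packages listed at the top). The data, formula and hyper-parameters here are illustrative assumptions, and the fit is guarded so the snippet still runs if the package is not installed:

```r
set.seed(9)
df <- as.data.frame(embed(rnorm(500, sd = 0.027), 8))
colnames(df) <- c("ret", paste0("lag", 1:7))

if (requireNamespace("kernlab", quietly = TRUE)) {
  # eps-svr = epsilon support vector regression; rbfdot = Gaussian kernel
  fit  <- kernlab::ksvm(ret ~ lag1 + lag4 + lag5 + lag7, data = df,
                        type = "eps-svr", kernel = "rbfdot")
  pred <- predict(fit, df)   # in-sample fitted values
}
```

In practice you would tune the cost parameter `C` and the kernel width via cross-validation (caret's `train` with `method = "svmRadial"` wraps exactly this), rather than accept the defaults as above.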

Here again we can observe what we observed previously. Nonetheless we can already see that one of the drawbacks of the support vector machine is its increased volatility, which forces us to change our position many times, forcing us to pay up a lot (hey, maybe the broker will send you over some goodies for that). Adjusting for commissions, we have to pass on half of our earnings to the broker, which is very hefty. Considering commissions, the risk-adjusted performance even becomes worse than the simple buy-and-hold position. This makes clear that it is critical for any signal to be able to filter out random noise. Even though the SVM performed much better than the XGB in an ideal world without commissions, it performed much worse when considering commissions. Now it may be that the commission I am considering is way too high or too low, strongly skewing the results. This is the fine balance one has to consider when setting up any kind of model.

And we can again observe that the trading strategy provides good protection against any sort of downside risk compared to the buy-and-hold strategy.

 

Last but not least we will move on to the artificial neural network (ANN), which belongs to the multi-layer perceptron family of methods. Multi-layer perceptron methods consist of supervised learning algorithms for predicting the output target feature by dynamically processing the output target and input predictor data through a multi-layer network of optimally weighted connections of nodes. The nodes are usually organised in input, hidden and output layers.

In this case we are running the ANN using the features we selected at the beginning. Just to remind you, in case of short-term memory loss, these were lags 1, 4, 5 and 7. The results are, to say the least, very underwhelming. Even though we were able to bring down the standard deviation, net of commissions or not, the returns are just horrendously bad. This obviously does not disqualify the application of ANNs, but it shows that you don't need the most complicated machine learning to solve problems related to finance.
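A minimal ANN sketch with the neuralnet package (also listed at the top), again guarded and on simulated data. The single hidden layer of 3 nodes is an assumption for illustration, as the post does not state the architecture used:

```r
set.seed(13)
df <- as.data.frame(embed(rnorm(500, sd = 0.027), 8))
colnames(df) <- c("ret", paste0("lag", 1:7))

if (requireNamespace("neuralnet", quietly = TRUE)) {
  # One hidden layer of 3 nodes; linear.output = TRUE for a regression target
  nn <- neuralnet::neuralnet(ret ~ lag1 + lag4 + lag5 + lag7, data = df,
                             hidden = 3, linear.output = TRUE)
  fitted_vals <- nn$net.result[[1]]   # in-sample predictions
}
```

For returns data it usually also pays to scale the inputs before training, since sigmoid activations saturate quickly on poorly scaled features.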

So finally we can compare the results of all the algorithms and see which one performed best. When considering the results generated without commissions, we see that machine learning algorithms can provide valuable insights. The only machine learning algo which was not able to outperform the buy-and-hold position was the ANN. As previously mentioned, the algos are able to provide substantial protection against downside risk. The returns generated by any of the rules are in any case rather sizeable.

(Table: returns without commissions)

Now, ignoring commissions is simply not wise. The returns worsen a lot when considering commissions. Nonetheless we are also able to bring down the standard deviation drastically, which is a plus. This is still not enough, however, to achieve a better Sharpe ratio than the benchmark.

(Table: returns adjusted for commissions)

Bulls drive Dow Jones to the 24000 mark

Today’s Overview:

  • Dax crashes
  • Dow Jones breaks the 24000 mark
  • CFIUS clears Bayer’s planned takeover of Monsanto
  • Tesla opens biggest Lithium battery factory
  • Juniper goes crashing after Nokia denies take-over

After a long week with no real changes, the German DAX seems to be heading down on the first day of December. After opening at 13044, the German index is down at the 12854 point mark, losing 130 bp. The DAX is not the only index suffering on the first day of December. The euro zone is sliding down overall on the last day of the week, even as euro zone factories posted their busiest month in over 17 years in November. At mid-day only 3 stocks are above the 0% line, the biggest loser being Infineon carrying a loss of 228 bp. Europe's financial stocks wilted after a delayed vote on tax reform in the U.S. deflated a rally in the sector, driving regional benchmarks to start December with a dip. Euro zone stocks fell 0.6 percent while Britain's FTSE, which has suffered from a strong sterling this week, slid 0.1 percent. Financials were the biggest weight after the U.S. Senate delayed a vote on a tax reform bill that investors anticipate will be beneficial for banks. Lloyds, Barclays and BNP Paribas led the index down. Oil and gas stocks stayed buoyant, with Shell, Total and BP leading sector gains as OPEC's extension of supply cuts continued to boost crude prices. Healthcare stocks outperformed thanks to a Morgan Stanley upgrade boosting UCB by 3.3 percent, while Novo Nordisk, flagged as one of the strategists' favourites in the pharma space, gained 2.8 percent. British pharma company Indivior also shot up 11.7 percent after its opioid addiction drug was approved by the U.S. Food and Drug Administration.

In the U.S. the stock market delivered a strong performance yesterday, with traders seeming to be more confident that the Trump Administration’s tax plan will be approved. At the close of trading, the Dow Jones Industrial Average was ahead 332 points; the broader S&P 500 Index was up 21 points; and the NASDAQ was higher by 50 points. Market breadth was supportive, with winners nicely ahead of losers on the NYSE. Most equity sectors participated in today’s advance. The energy and industrial issues showed leadership, while the telecom names and utility stocks underperformed. Technically, the stock market continues to move strongly higher, with the Dow Jones Industrial Average now past the 24,000 mark. While equity valuations are elevated, traders seem content with the nation’s economic progress.

On other big news Germany’s Bayer said that the Committee on Foreign Investment in the United States (CFIUS) had no national security concerns about the drugmaker’s planned takeover of U.S. seeds group Monsanto, giving its go-ahead. Bayer and Monsanto will continue to cooperate with other authorities to complete the transaction in early 2018, Bayer said in a statement on Friday.

Tesla Inc switched on the world’s biggest lithium ion battery on Friday in time to feed Australia’s shaky power grid for the first day of summer, meeting a promise by Elon Musk to build it in 100 days or give it free. “South Australia is now leading the world in dispatchable renewable energy,” state Premier Jay Weatherill said at the official launch at the Hornsdale wind farm, owned by private French firm Neoen.  Tesla won a bid in July to build the 129-megawatt hour battery for South Australia, which expanded in wind power far quicker than the rest of the country, but has suffered a string of blackouts over the past 18 months. In a politically charged debate, opponents of the state’s renewables push have argued that the battery is a “Hollywood solution” in a country that still relies on fossil fuels, mainly coal, for two-thirds of its electricity.  Supporters, however, say it will help stabilize the grid in a state that now gets more than 40 percent of its electricity from wind energy, but needs help when the wind dies down.

Shares of Juniper Networks Inc fell 8 percent in pre-market trade on Thursday after Finland’s Nokia denied reports that it was in talks to buy the U.S. network gear maker.  CNBC on Wednesday reported, citing sources, that Nokia was in talks to buy Juniper at an offer that would value the company at around $16 billion, higher than Juniper’s $11.26 billion market capitalization as of Wednesday’s close. That valuation would imply a price of about $42 per share, a level last seen by Juniper shareholders six years ago, Morningstar analyst Ilya Kundozerov wrote in a client note.  Within hours of the CNBC report, however, Nokia, which does not typically comment on market rumors, said it was not preparing an offer for Juniper. Bernstein analyst Pierre Ferragu said Nokia acquiring Juniper seemed a stretch citing an operational alliance limited to $300 million to $400 million in costs, near impossible product integration in routing and a risk of negative revenue combination.

 

BMW to invest 200 million in battery cell site

Today’s Topics:

  • DAX Overview
  • BASF talks with DEA
  • BMW to invest 200 million
  • German business confidence at all-time high

This morning the DAX opened at the 13'000 mark and rallied up to the 13'150 point mark, noting an increase of 1.04%. This increase is fuelled by BASF and ThyssenKrupp, yielding 2.59% and 1.89% respectively. The BASF stock rose in reaction to talks between the German chemicals group and DEA, the energy group owned by Russian tycoon Mikhail Fridman. Bloomberg, citing people familiar with the matter, said that talks between Wintershall and DEA were at an advanced stage, adding the combined entity could be valued at more than 10 billion euros ($11.9 billion).

BMW will bundle its battery cell expertise in a new competence centre, the German luxury carmaker said on Friday, adding it would invest 200 million euros ($237 million) in the site over the next four years. “By producing battery-cell prototypes, we can analyse and fully understand the cell’s value-creation processes. With this build-to-print expertise, we can enable potential suppliers to produce cells to our specifications,” BMW board member Oliver Zipse said in a statement. “The knowledge we gain is very important to us, regardless of whether we produce the battery cells ourselves, or not.” The centre will open in early 2019, BMW said. ($1 = 0.8435 euros)

German business confidence rose unexpectedly in November to hit an all-time high, a survey showed Friday, adding to signs that Europe’s largest economy was heading for a strong fourth quarter. The Munich-based Ifo economic institute said its business climate index, based on a monthly survey of some 7,000 firms, rose to 117.5 from an upwardly revised reading of 116.8 in October. This was higher than a Reuters consensus forecast for a value of 116.6. “Sentiment among German businesses is very strong,” Ifo chief Clemens Fuest said in a statement. “This was due to far more optimistic business expectations. The German economy is on track for a boom.”

Today's Economic Calendar:

  • PMI Composite Flash

European shares brush off German Government worries

Today’s Overview:

  • European shares brush off German Government worries
  • Volkswagen raises mid-term outlook for group profits
  • Wall Street takes off ahead of Thanksgiving week
  • Marvell M&A
  • EBA moves to Paris

After the failed Jamaica coalition talks, the German DAX and gold tumbled down to 12945 and 1276 respectively. The shock was short-lived and the index moved up to 13030 in the afternoon, finally closing at the 13060 mark. The big movers in the German market were Volkswagen, rallying up 2.78%, and RWE, incurring a loss of 1.58%. Volkswagen raised its mid-term outlook for group profit and sales on Monday, sustaining investor hopes that the carmaker can further its recovery despite shouldering billions in costs for its electric-car offensive. The world's largest automaker by sales announced on Friday more than 34 billion euros ($40.06 billion) of spending on zero-emission cars and digital mobility services by the end of 2022, revising up an investment pledge of more than 20 billion euros made in September. VW said rebounding emerging markets such as Brazil and Russia and demand for new VW-badged sport-utility vehicles (SUVs) may together lead group revenue to exceed the 2016 record of 217 billion euros by more than a quarter by 2020. On Monday evening German chancellor Angela Merkel said she would prefer fresh elections to ruling with a minority government after talks on forming the three-way coalition collapsed. This caused the DAX to open on the lower end and the euro/sterling rate to reach an 8-day low.

Wall Street managed to make some progress yesterday, as they commenced a new week. At the close of trading, the Dow Jones Industrial Average was ahead roughly 72 points; the broader S&P 500 Index was up three points; and the technology heavy NASDAQ was higher by nearly eight points. Market breadth was positive, with winners ahead of losers on the NYSE. From a sector perspective, the industrials and the technology issues pressed ahead, while the energy and utility names retreated. Meanwhile, there was just one notable economic report released this morning. Specifically, the Index of Economic Indicators advanced 1.2% in the month of October, which was quite a bit better than had been widely anticipated. Tomorrow, existing homes sales for the month of October are due to be released. Technically, the stock market has been holding up reasonably well lately. Looking ahead, with the year drawing to a close, it remains to be seen if the bulls can find the strength to produce a holiday rally.

In further news, the chipmaker Marvell Technology Group Ltd said on Monday it would buy smaller rival Cavium Inc for about $6 billion, as it seeks to expand its wireless connectivity business in a rapidly consolidating semiconductor industry. Shares of Marvell were down 0.8 percent to $20.14, while Cavium was up 7 percent at $81.14 in early trading.

The European Union is relocating two of its key agencies from London to Amsterdam and Paris post-Brexit. On Monday, the EU announced that the European Banking Authority (EBA) will be moving to Paris, an early sign of the potential costs for the United Kingdom of leaving the political and economic union.

Today's Earnings Calendar:

  • GameStop Corp.
  • Guess? Inc.
  • HP Inc.

Today's Economic Calendar:

  • Chicago Fed National Activity Index
  • Existing Home Sales