Creating a trading bot using machine learning in time series analysis

This is not a technical article: it contains no detailed analysis of methods or theory. At some point I simply got carried away with machine learning and, like many newcomers to the topic, decided to build a trading bot. It grew into more than just a learning project, and that is what I want to talk about here.

A Little About Machine Learning


Machine learning (ML) is, in one way or another, an integral part of artificial intelligence (AI), the field of science and technology that allows “intelligent” computer systems to simulate human behavior. The field also includes deep learning, which deals with neural networks and the imitation of human thinking.

As an academic discipline, AI studies how a machine, that is, a computer, can solve problems that used to be within reach of the human mind alone. Such a task might be understanding a text, choosing moves in a game of checkers, or solving a puzzle. ML develops algorithms that help the computer draw conclusions from the information it receives. The fuel for all of this is data.

image
(c) Oracle

In the information age, information and data are the most valuable resource. Everything we do, offline or online, generates new data: text, audio, video, sensor measurements, “smart” gadgets and the data they exchange. Opportunities to collect large new datasets are growing rapidly thanks to the availability of hardware, the development of cloud infrastructure and, as these technologies become widespread, their falling cost. Knowledge, or information, is power: the axiom holds, but there is a “but.”

Possessing huge volumes of information does not by itself benefit its holder. It is through ML algorithms and methods that Big Data begins to make sense and bring value. More specifically, ML systems notice patterns and draw conclusions from many factors in the data without being explicitly programmed to do so.

The main applications of ML today are object recognition, computer vision, data analysis, quality control (monitoring) and predictive analytics.

So, machine learning relies on data processing algorithms and on the data itself. ML methods deserve a separate series of articles. Describing them would be like listing every technology used in web development together with its applications, given that each has its own advantages and disadvantages.

It is worth noting only that data processing algorithms and methods improve over time and the amount of data keeps growing, so the quality of data processing improves as well.

Among data types, I would like to focus on time series, using a personal project as an example: automated cryptocurrency trading.

Time Series Analysis


A time series is a type of data that can be represented as a sequence of measurements ordered at non-random points in time.

There are two main goals of time series analysis: determining the nature of the series, and forecasting (predicting future values of the time series from present and past values). Both involve identifying and describing a model of the series and interpreting the data, which allows us to derive future values of the series.

Applying ML methods to such data lets us find deeper patterns in it. As a result, we get a more “intelligent” forecast of future values.

Unlike the analysis of random data samples, time series analysis assumes that successive values in the data are observed at regular intervals. In other methods, tying observations to time is unimportant and often of no interest.

In other words, what matters for the analysis is that event X occurred at time Y. With an exchange rate, for example, what matters is a sample in which each price is tied to a specific point in time. If you shuffle the order of the dates in a date-price series, it becomes meaningless.
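To make the point about ordering concrete, here is a minimal sketch with pandas. The prices and dates are made up for illustration, and the helper name `to_time_series` is hypothetical; the only claim is that price changes computed on a shuffled date-price series differ from those computed on the chronological one.

```python
import pandas as pd

# Made-up date-price observations, deliberately out of chronological order.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-01-03", "2020-01-01", "2020-01-04", "2020-01-02"]
        ),
        "price": [7300.0, 7200.0, 7350.0, 6950.0],
    }
)


def to_time_series(frame: pd.DataFrame) -> pd.Series:
    """Return prices indexed by timestamp, sorted chronologically."""
    return frame.set_index("timestamp")["price"].sort_index()


series = to_time_series(raw)

# Day-over-day changes are only meaningful on the ordered series.
ordered_changes = series.diff().dropna()
# The same calculation on the shuffled rows gives unrelated numbers.
shuffled_changes = raw["price"].diff().dropna()

print(ordered_changes.tolist())
print(shuffled_changes.tolist())
```

The two printed lists differ even though they are built from the same four prices, which is exactly why the link between value and time cannot be dropped.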

For example, given financial time series, an ML algorithm can forecast a rise or fall in returns. When processing audio files, where the time series is the change in tone over time, you can analyze the meaning of speech. Given meteorological data, complex weather forecasts can be derived using ML and time series.

If ML algorithms analyze, say, pictures of cats, then we do not care about the timing, or even the order in which the pictures arrive.

Using Cryptocurrencies as an Example


In my project, I tried to answer the question of whether it is possible to build a fully automatic trading system based on machine learning methods. To do this, I found and collected historical data on Bitcoin prices and trade volumes, as well as on placed and cancelled orders.

After some time, through trial and error, I came to a certain understanding of how this data should be interpreted, which neural network architecture to use, how to label the data, and so on. In particular, the model is trained on data with a granularity of 10 seconds, and the future price is used as the target value.
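The labeling idea can be sketched roughly as follows. This is not the project's actual pipeline: the column names `price` and `size`, the function name `make_labeled_bars`, and the one-bar forecast horizon are all assumptions made for illustration. The sketch only shows how raw trades might be aggregated into 10-second bars and paired with a future price as the target.

```python
import pandas as pd


def make_labeled_bars(trades: pd.DataFrame, horizon: int = 1) -> pd.DataFrame:
    """Aggregate trades into 10-second bars and attach a future-price target.

    `trades` needs a DatetimeIndex and `price` / `size` columns (assumed names).
    `horizon` is how many 10-second bars ahead the target looks (assumed value).
    """
    bars = pd.DataFrame(
        {
            "close": trades["price"].resample("10s").last(),
            "volume": trades["size"].resample("10s").sum(),
        }
    )
    # Bars without any trades carry the last known price forward.
    bars["close"] = bars["close"].ffill()
    # Target: the closing price `horizon` bars into the future.
    bars["future_close"] = bars["close"].shift(-horizon)
    # The last `horizon` bars have no future price yet, so they stay unlabeled.
    return bars.dropna(subset=["future_close"])


# Made-up trades, for illustration only.
trades = pd.DataFrame(
    {"price": [100.0, 101.0, 102.0, 99.0], "size": [1.0, 2.0, 1.0, 3.0]},
    index=pd.to_datetime(
        [
            "2020-01-01 00:00:01",
            "2020-01-01 00:00:04",
            "2020-01-01 00:00:12",
            "2020-01-01 00:00:35",
        ]
    ),
)
labeled = make_labeled_bars(trades)
print(labeled)
```

Shifting by a negative number of bars is what keeps the target strictly in the future of each row; any feature built from later bars would leak information into training.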

The algorithm currently uses several models trained at different times, because I kept improving the training procedure and adding newly collected data. The implementation is written in Python with the Keras, SciPy and pandas libraries.

The trading script places orders on the BitMEX trading platform around the clock. When a signal to open a position appears and the corresponding trade is executed, the script waits until the price reaches the Take Profit or Stop Loss level, or until the trade expires (Time To Live).
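The waiting logic can be illustrated with a small sketch. It does not use the BitMEX API and is not the project's script: the `Position` class, the `resolve_exit` function and the tick format (seconds since entry, price) are hypothetical names chosen for this example. It only shows how the three exit conditions could be checked against a stream of prices.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Position:
    """An open position with its exit levels (hypothetical structure)."""

    side: str  # "buy" or "sell"
    take_profit: float
    stop_loss: float
    ttl_seconds: float


def resolve_exit(position: Position, ticks: Iterable[Tuple[float, float]]) -> str:
    """Return how the position closes given (seconds_since_entry, price) ticks."""
    for elapsed, price in ticks:
        # The time limit is checked first: an expired trade is closed as is.
        if elapsed > position.ttl_seconds:
            return "time_to_live"
        if position.side == "buy":
            if price >= position.take_profit:
                return "take_profit"
            if price <= position.stop_loss:
                return "stop_loss"
        else:  # for a sell, profit is made when the price falls
            if price <= position.take_profit:
                return "take_profit"
            if price >= position.stop_loss:
                return "stop_loss"
    # No exit condition was met in the ticks seen so far.
    return "open"


long_position = Position(side="buy", take_profit=105.0, stop_loss=98.0, ttl_seconds=60.0)
print(resolve_exit(long_position, [(10.0, 101.0), (20.0, 105.5)]))
```

A live script would read ticks from the exchange instead of a list, but the order of checks and the three possible outcomes would stay the same.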


The main log file that displays real-time data processing

The data processing algorithm relies on technical analysis of the data, trading history, previous orders, and cryptocurrency-related news (processed with natural language processing, NLP).

The main success metric is accuracy: the number of Take Profit orders relative to the total number of orders. A forecast is considered successful when the Take Profit level is reached, while Stop Loss and Time To Live outcomes are counted as unsuccessful.

Accuracy = (Number of orders of type Take profit) / (Total number of orders)

Trading is considered profitable once accuracy reaches 67%.

Month            Accuracy
January 2020     72%
February 2020    70%
March 2020       60%
April 2020       70%
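The metric itself is easy to compute from a list of trade outcomes. The sketch below does that, and also shows one way a threshold near 67% could arise. That second part is my assumption, not something stated above: if every losing trade lost twice what a winning trade gained, and fees were ignored, the break-even accuracy would be 2/3. The function names and the outcome labels are hypothetical.

```python
from collections import Counter
from typing import Iterable


def accuracy(outcomes: Iterable[str]) -> float:
    """Share of trades closed by Take Profit among all closed trades."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no closed trades to evaluate")
    return counts["take_profit"] / total


def break_even_accuracy(reward: float, risk: float) -> float:
    """Accuracy p at which p * reward equals (1 - p) * risk.

    Assumes every losing trade loses exactly `risk`, which is a simplification:
    a trade closed by Time To Live may lose less, or even gain a little.
    """
    return risk / (reward + risk)


# Made-up outcomes: 7 Take Profit, 2 Stop Loss, 1 Time To Live.
outcomes = ["take_profit"] * 7 + ["stop_loss"] * 2 + ["time_to_live"]
print(accuracy(outcomes))
print(break_even_accuracy(reward=1.0, risk=2.0))
```

Under that assumed 1:2 reward-to-risk ratio, an accuracy above roughly 0.667 means the wins outweigh the losses.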

Below is a visualization of trades made by this software.


Green triangles are successful trades (Take Profit reached), red triangles are unsuccessful ones (Stop Loss, Time To Live). Upward-pointing triangles are buys, downward-pointing triangles are sells.

Conclusion


Automated trading is just one of the most obvious applications of time series analysis. In business, forecasting various indicators from collected data can be critically important. Important business decisions can already be made on the basis of such forecasts, and the share of decisions made automatically will only grow.

Performing such calculations is fairly resource-intensive. Fortunately, the computing power of computer systems is constantly increasing. Moreover, the branch of computer science that studies AI aims to create algorithms that make the most effective use of available computing resources to identify patterns in the accumulated data.

Building forecasts is impossible without time series. It is this kind of data that underlies the forecasts that help executives make decisions vital to their business. Undoubtedly, time series analysis and the processing of time series by ML algorithms will be an integral part of the business processes of the future.
