Do neural networks dream of electric money?

TL;DR: No.



The Web is overflowing with materials, manuals, ready-made solutions, kits and other stuff dedicated to forecasting the prices of cryptocurrencies and traditional exchange assets, all smelling of quick and easy income with minimal effort. And although they are written by different people, with different approaches, on different platforms and under different paradigms, they all share one invariable attribute: they do not work.

Why? Let's figure it out.

Introduction


Let me introduce myself: my name is Denis, and in my free time I do research in the field of artificial intelligence and, in particular, artificial neural networks.


In this article I will try to describe the problems that novice researchers of artificial neural networks create for themselves in the pursuit of financial independence, spending precious time with near-zero efficiency.

I hope that, within the framework of this article, I can maintain a reasonable balance between the complexity of the material and ease of reading, so that the text stays moderately simple, understandable and interesting both to people unrelated to this field and to those who have long been researching problems in this industry. I'll say right away that there will be no formulas here, and specific terminology is kept to a minimum.

I do not work at Google. I do not have twenty degrees. I did not intern at NASA. I did not study at Stanford, which I bitterly regret. Still, I hope I understand what I'm talking about when it comes to forecasting systems, and, at the same time, I'm fairly closely connected with the cryptocurrency world in general and the Cardano project in particular.

Of course, as a crypto enthusiast working with neural networks, I simply could not help wading into the foggy field of applying AI to cryptocurrencies.


The essence of the problem


As mentioned earlier, there are so many seemingly thorough, seemingly deep materials on this subject, complete with examples, that your eyes glaze over. And the authors are so sure that their experiment, unlike the previous few hundred, is a success, that one wonders why the next such article doesn't end with photos of a Lambo on a private island, and why the list of authors of Kaggle kernels on Bitcoin price forecasting doesn't duplicate the Forbes list.

Predictably, there are articles on Habr devoted to these questions too. And, interestingly, regardless of the place and language of publication, all these articles end with roughly the same text: "Well, the result is quite good, everything almost works, you just need to tweak some hyperparameters and everything will be fine."

And, of course, the charts on which the neural network traces the price almost perfectly, such as these:

[graph examples 1-4: the network's "predictions" overlaid on the actual price]

And, so as not to be unsubstantiated, here are examples of such articles: one, two, three.

How it all started


The idea of predicting new prices from old ones is far from new. In fact, it applies not only to cryptocurrencies; they just happen to be closer to me personally. The homeland of what is called "technical analysis" is, after all, the traditional exchange. The one where, according to the movies, everyone wears expensive suits while screaming like girls at a concert of their favorite band.

Trying to see the future through the past, people have invented a huge number of all kinds of tricky oscillators, indicators and signal generators based on mathematical statistics, probability theory and, at times, outright pareidolia.

Perhaps the most popular pastime is pattern hunting. Fifteen minutes of reading the Internet and you're practically on Wall Street! It's so simple: just find "Bart Simpson's head", a "butterfly", a "flag" (not to be confused with the wedge!11), an "azure turret falling in a vacuum", draw many, many lines and, quite openly, interpret it all in your favor!


Almost all of these solutions share one small but very dense and severe drawback: they capture trends perfectly... after the fact. And when something is declared not merely descriptive but predictive, it is interpreted so loosely that ten people looking at the same chart with the same indicator will give ten independent forecasts. And, tellingly, at least one of them will most likely be right!

But that, too, will only be established after the fact. And the rest will simply say, "ah, well, we were inattentive and read the signals wrong."

Don't misunderstand me. It is quite possible that a real Wall Street trader, with twenty years of screaming and two hundred near-suicides behind him, really can stack indicators and oscillators on top of each other and, like the operator from The Matrix, read useful data out of them, flavored with a high enough mathematical expectation of a successful trade. I even admit that you specifically, dear reader, can do it too. Without a drop of sarcasm, I admit it. After all, these indicators keep being invented and improved for some reason...

Modern problems require modern solutions!


By 2015, everyone had heard of neural networks. Rosenblatt could never have imagined just how much they would be heard of. Thanks to responsible, professional, media-savvy people, mankind learned that neural networks are practically an electronic version of the human brain, able to solve any task faster and better, with unlimited potential, and that, in general, we are about to leap through the singularity straight into a bright future. Or a dark one. Depends on your luck.

But there was one "but". For the time being, neural networks lived only in staid mathematical packages, in a very, very low-level form, serving mathematicians and scientists with plots in MATLAB and the like.

But popularization did its job and drew the attention of developers of varying degrees of independence to the field. These developers, being, unlike proper mathematicians, people endowed with noble laziness, began looking for ways to throw a few levels of abstraction on top of all this, making life easier for themselves and everyone else, and showed the world very convenient, high-quality, high-level tools like Keras and FANN. In this zeal they succeeded so well that they brought work with neural networks down to the level of "paste it in and it works", opening the way into the world of miracles and magic to all comers.

Miracles and magic, mind you. Not mathematics and facts.


The birth of a legend


Neural networks became available, approachable and easy to use for everyone. Seriously, there is a FANN binding even for PHP. Moreover, it is on the list of standard extensions.

And Keras? In ten lines you can assemble a recurrent-convolutional network without understanding how convolutions work or how an LSTM differs from a GRU! Artificial intelligence for everyone and anyone! And let no one walk away offended!
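Just to show how low the bar is, here is a minimal sketch of that "ten-line" model, roughly as the tutorials present it. It is a hedged illustration, not a recommendation: the layer sizes and the input shape (60 time steps of 4 OHLC values) are arbitrary placeholders of mine.

    # A minimal sketch of the tutorial-style conv-recurrent model.
    # All sizes and shapes below are arbitrary placeholders.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

    model = Sequential([
        Conv1D(32, kernel_size=3, activation="relu", input_shape=(60, 4)),
        MaxPooling1D(pool_size=2),
        LSTM(32),    # the recurrent part; a GRU would drop in just as easily
        Dense(1),    # one number out: the "predicted" price
    ])
    model.compile(optimizer="adam", loss="mse")

Ten lines, zero understanding required. Which is exactly the point.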

I think terminology played the cruelest joke of all here. What are neural network outputs called? Yep. Predictions. A neural network predicts one set of data from another. It sounds exactly like what you need.

Manuals for high-level libraries shield the user from complex terms: matrices, vectors, transformations, differential calculus, the mathematical meaning of all those gradients, regressions and regularization losses.

And, most importantly, they shield the romantic image of an "electronic model of the human brain capable of anything" from the harsh reality, in which a neural network is just an approximator that is, roughly speaking, nothing more than one evolutionary step up from an ordinary linear classifier.

But none of that matters when you assemble your first CIFAR-10 solver from the listings in the documentation, without any effort, without even really understanding what is happening. There is only one thought in your head:


What can I say, what can I say, that's just how people are made...


Here it is, a technological miracle! You just give it some data at the input and other data at the output, and it finds the connection itself and learns to predict outputs from inputs. How many problems could be solved! How many tasks could be flattened!

So much to predict! I wonder, do other people even know about this? With this toolkit, my possibilities are endless! UNLIMITED!


But what if you feed the neural network candles from a crypto exchange / stock exchange / forex, giving it the candle of the next time period as the output? It will learn to predict new values from previous ones! After all, that's what it was made for! A neural network can predict anything, as long as there is data, and historical quote data is a dime a dozen! Oh, inspiration, only a moment, but so beautiful!
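For the record, the whole naive setup really does fit in a few lines. Below is a sketch with synthetic data standing in for exchange history; the window length and the choice of the next close as the target are my own arbitrary assumptions:

    import numpy as np

    # Synthetic stand-in for (T, 4) open/high/low/close candles.
    rng = np.random.default_rng(0)
    candles = np.cumsum(rng.normal(size=(1000, 4)), axis=0)

    WINDOW = 60
    X = np.stack([candles[i:i + WINDOW] for i in range(len(candles) - WINDOW)])
    y = candles[WINDOW:, 3]  # target: the close of the *next* period

    # X and y now plug straight into model.fit(X, y).
    # That, allegedly, is the whole "prediction system".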

Why not?


Because in the real world, as opposed to the world created by the media, it doesn't work like that. Neural networks are not prediction machines. Neural networks are approximators. Very good approximators. It is believed that neural networks can approximate almost anything. With only one condition: that this "anything" lends itself to approximation.

And here the novice researcher gets caught on the hook of cognitive bias. The first and main mistake: historical quote data looks like more than mere statistics. After the fact, you can draw so many triangles and arrows on it that only a blind person would fail to see that it all follows some logic, a logic that simply could not be computed in time. But which the Machine just might know.

Looking at statistics, a person sees a function. The trap slams shut.
What is the second mistake / cognitive bias? Here's the thing.

And it works with the weather!


This is a very frequent argument I hear in crypto communities, in discussions about predicting things from historical data using statistical analysis methods. It works with the weather. The essence of the distortion: "if A works for B, and B seems to me to be the same as C, then A should work for C as well." A kind of pseudo-transitivity, rooted in an insufficient understanding of the processes that make B and C different.

With the same success one could assume, for example, that the pedals in an aircraft cockpit are the brake and the gas, as in a car with an automatic transmission, and not the rudder controls they actually are. Intuitive perception of things, unfortunately, is not always correct, because it does not always rest on a sufficiently complete set of data about the situation / system / object. Hi, Bayes! How are you?

Let's dig a little deeper into the theory.

Chaos and the Law


It so happens that all processes and events in our reality can be sorted into two groups: stochastic and deterministic. Since I am trying hard to avoid dreary terminology, let's replace these with simpler words: unpredictable and predictable.


As Obi-Wan rightly tells us, it's not that simple. The thing is, in the real world, as opposed to the theoretical one, everything is a little more complicated: completely predictable and completely unpredictable processes simply do not exist. At most there are quasi-predictable and quasi-unpredictable ones. That is, almost unpredictable and almost predictable. Almost, almost, but not quite.

For example, snow falls quasi-predictably from top to bottom. In almost 100% of observed cases. But not outside my kitchen window! There, the snow flies from bottom to top, thanks to the peculiarities of the air flow and the shape of the building. But not always! Also in almost 100% of cases, but not always: sometimes outside my kitchen window it falls downward too. Such a simple thing, and yet, for the same observer in two different settings, it behaves in completely opposite ways, and both behaviors are normal and quasi-predictable with almost 100% probability, even though they flatly contradict each other. Not bad, eh? A quasi-predictable event turned out to be... quasi-unpredictable? It gets better.

At this point our friend Bayes starts to chuckle. What about unpredictable events? I'll drop the "quasi-" prefix from here on, okay? Everyone understands by now that I imply it. So. Take something completely unpredictable. Brownian motion? A great example of a completely unpredictable system. Is it, though? Let's ask the quantum physicists:


The fact is that even such a complex system as Brownian motion at real-world scale can, in theory, be modeled, with the state of the system predicted at any point in the future or the past. In theory. We shall tactfully keep silent about how much computation, capacity, time and how many sacrifices to the Dark Gods that would require.

Conversely, a system that is predictable in the general case, and becomes unpredictable when you zoom down to particular cases, turns out to be quite predictable again if you widen the scope of observation of that particular case to include external factors, obtaining a more complete description of the system in that very case.

Truly: knowing the specifics of the air flows in a particular place, you can easily predict the direction the snow will fly. Knowing the specifics of the "relief" of that place, you can predict the direction of the air flows. Knowing the specifics of the terrain, you can predict the specifics of the relief. And so on, and so forth. Note that we have started zooming in again, but this time on a specific event, separating it from the "general" description of that event's behavior. Someone stop Bayes, he's having a fit!

So what do we get? Any system is simultaneously predictable and unpredictable to some degree; the difference lies only in the scale of observation and the completeness of the initial data describing it.

What do weather forecasting and exchange trading have to do with it?


As we found out earlier, the line between a predictable and an unpredictable system is extremely thin. But it is still enough to draw a boundary between weather forecasting and trading.

As we already know, even the most unpredictable system in fact consists of entirely predictable fragments. To model it, it is enough to go down to the scale of those fragments, widen the scope of observation, understand the patterns and approximate them, with a neural network for example. Or derive a quite specific formula that lets you compute the desired parameters.

And here lies the main difference between the weather forecast and the price forecast: the scale of the largest predictable, simulable component. For weather forecasting, the scale of these components is such that, well... they can be seen from Earth's orbit with the naked eye. And what cannot be seen, temperature and humidity for example, can, thanks to weather stations, be measured in real time across the whole planet. For trading, that scale is... more on that later.

A cyclone will not say "I'm tired, I'm leaving" and vanish out of the blue at an unpredictable moment. The amount of heat each hemisphere of the planet receives from the Sun varies according to a known pattern. The movement of air masses on a planetary scale does not require atom-level simulation and can be modeled quite well at the macro level. The system called "weather", a random event at the scale of a specific point on Earth, is quite predictable at more global scales. And still, the accuracy of those predictions leaves much to be desired beyond a couple of days. The system, though predictable, is too complex to be modeled with reasonable accuracy at an arbitrary point in time.

And here we come to another important property of predictive models.

Self-sufficiency or autonomy of predictions


This property is, in essence, quite simple: a self-sufficient forecasting system, an ideal forecasting system, can do without external data, aside from the initial state.

It is perfectly accurate. To predict the properties of the system in state N, it only needs the data it computed for state N-1. And knowing state N, you can get N+1, N+2, ..., N+m.

Such systems include, for example, any mathematical progression. Knowing the state at the reference point and the index of any point in the series, you can easily calculate the state at that point. Cool!
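In code, such a system is almost insultingly simple. A toy sketch (the geometric progression is just my choice of example):

    # A self-sufficient "forecaster" in its purest form: given only the
    # initial state, it reconstructs the state at any point exactly.
    def state_at(n: int, initial: float = 1.0, ratio: float = 2.0) -> float:
        return initial * ratio ** n

    print(state_at(10))  # 1024.0: exact for any n, no external data needed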


This is also the answer to why the accuracy of the weather forecast falls off dramatically over long horizons. Looking further into the future, we build the forecast not on the real state of the system but on the predicted one, and not a 100%-accurate prediction at that, unfortunately. The result is the effect of accumulating forecast error. And that is despite our knowing almost all the significant "variables", with a description of the system that could be called nearly "complete".
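This compounding is easy to demonstrate numerically. A toy sketch, where the growth rate and the roughly 1% per-step model error are arbitrary assumptions of mine:

    import numpy as np

    rng = np.random.default_rng(1)
    true_state, model_state = 100.0, 100.0
    for day in range(1, 15):
        true_state *= 1.01                               # real system: +1% per step
        model_state *= 1.01 * (1 + rng.normal(0, 0.01))  # model: +1%, give or take ~1%
        drift = abs(model_state - true_state) / true_state * 100
        print(f"day {day}: {drift:.2f}% off")

Each individual step is accurate to about 1%, but because every new prediction is built on top of the previous prediction, the deviation drifts and accumulates instead of averaging out.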

What about quotes?


With quotes, things are much worse. In weather forecasting, almost all the measured and predicted data is both cause and effect of events: the effect of the previous step's events, the cause of the next step's. And the significant data and events that are not both at once are most likely pure causes carrying a powerful payload, such as the amount of heat received from the Sun at a point in time. And that one is invariable. This is what raises the self-sufficiency of such forecasts: effect flows into the cause of the next step's events, a process that can be described with differential equations.

Quote statistics, meanwhile, are mostly either pure effects, or 50/50. Rising quotes may trigger further growth and become a cause. Or they may not. Or they may provoke profit-taking and, as a result, a fall in prices. Historical exchange data looks solid: volumes, prices, order books, so many numbers! The vast majority of which are good for nothing, being only the result, the echo, of events and causes lying far outside the plane of these statistics. At a completely different scale. In a completely different scope.

When modeling future quotes, we rely only on the consequences of events that are far more complex than a percentage deviation in purchase volume. Price does not shape itself. It cannot be derived from itself alone. If the market is a metaphorical lake, the price chart is just ripples on the water. Maybe the wind blew; maybe someone threw a stone in; maybe a fish splashed; maybe Godzilla is jumping on a trampoline 200 kilometers away. We see only the ripples. And from those ripples we try to predict the strength of the wind in four days, the number of stones that will be thrown into the water in a month, the mood of the fish the day after tomorrow, or perhaps the direction Godzilla will head when he tires of jumping. What if he comes closer and unfolds the trampoline again? The ripples will get stronger! Catch the trend, hop hop hop!



This is a very important point: you cannot model a system well enough without having a sufficiently complete description of it.

Unfortunately, in the case of the market, the scale of the largest modelable component of the system comes down to the individual human. Not even to the human, but to their psychophysical state, which determines their reaction to market behavior and which, through that very reaction, will itself influence the market. Cause flowing into effect in the flesh! Only now there are thousands, if not millions, of unique, individual people to model. With their personal problems, feelings, hormones, interactions and daily routines.

And it is not only about the traders in the market at a global scale. It is also about the people behind specific projects. About the problems and successes of those projects in the future. About important events in that same future, events that are sometimes utterly unpredictable. It turns out that in order to predict the future, we need to know the future.


In sum, we need a scope of observed conditions that is completely inaccessible to us. A simulation scale that is, for us, completely unattainable.

Well, unattainable in practice. In theory it is, of course, achievable. Brownian motion is, in theory, also a perfectly simulable and predictable system, remember? Now recall the price of practically implementing such a simulation. That price is prohibitively higher than the cost of feeding exchange candles to a neural network. At least at the time of this writing.

But what about the graphs?


Indeed. At the very beginning of this article I showed charts with extremely high forecasting accuracy, bordering in places on 100%.

Let's look at them again:

[graph examples 1-4 once more]
What do you see? Take a closer look. A great match, isn't it? Perfect, just perfect. Except that on the first and second graphs the neural network lags the quotes by exactly one step!

Remember how I mentioned high-level libraries for working with neural networks, and the thought then got no development in the text? It gets it now. Universal availability of anything inevitably lowers the bar of the average user's training. The same has happened with neural networks. Kaggle kernels hold the record here. Any topic that is not too narrow is simply buried under tons of solutions whose authors, in the vast majority, have no idea what they are doing at all. And from below, each solution is propped up by columns of laudatory comments from people who understand the matter even less. "Great job, just what you need!", "I've been looking for a kernel for my task for so long, and here it is! By the way, how do I use it?" And so on.

Finding something genuinely interesting and well-made among all this is very, very difficult.
<rabid snobbery>
As a result, we get a phenomenon: people who fluently operate a rather complicated mathematical apparatus, yet are unable to read graphs.
</rabid snobbery>

After all, time on the X axis runs to the right, and a prediction, ideally, should arrive before the event.

You just haven't tuned the hyperparameters yet


We are all happy when our neural network shows signs of convergence. But there are nuances. Programming as such has a rule: "it runs" does not mean "it works". When we are just starting to learn programming, we are immensely pleased by the mere fact that the compiler / interpreter managed to swallow what we fed it and threw no errors. At that stage of development we believe that the only errors in a program are syntactic ones.

In designing neural networks everything is the same, only instead of compilation there is convergence. It converged: that does not mean it learned exactly what we needed. So what did it learn?

An inexperienced researcher, looking at such beautiful graphs, will likely be delighted. A more or less experienced one will be on guard, because there are not that many options:

  1. The network is overfitted (in the sense of "over-trained", not "trained again")
  2. The network is exploiting a flaw in the training methodology
  3. The network has approximated the Exchange Grail and can predict the state of the market at any moment in time, "unrolling" an endless chart from a single candle

Which option do you think is closer to reality? Unfortunately, not the third. Yes, the network really did learn. It really does deliver amazingly accurate results. But why?

Although artificial neural networks are not an "electronic model of the human brain", they do exhibit certain properties of a "mind". Chiefly "laziness" and "cunning". Both at once. And these are not consequences of self-awareness emerging in a couple of hundred "neurons". They are consequences of the fact that behind the populist term "training" hides the term "optimization".

A neural network is not a student who studies and tries to understand what we explain, at least not at the time of this writing. A neural network is a set of weights whose values must be adjusted, or optimized, so as to minimize the error of the network's output relative to the reference output.

We give the neural network a task and then ask it to "pass the exam". Based on the "exam" results we judge how successful it is, reasonably assuming that in preparing for the "exam" our network acquired sufficient knowledge, skills and experience.

See the catch? No? But it is right there on the surface! While your goal is for the network to learn skills that are useful to you, its goal is to pass the exam.


At any cost. By any means. Perhaps it has more in common with certain students than I claimed two paragraphs ago...

So how do you pass the notorious exam?

Memorize


The first option on the list of possible reasons for such incredible accuracy. Almost every novice researcher of artificial neural networks "knows" for sure that the more neurons a network has, the better. And better still when it has many, many layers.

What he does not take into account is that the number of neurons and layers increases not only the network's potential for "abstract thinking" but also the size of its memory. This is especially true of recurrent networks, whose memory capacity is truly monstrous.

As a result, the optimization process discovers that the most optimal way to pass the exam is... plain cramming, a.k.a. "overfitting". The network simply learns all the "correct answers" by heart, with zero understanding of the principles by which they are formed. Consequently, when tested on a data sample it has never seen before, the network starts spouting nonsense.

For this reason, training deep / wide networks requires much more data, requires regularization, requires watching the minimum error threshold, which should be small, but not too small. Better yet, find the right balance between network size and solution quality.
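In Keras terms, the standard anti-cramming toolkit looks something like the sketch below: modest capacity, dropout, early stopping against a held-out set. The sizes and thresholds are placeholders, not a recipe, and X and y are assumed to be the training arrays from the earlier windowing sketch:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense, Dropout
    from tensorflow.keras.callbacks import EarlyStopping

    model = Sequential([
        LSTM(16, input_shape=(60, 4)),  # modest capacity: less room to memorize
        Dropout(0.2),                   # regularization
        Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
    # model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[stop])

None of this fixes the underlying problem, of course. It only rules out reason number one.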

Fine. Noted. We throw out the extra layers. We simplify the architecture. We implement all sorts of clever tricks. Will it work now? Not necessarily. Because here comes number two on the list of easy ways to pass the exam:

Outsmart the teacher


Since a neural network is graded not on the process but on the result, the process by which it achieves that result may differ quite a bit from what the developer intended. This is one of the most vile aspects of working with these beautiful animals: when the network has learned, but not the right thing.

When you see graphs of price predictions that perfectly repeat the real prices, ask yourself: what did you actually teach the neural network? To predict prices with super accuracy? Or maybe just to repeat them like a parrot?

Rest assured: a network that shows nearly 100% accuracy on the training set and the same on the test set is simply repeating everything it sees. Networks whose prediction graph is shifted one step to the right in time (graph examples 1 and 2) are simply repeating the price value from the previous step, the one fed to them at the new step. The graphs, of course, look very encouraging and match almost perfectly, but they have no predictive power. You can announce yesterday's price today all by yourself; no need to study at Hogwarts or polish a Palantir, right?

That is what happens when you feed the network the previous step's values and compare against the current step's value. Sometimes people instead feed the current step's value and compare it with the next step. In that case we get beautiful graphs that match the originals almost perfectly (graph examples 3 and 4).

Sometimes you see graphs that do not match perfectly: softer, as if smoothed, interpolated. That is usually the telltale sign of a recurrent network trying to tie the new result to the previous one (graph example 3).

All these results have one thing in common: the neural network learned to pass the exam with an A+. But it did not learn to solve the assigned task the way it was required to, and it brings the researcher no practical benefit. Just like a student, but with a cheat sheet, right?

Why does the network repeat previous values instead of trying to generate new ones? Simply because, during training, it arrives at the reasonable conclusion that, usually, the closest point to the next point on the chart is the previous one. Yes, the error fluctuates in this case, but over a large sample it is consistently smaller than the error of genuinely trying to predict the next state of a quasi-random process.
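There is a brutally simple sanity check for any "amazing" price model: compare it against the dumbest baseline there is, "tomorrow's price equals today's price". A sketch on a synthetic random walk standing in for real quotes:

    import numpy as np

    rng = np.random.default_rng(2)
    price = 100 + np.cumsum(rng.normal(0, 1, 1000))  # synthetic random walk

    persistence = price[:-1]  # the "prediction": just repeat the last value
    actual = price[1:]
    print(f"persistence MAE: {np.mean(np.abs(actual - persistence)):.3f}")

Plotted, the persistence "prediction" is the actual chart shifted one step to the right, the very picture from graph examples 1 and 2. A network that cannot beat this number has learned nothing.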

Neural networks are excellent at generalizing. And from the network's standpoint, a generalization of this kind is an excellent solution to the problem.

Alas, no matter how you tweak the hyperparameters, the future will not open up to it. The chart will not shift that one step back in time. The Grail is so close, yet so far.


Stop. No, no, no. But what about algorithmic trading? It exists!


Exactly. Of course it exists. But the key point is that algorithmic trading is not the same as algorithmic divination. Algorithmic trading rests on the trading system analyzing the market at the current moment, making decisions to open or close a position based on a large number of objective parameters and indirect signals.

Yes, technically this is also an attempt to predict market behavior, but unlike predictions days and months ahead, a trading system tries to operate at the smallest feasible time intervals.

Remember the weather forecast? Remember how its accuracy drops dramatically over long horizons? It works both ways: the shorter the horizon, the higher the accuracy. Looking out the window, even without being a meteorologist, you can predict what the air temperature will be one second from now, right?

But how does this work? Doesn't it contradict everything said above? What about the ripples on the water, what about the lack of data? What about Godzilla, after all?!

But no, there is no contradiction. As long as a trading bot works at very small intervals, truly small, from a minute down to fractions of a second depending on the type, it does not need to know the future and does not need a complete picture of the market. It is enough for it to understand how the system immediately around it works: in which circumstances it is better to open a position, in which to close one. A trading bot operates at such a small scale that its field of view covers enough factors to make a successful decision over an acceptably short horizon. And for that it has no need to know the global state of the system.
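To make the difference in framing concrete, here is a deliberately toy decision rule. It is not a strategy, and the thresholds are meaningless placeholders; the point is only that the bot reacts to the local state it can actually observe and never asks what the price will be tomorrow:

    def on_tick(bid: float, ask: float, position: int) -> str:
        # React only to what is observable right now.
        spread = ask - bid
        if position == 0 and spread > 0.5:   # local conditions look favorable
            return "open"
        if position != 0 and spread < 0.1:   # local conditions have degraded
            return "close"
        return "hold"

    print(on_tick(bid=99.4, ask=100.1, position=0))  # -> "open"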

Conclusion


The article came out long. Longer than I expected. I hope it will be useful to someone and will save time for those who decided to try their luck in search of the Holy Grail of Trading.

Let's highlight the main points:

  1. Godzilla has a trampoline
  2. You need to understand how the tools you solve a problem with actually work
  3. You need to understand the limits of applicability and soberly assess whether the problem is solvable at all
  4. It is important to interpret the toolkit's results correctly
  5. Neural networks are function approximators, not predictors of the future
  6. f(x) = x and f(x_n) = x_(n-1) are also functions
  7. To model the state of a system, you need a complete, or nearly complete, description of that system
  8. Statistics are only a partial, selective record of a system's consequences
  9. A good forecasting system should be reasonably self-sufficient
  10. Neural networks cannot be taken at their word: they are insidious, cunning and lazy
  11. Want AI to help you trade? Then teach it to trade

Thanks to everyone who read to the end!

P.S. No, this is not an article about "fundamental vs. technical analysis". This is an article about "there are no miracles".
