Neural Net: walking on the edge of Chaos
written by Sergey & Tatiana Tarassov
My first close acquaintance with Neural Network (NN) technology came in 1997, when I had to write code for it. I was very impressed by this technology. My university mates anticipated my unavoidable romance with this mathematical theory and gave me this advice: "Remember that a Neural Network can solve only those tasks that can be solved by a human brain. Try to solve the problem manually. If you can do this, you may apply the NN. If you cannot, the NN will not help you." They knew what they were talking about: their own romance with NN had begun 10 years earlier. I repeat these words to myself very often. This technology comes too close to some human illusions.
Books and magazines on this subject published in the mid-1990s gave the reader an impression of anticipation and excitement. It seemed that something big and very useful had entered our lives, so we had no need to worry: all our problems would be solved by these Neural Nets. But the question still persisted: is it really possible to get something truly new under the Moon by playing with the same four well-known elementary arithmetic operations?
Sometimes people come to me and say something like this: "Look, I have found an exciting forecasting method. It works, though it needs a lot of calculations. If only the NN could do that..." Usually I ask in turn: "If you had a thousand people under your command doing the calculations for free, would you really make money with this method? Do you see the whole process of making a forecast, from beginning to end? Do you have a proven record that this method really works (even if it is very slow)?" Very rarely do I get a reasonable answer to these questions. It may be because many people are intrigued by the very words "artificial intelligence". They dream that, in our complicated life, it is possible to put the responsibility for the right decision on the shoulders of some super-smart guy whose name is "Artificial Intelligence". The main question is: who should make this decision - YOU or the guy named "Artificial Intelligence"? If it is you - is that your Cadillac over there? I'll help you to make one more. If it is the "Artificial Intelligence" guy - sorry, I cannot help you.
I agree that NN technology makes it possible to solve tasks that were unsolvable 50 years ago. But, like any non-linear system, a Neural Net can become a Pandora's Box, and only the human mind can tell the difference between a forecasting system and a Pandora's Box. So, what follows is an attempt to share some thoughts on common illusions regarding Neural Nets.
The most common mistake among people who use Neural Networks for forecasting is to feed the network every available factor. The motivation for this mistake is pretty obvious. There are lots of factors that might move the stock market or stop its movement. The very first impulse is to use all these factors for forecasting and let the NN decide which of them are really important and which are not. In my life, I have never met a person who avoided this temptation while working with NN (and I am no exception). The result is the effect of over-training: the NN gives very good results within the training interval and nothing good on the testing interval. To make it clearer, I will use this analogy. Suppose that you have met an extremely well-educated guy who has spent all 40 years of his life at different universities studying philosophy. He spent almost all his time in lecture halls and libraries and never went outside. Therefore, he never saw and never experienced real life. One day it turns out that he has to go out into the real world. And it is quite a shock: so many people around (and not all of them professors or students), a lot of noise, cars honking, plus presidential elections in a week or two... Our poor guy tries to apply all his knowledge to understand what is going on, but this does not help much. The reason is that his enormous knowledge provides him with a lot of different explanations and proposes too many possible solutions, when he needs just one solution and only one explanation applicable to today's situation. He will get this single explanation, but only in a few days, when the situation to be solved has become his past. Thus, his real experience and his knowledge do not match. He can think about eons easily, while life demands his attention to small everyday tasks. How can we help him? Only by letting him get more experience of real life.
Let us look at this in terms of Neural Net technology.
As an example, I downloaded crude oil prices from 2000 to 2004, about 1000 price points in total. I applied a big astronomical model that uses 2000 different astronomical phenomena, so for each price point we have 2 astronomical phenomena. Now let us train this model. We train it on the blue (training) interval and test the model's performance on the red (testing) interval. This is the result of NN training:
The black line is the normalized price; the blue line is the projection curve produced by this NN. The model fits extremely well inside the training interval (the black and blue curves are very close), while the picture changes drastically on the testing (red) interval. Thus, this NN explains the past very well, because it uses so many events (as explanations) and not so many price points (as experiences of real life). So it is not surprising that in this situation the NN can explain everything, but only in the past. In one book I found a vivid definition of this phenomenon: the over-training effect is called there "the result of being raised by Grandma". The blue part of the picture represents the child's life under Grandma's supervision, while the red one stands for life with the parents...
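The over-training effect is easy to reproduce even without a neural net. The sketch below is an illustration under stated assumptions, not the astronomical model discussed above: it replaces the NN with a plain linear model, uses random numbers in place of both the events and the prices, and gives the model as many free weights (20) as there are training points. The helper names `solve`, `mse` and `make_data` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mse(inputs, targets, weights):
    """Mean squared error of the linear model on the given data."""
    errors = [sum(w * v for w, v in zip(weights, row)) - t
              for row, t in zip(inputs, targets)]
    return sum(e * e for e in errors) / len(errors)


random.seed(0)
N_POINTS = 20   # number of "price points" per interval (assumption)
N_INPUTS = 20   # number of "events" fed to the model (assumption)


def make_data(n):
    """Random inputs and random targets: there is nothing real to learn here."""
    inputs = [[random.gauss(0, 1) for _ in range(N_INPUTS)] for _ in range(n)]
    targets = [random.gauss(0, 1) for _ in range(n)]
    return inputs, targets


train_x, train_y = make_data(N_POINTS)
test_x, test_y = make_data(N_POINTS)

weights = solve(train_x, train_y)           # one weight per input
train_mse = mse(train_x, train_y, weights)
test_mse = mse(test_x, test_y, weights)
print(f"training MSE: {train_mse:.2e}")
print(f"testing MSE:  {test_mse:.2e}")
```

Because the inputs are pure noise, the near-zero training error reflects memorization only; the testing error shows what the fit is worth on data the model has not seen.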
How can we avoid this situation? The advice is very simple: for NN inputs, use only those events that you are sure influence the market, and do not use all available events. Even a single factor that is important in general but irrelevant to this market can make your forecast worse. How can you know which factors are important? Usually, correlation or statistical analysis helps. But you should always remember that these methods reveal linear relations between phenomena, while I personally believe that most of the real-life processes we analyze are non-linear. Timing Solution provides different methods for identifying the factors in play.
In the Timing Solution, we use a specialized NN that reduces the over-training effect.
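The correlation screening mentioned above can be sketched in a few lines. This is a minimal illustration, not the method used in Timing Solution: the function names, the sample series and the 0.3 threshold are all assumptions. As noted above, such a filter sees only linear relations, so it can discard a factor whose influence is non-linear.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / sqrt(var_x * var_y)


def screen_inputs(candidates, target, threshold=0.3):
    """Keep the candidate inputs whose |correlation| with the target reaches the threshold."""
    return [name for name, series in candidates.items()
            if abs(pearson(series, target)) >= threshold]


# Hypothetical data, chosen only for illustration.
price_changes = [0.5, -0.2, 0.8, -0.6, 0.3, -0.1]
candidates = {
    "factor_a": [1.0, -0.5, 1.5, -1.0, 0.5, -0.3],  # moves with the price
    "factor_b": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],     # never changes
}
selected = screen_inputs(candidates, price_changes)
print(selected)
```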
The ability to deal with non-linearity is the most impressive feature of Neural Net technology. It allows the NN to solve tasks that were very hard to solve before. Let me try to explain the difference between the linear and non-linear approaches with a poetic analogy. As Martin Heidegger wrote in one of his books: "The language is the house of the being" (forgive my poor translation: it was originally written in German, I read it in Russian, and now I am trying to express it in English). Let's think about it. Language consists of words. But, in comparison to ordinary things, words have a totally different nature. A sentence formed by words is not just the sum of these words. Its content depends strongly on the relations between the words. Thus, we can say that the rules for forming the sentence, its grammar, are not context-free. In the same way, we cannot consider a human being as 70% water, 5% something else, etc. Nor can we say that Brandenburg Concerto No. 1 by Johann Sebastian Bach is only 35% F notes, 15% A notes, and 20% C notes... So we can conclude that the information expressed by the words in a sentence is not just the sum of these words. Is it just a poetic analogy? Not really. The most appealing feature of Neural Net technology is that it gives one possible way of revealing this information, and this was the reason why I was so impressed by NN technology in 1997.
As an example of revealing such information, I can show you this. One of the oldest trading techniques, Japanese candlesticks, has been used since the 1700s to predict rice prices. Each price bar is classified in a special way, for example: White Candlestick, Shaven Bottom, Hanging Man, Hammer. These bars can form different kinds of patterns: "Morning Star", "Evening Star", etc. We can try to apply some kind of statistical or correlation analysis, but I guarantee that a forecasting model based on linear math only will not be efficient. Linear math can see only simple connections: it can hear how each word "sounds" separately (in our example, the price bars in the Japanese candlestick formalism), but it cannot tell how these words sound together, as an ensemble. Mathematical modeling of the crude oil price for 1983-2004 shows that a forecasting model based on this technique gives reasonably good results (for 7-10 trades ahead), while the same model built on a linear basis gives no result at all. In practice, it means that the influence of any price bar strongly depends on the bars around it. The Neural Network reveals this fact. This is the link: http://www.timingsolution.com/TS/Basic/illustration_2.htm
This is the good news about the NN. The bad news is that, like any non-linear system, it can sometimes become totally unpredictable. Usually this happens when we are careless about the inputs for the NN. The approach "I will use everything I know and let the NN decide what is important and what is not" makes the system totally unpredictable. In this case the NN has too many associations to consider and cannot decide which associations are accidental and which factors are really working.
If we do not provide the NN with guidelines, it makes no difference what we use for making a forecast - the entrails of a ram or the flight of birds as in the ancient pagan world, a crystal ball, or a modern computer with a sophisticated program. In all these cases, the entrails, the birds, the ball and the computer are just tools to ignite our intuition. Only a human mind can decide which factors are important and which are not. Math can help, and help a lot, but it cannot do more than the human mind already can, and it cannot make the final decision.
We can dress the Truth in the clothes of modern sophisticated mathematics, but the difference between True and False is set only by the Human Mind.
Timing Solution Technology: Information against Digits
There is some discrepancy between the problem that a mathematician usually solves and the task that a client wants the mathematician to solve. The client is seeking a practical solution to a practical problem, and sometimes this problem simply has no single solution. This discrepancy is very well described in an old story. Once, a famous mathematician decided to show that mathematics is very useful in everyday life. He advertised a lecture on mathematical solutions for cutting and sewing. Many people came to the lecture, most of them tailors. But most of them left after the opening statement, which was: "Suppose that your client is a ball..." - because not many tailors could suppose this, and not all of their clients fitted this definition...
In my personal opinion, if some practical task has to be solved, be ready for the high probability that it will not look like the problems mathematicians deal with at universities. This is especially true of stock market modeling. Here is a list of the usual problems:
not enough data for statistical analysis;
the problem has many solutions;
nobody knows which one is a right one.
Only one thing is clear - the problem must be solved in any case. So, in some sense, in dealing with real, practical tasks, you have to lose your mathematical virginity.
When I began to write an NN capable of making a forecast, I faced some facts that are terrible for any mathematician:
a) We are trying to predict the stock market even though scientists have already expressed their opinion on this issue by proposing the Random Walk Theory. That theory contains no more useful information than a sentence like "everybody will die some day";
b) The user tends to put all available information, valuable or not, into the Neural Net. Thus, over-training is unavoidable.
So, to deal with this situation, I decided to create a kind of universal technology that will grow together with our knowledge of the subject being explored. I understand that it will not be a final answer to all questions. I hope it will give us a process for extending our knowledge and the ability to apply this knowledge profitably. So, here is what I have done:
1) Created a special standard to describe any process that unfolds in time. It is not just a set of digits: the standard describes the main features of the process, such as how it interacts with the world and which optimization methods are preferable. Fuzzy Logic is applied here. This technology is open and allows new events to be added as our knowledge grows.
2) Developed a special kind of NN, named the Object Oriented Neural Net, that deals with those objects. We can no longer work with digits only; now we want to work with information. This approach makes maximal use of the available information (not just as many facts as possible, but the sense, the meaning, and the relations between the parts of the information unit). Of course, this technology has to be developed further.
3) Provided a preprocessing procedure inside the Neural Net that diminishes the over-training effect significantly. We simply define the initial NN topology (i.e., the number of hidden neurons).
An example of using the new technology
As an example, we will show how the Object Oriented Neural Net performs autoregression analysis; this approach improves the results of regular autoregression dramatically.
The regular approach is based on the linear autoregression model. Under this approach, we can state something like this:
(Price change today) = A1 x (Price change yesterday) + A2 x (Price change two days ago) + A3 x (Price change three days ago) + ...
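In code, the linear autoregression formula above is just a weighted sum. The sketch below assumes the coefficients A1, A2, A3 are already known; the function name `ar_forecast` and the numbers are hypothetical and serve only to show the arithmetic.

```python
def ar_forecast(past_changes, coefficients):
    """Linear autoregression: weighted sum of the most recent price changes.

    past_changes is ordered oldest first, so past_changes[-1] is yesterday.
    coefficients[0] is A1 (yesterday), coefficients[1] is A2, and so on.
    """
    if len(past_changes) < len(coefficients):
        raise ValueError("need at least one past change per coefficient")
    # Take the last len(coefficients) changes and put the newest first.
    recent_first = reversed(past_changes[-len(coefficients):])
    return sum(a * change for a, change in zip(coefficients, recent_first))


# Hypothetical coefficients and price changes, chosen only for illustration.
coefficients = [0.5, 0.25, 0.1]   # A1, A2, A3
past_changes = [1.0, 2.0, 3.0]    # three days ago, two days ago, yesterday
forecast = ar_forecast(past_changes, coefficients)
print(forecast)
```

Each past change contributes independently of the others, which is exactly the linearity discussed in this section.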
As Lou Mendelsohn has noted (see http://www.profittaker.com/market_analysis.asp), this method was quite popular in the 1980s. Since then it has given way to other methods. Why did this happen?
The disadvantage of this approach is its linearity. A comparison may illustrate the idea. All market movements can be explained as the result of the activity of two different groups of factors. We can compare them to two armies: the army of good guys (factors that cause upward price movements) and the army of bad guys (factors that cause downward price movements). From the point of view of classical autoregression, the price today depends on the balance of good and bad guys at this moment. It is simple accounting: all we need to do is calculate the balance... But in real life it is not that simple. Straightforward accounting does not work here - otherwise 300 Spartans would not have had a chance to stand for a minute against the hordes of Persian warriors. I hope this analogy with the armies helps to explain why the linear approach to autoregression has lost its popularity.
Is there any way to avoid such a simplification? Sure there is: use a Neural Net and Fuzzy Logic together. Returning to our comparison, this allows us not to divide all participants into just two polar groups of bad and good guys. Instead, we identify several groups among them: guys who push the market up strongly, guys who move it up moderately, and guys who move it up just a bit. The same groups exist among the downward-moving guys. This procedure is called fuzzification. It helps to identify the players. Then the Neural Network begins its job, revealing non-obvious relationships between the players and the outcome (in other words, relations between the factors affecting the market and the price today). Now the fact that there are two armies (bad and good guys) is not so important any more. The most important thing is the strategy used by each of the armies. Each warrior plays his own unique role in this strategy. This is not just accounting, this is information.
As a practical example, see this illustration of an actual fuzzification procedure provided by the Timing Solution program:
This is the distribution of the crude oil price true range (% Close-Open) in 1983-2004. According to this procedure, the price changes are divided into 6 groups (like 6 types of different guys):
0%-0.35% Low Up, 0.35%-1% Medium Up, more than 1% High Up, with similar groups for the downward movement. Once the groups are defined, we can work with them.
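The six groups listed above can be written as a small lookup function. This is a simplified sketch: it uses crisp (non-overlapping) bins with the 0.35% and 1% thresholds from the crude oil example, whereas a full fuzzification would assign degrees of membership. The function name `fuzzy_grade` and the treatment of boundary values are assumptions.

```python
def fuzzy_grade(change_pct):
    """Map a price change in percent to one of six grades.

    Thresholds (0.35% and 1%) come from the crude oil example in the text.
    Assumptions: a change of exactly 0% counts as "Low Up", and a value that
    sits exactly on a threshold falls into the smaller-magnitude grade.
    """
    direction = "Up" if change_pct >= 0 else "Down"
    size = abs(change_pct)
    if size <= 0.35:
        level = "Low"
    elif size <= 1.0:
        level = "Medium"
    else:
        level = "High"
    return f"{level} {direction}"


for change in (0.2, 0.7, 1.5, -0.1, -0.9, -2.4):
    print(f"{change:+.2f}% -> {fuzzy_grade(change)}")
```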
But this approach also has its own problem, and it is related to the fuzzification itself. How many grades should we use to divide our "guys" properly? Obviously, "two" is not enough. In the example above, we have 3 grades for 2 polar groups of factors, which gives us 2*3=6 groups altogether. Is it enough? The less detailed the division, the higher the chance that some valuable information is lost. Thus, the accuracy of this approach is restricted by the number of division criteria (and by the type of membership function as well). For example, if some specific level is crucial for the process (like a 1.35% price change), it may go unnoticed when we have just the 3 fuzzy grades of our example (Low, Medium and High). On the other hand, when we use a more detailed division, we can run into the over-training effect.
To avoid this problem, the Object Oriented Neural Net (OONN) has been developed. It makes it possible to deal with each warrior of each army individually. The OONN runs a new kind of optimization (we call it the sub-optimization procedure). Generally speaking, it analyzes each factor individually and finds its "hot" points. So, continuing our metaphor, we can compare Linear Regression to two armies (good and bad guys), while the Fuzzy Neural Net can be described as "graded warriors + nonlinear strategy", and the Object Oriented Neural Net as "individually armed warriors + nonlinear strategy".
The main idea of the Object Oriented Neural Net is to run this sub-optimization procedure, which works with each event individually. This is not an easy path (for example, it took me about two years to find a way to make the procedure stable), but this way we can deal with the real object, not its shadow.
To demonstrate this technology in action, let us run a back-testing procedure for a model that combines autoregression and the OONN (Object Oriented Neural Network). We created a non-linear model to forecast future movements of the Dow Jones Index (to be exact, an oscillator with a period of 10 bars), taking into account the previous changes of Close and RSI and the price bar proportions 100% x (High-Low)/Close, 100% x (Close-Open)/Close and 100% x (Open-Low)/Close (of course, all these data were normalized before use). We used DJI data from 1970 to 2004, trained the Neural Net on 50 intervals, and made a forecast to test the model (the tested intervals were independent of the data in the learning intervals). This is the result of back testing:
It shows that the correlation between the price and the projection curve was positive 32 times and negative only 18 times. These intervals are independent, so with a probability of 84% this is not an accidental result. In contrast, simple linear regression does not give any results:
Here you can see the results of the back-testing procedure.
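One conventional way to judge the split of 32 positive against 18 negative correlations is a one-sided sign test: if positive and negative outcomes were equally likely, how often would 32 or more of 50 independent intervals come out positive? The sketch below is an assumption about how such a check could be done, not the calculation behind the confidence figure quoted above, and a different test would give a different figure. The function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """Probability of at least `successes` hits in `trials` independent tries."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))


# 32 positive correlations out of 50 independent test intervals.
p_value = binomial_tail(32, 50)
print(f"P(at least 32 positive of 50 by chance) = {p_value:.4f}")
```

The smaller this probability, the harder it is to attribute the split to chance alone.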
In conclusion, I would like to end this article with the same words it began with: the NN can help, but it cannot replace the human mind. The NN is fantastically effective at such problems as operating unmanned spy planes, because there it is based on the knowledge of real people who really could fly such an airplane in real life. The NN shows good results with Japanese candlesticks because this method has been tested and has worked for almost 300 years. But anything that requires discernment and definition cannot be handed over to a mechanical system. Persisting in our attempts to make a mechanical system think in place of the Human Mind might lead to the loss of control over it, and Nature might take revenge on us by transforming the thing designed to help us into a Pandora's Box.
The statement "Garbage in, garbage out" is well known among professionals in NN technology. They often cite the claim that a Neural Net can reveal hidden patterns. However, the word "hidden" should not be used as a synonym for our ignorance or obscurity. A Neural Net really can find hidden patterns, but only if we provide it with an adequate frame (language, context) in which to uncover these hidden relations. Why? Just because "The language is the house of the being".
Sergey & Tatiana Tarassov.
October 2004, Toronto