
@1. Introduction

Forecasting the behaviour of economic systems, such as national economies and stock or trade markets, represented by indexes, differs from predicting the behaviour of technical systems in that the underlying laws are usually not well understood. This holds particularly true for short-term predictions, where fundamental relations gain little influence. If the underlying laws are unknown, one can try to find regularities of the system empirically by observing its behaviour in the past.

Most classical methods of forecasting, e.g. linear regression or autoregressive integrated moving average (ARIMA) models, use linear models. But economic relations are usually highly nonlinear and recurrent. Therefore, forecast results for economic indexes are usually biased, less accurate, and afflicted with a greater variance than those for technical systems.

On the other hand, even rather imprecise forecasts of economic data may be valuable. Predicting the trend (up/down) of the exchange rate between the $US and the DM for the next day with a correctness of 61% [Zim 91] may enable banks to achieve high profits. Moreover, long-term predictions can serve as a valuable basis for decisions concerning investments or even fiscal policy.

Neural networks can overcome or at least alleviate some of the problems inherent in economic predictions since they can build nonlinear models from observed behaviour of systems.

In paragraph 2 the properties of neural networks are described and linked to the task of forecasting economic time-series in paragraph 3. Paragraph 4 is devoted to backpropagation networks. In paragraph 5 forecasting problems specific to economic questions are discussed. Solutions to these problems are proposed in the main paragraph 6, where several methods of tackling the problem of generalization are discussed in detail. These methods are applied to the task of predicting the DAX one week in advance; the results are presented in paragraph 7 and compared with results obtained from linear regression. The concluding paragraph 8 summarizes the advantages of and problems with using backpropagation networks for economic time-series prediction.

@2. General Properties of Neural Networks

Most statistical methods of time-series forecasting depend on assumptions about the underlying laws of a given phenomenon. Parametric models allow only a few parameters to be adapted. Often these models use linear relations between the observable (or transformed) input data and the output data, e.g. the simplest autoregressive model

y(t + 1) = B + a1 · y(t)

with real parameters B and a1.
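As a minimal illustration, such an AR(1) model can be fitted by ordinary least squares on consecutive pairs (y(t), y(t + 1)). The series below is synthetic and the function name is only a sketch, not part of the methods discussed in the text.

```python
def fit_ar1(series):
    """Estimate the real parameters B and a1 of y(t+1) = B + a1*y(t)
    by ordinary least squares on the pairs (y(t), y(t+1))."""
    x = series[:-1]  # y(t)
    y = series[1:]   # y(t + 1)
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    a1 = cov / var
    B = my - a1 * mx
    return B, a1

# Synthetic series generated from y(t+1) = 1.0 + 0.5 * y(t):
series = [5.0]
for _ in range(20):
    series.append(1.0 + 0.5 * series[-1])

B, a1 = fit_ar1(series)
```

On noiseless data generated by the model itself, the least-squares fit recovers B = 1.0 and a1 = 0.5 up to rounding error.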

Neural networks have a great number of adaptable parameters, called weights, and can be considered nonparametric models. They provide more degrees of freedom, and some types can approximate any continuous function describing the input/output behaviour of a system. In particular, they use nonlinearities to model the input/output relations. The huge number of parameters is adjusted through a training process.

The backpropagation network described in the next paragraph uses supervised learning to adapt its weights. This means that past input/output data pairs are presented to the network which thereupon adapts its weights until its output calculated from the presented input is almost identical to the target output. Thus the model selection is mainly done by the network itself and the user need not know the laws which govern the system or the type of function describing its input/output behaviour^{2}.

@3. Pattern Association in Economic Time-Series Forecasting

Predicting the future on the basis of observed regularities means finding patterns in the time-series. We can distinguish time patterns from space patterns. Time patterns are regularities in a single time-series, e.g. V-, W-, M-, and head-and-shoulders formations in the technical analysis (chart analysis) of stock prices. Time patterns of a time-series y can be modelled by a function

y(t + 1) = f(y(t − n), ..., y(t))

Space patterns relate different time-series. They can be modelled by a function

y(t + 1) = f(x1(t), ..., xk(t))

e.g.

DAX = f(capacity utilization, outstanding orders, interest rate, put-call-volume, ...).

Of course, the predicted value may also be a vector, e.g.

y = (DAX(t + 1), DAX(t + 2), up/down(DAX(t)), ...)

In this case an input pattern has to be associated with an output pattern.

There is an everlasting dispute between fundamentalists, who believe that most economic time-series can only be effectively forecast on the basis of other time-series representing fundamental data, and the more technically inclined chart analysts, who assume that all external factors are reflected automatically in the time-series and therefore only look for time patterns. In the following we shall use both space and time patterns, e.g.

DAX(t + 1) = f(DAX(t − 4), ..., DAX(t), FAZ-Frühindikator(t)^{3}, interest rate(t − 20), ...)

Here DAX(t − 4), ..., DAX(t) is a time window of length 5, and the interest rate enters with a lag of 20.

The neural net can decide itself what data to use. This will give us some hints concerning the importance of the data for the respective prediction.
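A sketch of how such a combined input pattern could be assembled from raw series; the series names and the helper `make_pattern` are illustrative placeholders, not part of the system described here.

```python
def make_pattern(dax, indicator, rate, t, window=5, lag=20):
    """Input pattern for predicting dax[t + 1]: a time window of the
    last `window` DAX values plus two lagged space-pattern inputs."""
    return dax[t - window + 1 : t + 1] + [indicator[t], rate[t - lag]]

# Placeholder series standing in for the real data:
dax = [float(i) for i in range(60)]
indicator = [0.1] * 60
rate = [3.0] * 60

pattern = make_pattern(dax, indicator, rate, t=50)
# pattern = [46.0, 47.0, 48.0, 49.0, 50.0, 0.1, 3.0]
```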

@4. Using Backpropagation Networks for Pattern Association

The backpropagation network, the most versatile artificial neural network and the basis of our research on time-series forecasting, will be presented in a rather informal way to establish a common language and common notations.


@@4.1. Pattern Association with Feedforward Networks

Artificial neural networks (ANN) are designed to mimic natural neural nets. Both consist of a great number of simple neurons which can work in parallel. Each neuron is connected with many other neurons. A neuron has an activity level which determines its output. The output signal is communicated to other neurons via synapses which can amplify or diminish it. This is simulated in ANNs by multiplying the output signal with the weight of the connection. All incoming weighted signals are added and the sum is compared with an individual threshold of the neuron^{4}. The difference, which is called the net input, is passed through an activation function or transfer function to produce the activation level of the neuron.

Usually a sigmoidal squashing function, e.g. the hyperbolic tangent, is used for activation. It can be interpreted as a differentiable approximation of the sign function.

[Figure omitted: a sigmoidal activation function, e.g. the hyperbolic tangent]
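The activation step described above can be sketched as follows; this is a minimal illustration using the hyperbolic tangent from the standard library.

```python
import math

def activation(net_input):
    """Sigmoidal squashing via the hyperbolic tangent: a smooth,
    differentiable approximation of the sign function, mapping any
    net input into the interval (-1, 1)."""
    return math.tanh(net_input)

# For large |net input| the output saturates near the sign of the input:
# activation(5.0) is close to 1, activation(-5.0) close to -1.
```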

Neurons with common properties are clustered. In feedforward networks the clusters, which are called layers, build a hierarchy. The neurons within a layer are not connected with each other but with some or all neurons of adjacent layers. An input layer serves as a buffer for an input pattern; its activation function is the identity. Activity patterns are propagated synchronously from the input layer(s) via the hidden layer(s) to the output layer(s).

An input pattern, e.g. a window (y(t − n), ..., y(t)) of a time-series, is mapped onto an output pattern, e.g. a vector o = (o1, o2) that is to predict the target window (y(t + 1), y(t + 2)) of the time-series.

In principle, a feedforward net with a single hidden layer of sufficient size and sigmoidal squashing function can approximate any continuous mapping of input patterns to output patterns. But in practice it is hard to find appropriate weights to realize this mapping. Additional hidden layers can sometimes simplify the task of weight adaptation.
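The propagation through such a net can be sketched as below. This assumes, for illustration, one hidden tanh layer and linear output units; the net input is the weighted sum minus the neuron's threshold, as described above.

```python
import math

def forward(x, W1, b1, W2, b2):
    """Forward pass through a feedforward net with one hidden tanh
    layer and linear output units. W1, W2 are lists of weight rows,
    b1, b2 the thresholds subtracted from each weighted sum."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) - theta)
              for row, theta in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) - theta
            for row, theta in zip(W2, b2)]

# A 2-2-1 net with identity-like hidden weights and zero thresholds:
output = forward([0.5, -0.5],
                 W1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
                 W2=[[1.0, 1.0]], b2=[0.0])
```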

@@4.2. Learning in Backpropagation Networks

The weights of a feedforward net have to be adjusted so that the desired mapping

xp → tp,   p = 1, ..., P

is approximated by the network input/output function

xp → op,   p = 1, ..., P

i.e. the global error

E = Σp Ep

has to be minimized. Here

Ep = ½ Σj (epj)²

where j runs through all output neurons, is the error of pattern p and

epj = tpj − opj

is the error of pattern p at the output neuron j.

[ GRAPHICS ARE NOT INCLUDED ]
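The two error definitions above can be written directly as code; a minimal sketch, with function names chosen only for this illustration.

```python
def pattern_error(target, output):
    """E_p = 1/2 * sum over output neurons j of (t_pj - o_pj)^2."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))

def global_error(targets, outputs):
    """Global error E: the sum of E_p over all training patterns p."""
    return sum(pattern_error(t, o) for t, o in zip(targets, outputs))
```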

For feedforward networks without hidden layers weight adaptation is a simple task. The weights are initialized with small random values. Then the pairs (xp, tp) of input and target patterns are presented to the network one after the other. The output value of the net is calculated and compared with the target value. Then the weights between the input and output layer are adapted according to the rule

Δwji = η epj opi

The weight change Δwji is proportional to the error epj and to the contribution opi of the input neuron i to this error. The learning rate η is a small positive real number: 0 < η < 1.
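For a single linear output neuron the rule amounts to a few lines of code; the sketch below assumes illustrative inputs and repeats the presentation of one pattern to show the output being driven toward the target.

```python
def delta_rule_step(weights, x, target, eta=0.1):
    """One delta-rule update for a single linear output neuron:
    each weight changes by eta * error * input contribution."""
    output = sum(w * xi for w, xi in zip(weights, x))
    error = target - output  # e_pj = t_pj - o_pj
    return [w + eta * error * xi for w, xi in zip(weights, x)]

w = [0.0, 0.0]
for _ in range(100):
    w = delta_rule_step(w, x=[1.0, 0.5], target=1.0)

final_output = sum(wi * xi for wi, xi in zip(w, [1.0, 0.5]))
```

After repeated presentations the output converges to the target, since each step shrinks the error by a constant factor smaller than one.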

It can easily be shown that the delta rule

Δwji = η epj opi = −η ∂Ep/∂wji

describes a steepest descent to the nearest 'valley' of the global error function. For feedforward networks with hidden layers the weights to output neurons can be adapted in the same way. The delta rule can also be used to adapt the weights to the (next) hidden layer, but it is not obvious how the error epk at the hidden neuron k can be determined.

[ GRAPHICS ARE NOT INCLUDED ]

A simple mathematical analysis [Rum86] yields

epk = f′(netpk) Σj wjk epj

Page 59

The errors epj at the output neurons j are backpropagated to the hidden layer. A hidden neuron k with a large weight wjk contributes to the error epj to a higher degree than a hidden neuron with a smaller weight. Therefore, it is assigned a greater credit wjk epj. The contributions wjk epj of neuron k to the errors epj are added, giving the error epk.
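This credit assignment can be sketched directly; the code assumes tanh hidden units, so that the derivative f′(netpk) equals 1 − hidden[k]², and the variable names are chosen only for this illustration.

```python
def hidden_errors(output_errors, W2, hidden):
    """Backpropagate output errors e_pj through the weights w_jk:
    e_pk = f'(net_pk) * sum_j w_jk * e_pj, with f = tanh so that
    f'(net_pk) = 1 - hidden[k]**2."""
    errors = []
    for k, h in enumerate(hidden):
        back = sum(W2[j][k] * e for j, e in enumerate(output_errors))
        errors.append((1.0 - h * h) * back)
    return errors

# One output neuron with error 1.0; the hidden neuron with the larger
# weight (2.0) receives the larger share of the credit:
errs = hidden_errors([1.0], [[0.5, 2.0]], [0.0, 0.0])
# errs = [0.5, 2.0]
```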

If further hidden layers exist, their errors are calculated successively in the same way according to the errors of the layers above. The patterns are repeatedly presented to the network and the weights are adapted until the global error has become...