The Model Foundry

The problem of predicting points in a time series is used here to introduce the concept of a Model Foundry: the user identifies the likely characteristics of the process behind the time series and assembles a model of appropriate complexity, and the system then estimates the parameters of the composite model.

A whole industry exists for predicting future values from a time series. One branch of the industry uses some form of Box-Jenkins methodology: an AutoRegressive (AR) or Moving Average (MA) structure, or some combination (ARMA, ARIMA), is sought in the time series, the parameters of the chosen model are estimated, and the fitted model is used for prediction. The technique requires considerable skill to apply (most of the skill lies in making sure it is not applied in inappropriate cases), and it is frequently misapplied to nonlinear processes.
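As a minimal illustration of the estimate-then-predict step described above, the sketch below fits a first-order autoregressive model, x[t] = c + phi * x[t-1], by ordinary least squares and makes a one-step prediction. The names `fit_ar1` and `predict_next` are hypothetical, and plain least squares is an assumption made for brevity; it stands in for, and is much simpler than, a full Box-Jenkins identification.

```python
def fit_ar1(series):
    """Estimate c and phi in x[t] = c + phi * x[t-1] by ordinary least squares."""
    if len(series) < 3:
        raise ValueError("need at least three points")
    prev = series[:-1]   # x[t-1]
    curr = series[1:]    # x[t]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        raise ValueError("series is constant; phi cannot be estimated")
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = cov / var_prev
    c = mean_curr - phi * mean_prev
    return c, phi


def predict_next(series, c, phi):
    """One-step-ahead prediction from the last observed value."""
    return c + phi * series[-1]


# Illustrative, noise-free data generated with c = 1.0 and phi = 0.5.
series = [0.0]
for _ in range(20):
    series.append(1.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
forecast = predict_next(series, c, phi)
```

A fit like this only makes sense if the process really is close to linear, which is the caveat raised above about applying the technique in inappropriate cases.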

The other branch attempts to train a neural network to reproduce the time series. Much art goes into building into the network the properties that may exist in the time series, such as memory of the last value (or the last but one, and so on) that modifies the next. The resulting model lacks identifiability - that is, there is not even a partial correspondence, no interior point in the model that is an analogue of an interior point in the process. The result is hard to explain or extend, and the whole concept is hard to defend when the output is wrong.

Some possible characteristics of the process producing the time series, each of which may need to be modelled using components available in the Model Foundry:

Capacity Limitation
Few processes are unlimited in capacity. If a value is increasing or decreasing, it is likely eventually to reach a limit, where it saturates. It may saturate abruptly, switching from a linear increase to no change, or, more likely, the rate of increase will slowly fall until no further increase occurs. A further possibility is like filling a jug, where steady filling changes to rapid filling when the neck is reached, and then to overflow or alarm. A time series from one part of a process may look as though it has no limits, while another part of the process undergoes wild swings to maintain that illusion.
Energy Limitation
Most processes are energy limited. A vehicle cannot change its speed instantly (unless it runs into a brick wall, which markets as well as cars can do), and a swimming pool cannot be filled instantly from a tap (although it can probably be emptied faster than it fills). Sometimes the nature of the limitation switches, like the switch from laminar to turbulent flow, or a change from capital funding limits to personnel availability limits.
Energy limiting is a good way of detecting errors in the time series values. Rapid changes in value that the energy limits of the process would make impossible must be measurement errors. The energy limits themselves will usually change as other characteristics alter.
Supply and Demand
Many systems have two components that work together in reasonable synchronisation. The supply of a good is driven by its demand, and vice versa. If supply falls behind, the price goes up, which stimulates new sources of supply. If supply exceeds demand, the price falls, which may stimulate demand or make some sources of supply uneconomic. A typical market may have many supply and demand systems, all integrated and working together - transport, energy, food, etc. Sometimes the approximate synchronisation breaks down, leading to (predictable) wild swings.
Growth and Decay
Systems grow and decay over time. The market for button-up boots, once thriving, has now decayed to nothing. Capacity limitations change as a market grows: the average rate of growth may remain constant while the limits themselves move. Measuring a system on one dimension may conceal the fact that it is growing in three dimensions and is limited by total volume rather than by any one dimension. Measurement and prediction, when used for control, may distort the process that creates the time series.
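Two of the characteristics above lend themselves to a small sketch. The first function models the gradual form of capacity limitation, where each increase shrinks as the value approaches its limit; the second uses an energy limit, expressed as a maximum plausible change per sample, to flag suspect readings. The names `saturating_step` and `flag_rate_violations`, the parameter values, and the sample readings are all hypothetical; they illustrate the ideas and are not components taken from the Model Foundry itself.

```python
def saturating_step(value, growth, capacity):
    """Advance one step; the increase shrinks as value approaches capacity."""
    return value + growth * (1.0 - value / capacity)


def flag_rate_violations(series, max_step):
    """Return indices whose change from the last accepted value exceeds max_step.

    max_step stands in for the energy limit: the largest change the
    process could plausibly make between two consecutive samples.
    """
    flagged = []
    last_good = series[0]
    for i in range(1, len(series)):
        if abs(series[i] - last_good) > max_step:
            flagged.append(i)          # too fast to be real: treat as an error
        else:
            last_good = series[i]      # plausible: accept as the new reference
    return flagged


# Capacity limitation: growth slows and levels off just below the capacity.
level = 0.0
trace = []
for _ in range(200):
    level = saturating_step(level, growth=5.0, capacity=100.0)
    trace.append(level)

# Energy limitation: the jump to 55.0 is far larger than the assumed limit.
readings = [10.0, 10.5, 11.0, 55.0, 11.8, 12.1]
suspect = flag_rate_violations(readings, max_step=2.0)
```

Because the energy limit may itself change as other characteristics alter, a fixed `max_step` is a simplification; after a genuine step change in the process, this sketch would keep flagging every later point until the limit or the reference value was revised.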

The benefit of the Model Foundry approach is that a reasonable composite model can be quickly constructed from components, and its parameters estimated. The model can have connections that allow the characteristics of the process to vary with time - the capacity constraints can vary with growth, or the energy levels can tune themselves to what is observed.

The model can change its parameters as new values for the time series arrive, rather than using fixed parameters found in a one-time identification of the series so far.
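One way to picture parameters that follow the data is a recursive estimator that corrects itself with every new value. The sketch below uses a normalised least-mean-squares (NLMS) update for the single coefficient phi in x[t] = phi * x[t-1]. The class name `OnlineAR1`, the step size, and the test signal are assumptions made for illustration; the source does not say which update rule the Model Foundry uses.

```python
class OnlineAR1:
    """Track phi in x[t] = phi * x[t-1] with a normalised LMS update."""

    def __init__(self, step=0.5, eps=1e-12):
        self.phi = 0.0      # current estimate of the coefficient
        self.step = step    # fraction of each prediction error to correct
        self.eps = eps      # guards against division by zero
        self.prev = None    # previous observation

    def update(self, x):
        """Consume one new value and return the revised estimate of phi."""
        if self.prev is not None:
            err = x - self.phi * self.prev
            self.phi += self.step * err * self.prev / (self.eps + self.prev ** 2)
        self.prev = x
        return self.phi


# Illustrative signal whose true coefficient switches from 0.9 to 0.5.
model = OnlineAR1()
x = 100.0
history = []
for t in range(40):
    true_phi = 0.9 if t < 20 else 0.5
    history.append(model.update(x))
    x = true_phi * x
```

With this noise-free signal the estimate settles near 0.9 and then moves to 0.5 after the switch, which a one-time identification over the whole series could not do. With noisy data a smaller step size would normally be needed.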

The knowledge network that provides the substructure for the model allows the model to be extensible - you can tack new bits on as they become necessary - and to be adjusted in a non-parametric way - you can make new connections in the existing model that change its behaviour, perhaps radically.

So what happens when the user has no idea of the underlying process? An assumption can be made by letting the properties of the time series index into a catalogue of models; the system then places out of range any parameters it cannot estimate. If the time series later hits a limit, the limit range can be adjusted to suit.
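A rough sketch of the catalogue idea follows. Coarse properties of the series (its net trend and whether it oscillates) form a key into a dictionary of model templates, and any limit that cannot yet be estimated is set to infinity so that it never takes effect. Everything here is hypothetical: the properties chosen, the catalogue entries, and the reading of "out of range" as an infinite limit are assumptions for illustration only.

```python
import math

# Hypothetical catalogue: keys are (trend, oscillates); values are model templates.
# Limits that cannot yet be estimated start out of range (infinite).
CATALOGUE = {
    ("rising", False): {"model": "saturating growth", "capacity": math.inf},
    ("falling", False): {"model": "decay", "floor": -math.inf},
    ("flat", False): {"model": "constant level", "capacity": math.inf},
}
DEFAULT = {"model": "supply and demand", "capacity": math.inf}


def describe(series):
    """Return (trend, oscillates) computed from the first differences."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    net = series[-1] - series[0]
    trend = "rising" if net > 0 else "falling" if net < 0 else "flat"
    sign_changes = sum(1 for a, b in zip(diffs, diffs[1:]) if a * b < 0)
    oscillates = sign_changes > len(diffs) // 4
    return trend, oscillates


def choose_model(series):
    """Look up a model template; return a copy so its limits can be adjusted."""
    return dict(CATALOGUE.get(describe(series), DEFAULT))


choice = choose_model([1.0, 2.0, 3.5, 5.0, 7.0])
oscillating = choose_model([1.0, 3.0, 0.5, 3.2, 0.8, 3.1])

# If the series later flattens near some value, the limit is brought into range.
choice["capacity"] = 7.5
```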


See Model Building