7.1 Forecasting Using Predictive Inference

In this Section, we are concerned with predictive inference: using observed data to predict future data that are not yet known but that are important to forecast with high confidence and low uncertainty. In other words, we assume that we can encapsulate historical patterns in a model and use that model to learn about the future.

In hydrology, we deal with time series, i.e. observations ordered in time. A generic model structure can thus be specified as follows:

\[ y(t+\Delta t) = f(y(t),x(t)) + \epsilon(t) \]

where \(y(t+\Delta t)\) is the forecast target (discharge at a particular gauge in our case), i.e. the variable that we want to forecast \(\Delta t\) time units into the future. \(y(t)\) denotes the known past observations of discharge up to and including time \(t\). Similarly, \(x(t)\) denotes other variables of interest, called external regressors, that might help to obtain good-quality forecasts, such as meteorological data from local stations, including precipitation and temperature. Finally, \(f()\) denotes the type of model used for forecasting and \(\epsilon(t)\) denotes the time-dependent error terms. If, for example, one used a linear modeling approach without external regressors, the model could simply be written as

\[ y(t+\Delta t) = \beta_{0} + \beta_{1} \cdot y(t) + \epsilon(t) \]
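To make the linear case concrete, the following is a minimal sketch, not a definitive implementation, of how \(\beta_{0}\) and \(\beta_{1}\) could be estimated by ordinary least squares from pairs \((y(t), y(t+\Delta t))\). The function names `fit_lead_time_model` and `forecast` are hypothetical, the lead time is assumed to be a whole number of time steps, and the discharge values are invented purely for illustration.

```python
def fit_lead_time_model(y, lead):
    """Estimate beta0 and beta1 in y(t + lead) = beta0 + beta1 * y(t) + eps(t).

    `y` is a list of equally spaced observations; `lead` is the lead time
    expressed as a number of time steps (an assumption of this sketch).
    """
    if lead < 1 or len(y) - lead < 2:
        raise ValueError("need lead >= 1 and at least two (predictor, target) pairs")

    predictors = y[:-lead]  # y(t)
    targets = y[lead:]      # y(t + lead)
    n = len(predictors)

    mean_x = sum(predictors) / n
    mean_y = sum(targets) / n
    sxx = sum((x - mean_x) ** 2 for x in predictors)
    sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(predictors, targets))
    if sxx == 0:
        raise ValueError("predictor values are constant; the slope is undefined")

    beta1 = sxy / sxx                  # closed-form OLS slope
    beta0 = mean_y - beta1 * mean_x    # closed-form OLS intercept
    return beta0, beta1


def forecast(beta0, beta1, y_now):
    """Point forecast of y(t + lead) given the latest observation y(t)."""
    return beta0 + beta1 * y_now


# Hypothetical monthly mean discharge values (m^3/s), invented for illustration.
discharge = [12.0, 15.0, 30.0, 55.0, 48.0, 33.0, 21.0, 16.0, 13.0, 12.5, 14.0, 28.0]

beta0, beta1 = fit_lead_time_model(discharge, lead=1)
print("beta0 =", beta0, "beta1 =", beta1)
print("one-step-ahead forecast:", forecast(beta0, beta1, discharge[-1]))
```

The sketch returns only a point forecast; the error term \(\epsilon(t)\) is not modeled explicitly here.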

In the model specification above, the aim is to predict into the future with a lead time of \(\Delta t\), for example one month ahead. The lead-time model can be written in an equivalent form using lags:

\[ y(t) = f(y(t-lag),x(t-lag)) + \epsilon(t) \]

where \(lag = \Delta t\). This form follows from shifting the time index of the lead-time model by \(\Delta t\) and indexing the error term by the target time. It simply means that we use all the available observations up to and including \(t-lag\) to predict the target at time \(t\). We will use this specification throughout the Chapter when working with and developing new forecasting models.

  • Feature engineering

  • Experimentation and ensembling

  • Knowledge of key events, e.g. date shifts and holidays

  • Deep learning (data permitting)

  • Boosting errors
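As a small illustration of the lagged specification \(y(t) = f(y(t-lag),x(t-lag)) + \epsilon(t)\) introduced above, and of feature engineering in its simplest form, the sketch below arranges a target series and one external regressor into rows of lagged predictors and aligned targets. The helper name `make_lagged_rows` is hypothetical, the lag is assumed to be a whole number of time steps, and the discharge and precipitation values are invented for illustration only.

```python
def make_lagged_rows(y, x, lag):
    """Pair each target y(t) with the predictors (y(t - lag), x(t - lag)).

    `y` is the target series (e.g. discharge) and `x` an external regressor
    (e.g. precipitation), both equally spaced and of equal length.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    if lag < 1 or lag >= len(y):
        raise ValueError("lag must be at least 1 and smaller than the series length")

    features = []
    targets = []
    for t in range(lag, len(y)):
        # Only information available at time t - lag enters the predictors.
        features.append((y[t - lag], x[t - lag]))
        targets.append(y[t])
    return features, targets


# Hypothetical monthly discharge (m^3/s) and precipitation (mm), invented values.
discharge = [12.0, 15.0, 30.0, 55.0, 48.0, 33.0]
precipitation = [40.0, 65.0, 90.0, 70.0, 35.0, 20.0]

features, targets = make_lagged_rows(discharge, precipitation, lag=1)
for (y_lagged, x_lagged), y_target in zip(features, targets):
    print(y_lagged, x_lagged, "->", y_target)
```

The first `lag` observations have no lagged predictors and therefore do not appear as targets, so the resulting table has `len(y) - lag` rows. Any model \(f()\) can then be trained on these rows.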