# Diebold-Mariano test¶

This module provides a function DM that implements the one-sided version of the Diebold-Mariano (DM) test in the context of electricity price forecasting.

Besides the DM test, the module also provides a function plot_multivariate_DM_test to plot the DM results when comparing multiple forecasts.

## DM test¶

The Diebold-Mariano (DM) test is probably the most commonly used tool to evaluate the significance of differences in forecasting accuracy. It is an asymptotic z-test of the hypothesis that the mean of the loss differential series

$\begin{equation} \Delta^{\mathrm{A, B}}_{k} = L(\varepsilon^\mathrm{A}_{k}) - L(\varepsilon^\mathrm{B}_{k}) \end{equation}$

is zero, where $$\varepsilon^\mathrm{Z}_{k}=p_{k}-\hat{p}_{k}$$ is the prediction error of model Z for time step $$k$$ and $$L(\cdot)$$ is the loss function. For point forecasts, we usually take $$L(\varepsilon^\mathrm{Z}_{k})=|\varepsilon^\mathrm{Z}_{k}|^p$$ with $$p=1$$ or $$p=2$$, which correspond to the absolute and squared losses, respectively.
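As an illustration (a self-contained sketch, not library code), the loss differential series for the absolute loss ($$p=1$$) can be computed directly with NumPy; all prices and forecasts below are synthetic:

```python
import numpy as np

# Synthetic real prices and two hypothetical forecasts (3 days x 24 hours)
rng = np.random.default_rng(0)
p_real = rng.normal(50.0, 10.0, size=(3, 24))
p_pred_A = p_real + rng.normal(0.0, 2.0, size=(3, 24))
p_pred_B = p_real + rng.normal(0.0, 3.0, size=(3, 24))

# Prediction errors of each model
err_A = p_real - p_pred_A
err_B = p_real - p_pred_B

# Loss differential series with the absolute loss (p = 1)
delta = np.abs(err_A) - np.abs(err_B)
print(delta.shape)  # (3, 24)
```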

This module implements the one-sided version of the DM test using the DM function. Given the forecast of a model A and the forecast of a model B, the test evaluates the null hypothesis $$H_0$$ of the mean of the loss differential of model A being lower than or equal to that of model B. Hence, rejecting the null $$H_0$$ means that the forecasts of model B are significantly more accurate than those of model A.

The module provides the two standard versions of the test in electricity price forecasting: a univariate and a multivariate version. The univariate version has the advantage of providing a deeper analysis, as it indicates which forecast is significantly better for which hour of the day. The multivariate version provides a more compact representation of the results, as it summarizes the comparison in a single p-value.
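To make the two versions concrete, below is a simplified sketch of both, assuming the standard asymptotic z-test on the loss differential and synthetic errors; the library's own implementation may differ in details such as variance estimation:

```python
import numpy as np
from scipy import stats

def dm_pvalue(d):
    """One-sided p-value of H0: mean(d) <= 0, via the asymptotic z-test."""
    z = np.mean(d) / (np.std(d, ddof=1) / np.sqrt(d.size))
    return 1.0 - stats.norm.cdf(z)

# Synthetic errors: model B is clearly more accurate than model A
rng = np.random.default_rng(0)
err_A = rng.normal(0.0, 4.0, size=(200, 24))
err_B = rng.normal(0.0, 1.0, size=(200, 24))

# Multivariate version: one daily loss differential series, one p-value
d_multi = np.mean(np.abs(err_A), axis=1) - np.mean(np.abs(err_B), axis=1)
p_multi = dm_pvalue(d_multi)

# Univariate version: one loss differential (and one p-value) per hour
p_uni = np.array([dm_pvalue(np.abs(err_A[:, h]) - np.abs(err_B[:, h]))
                  for h in range(24)])
```

Here the multivariate test returns a single scalar, while the univariate test returns 24 independent p-values, one per hour of the day.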

epftoolbox.evaluation.DM(p_real, p_pred_1, p_pred_2, norm=1, version='univariate')[source]

Function that performs the one-sided DM test in the context of electricity price forecasting.

The test compares whether there is a difference in predictive accuracy between two forecasts p_pred_1 and p_pred_2. Particularly, the one-sided DM test evaluates the null hypothesis H0 of the forecasting errors of p_pred_2 being larger (worse) than the forecasting errors of p_pred_1 vs the alternative hypothesis H1 of the errors of p_pred_2 being smaller (better). Hence, rejecting H0 means that the forecast p_pred_2 is significantly more accurate than forecast p_pred_1. (Note that this is an informal definition. For a formal one we refer to here)

Two versions of the test are possible:

1. A univariate version with as many independent tests performed as prices per day, i.e. 24 tests in most day-ahead electricity markets.

2. A multivariate version with the test performed jointly for all hours using the multivariate loss differential series (see this article for details).

Parameters:

- p_real (numpy.ndarray) – Array of shape $$(n_\mathrm{days}, n_\mathrm{prices/day})$$ representing the real market prices
- p_pred_1 (numpy.ndarray) – Array of shape $$(n_\mathrm{days}, n_\mathrm{prices/day})$$ representing the first forecast
- p_pred_2 (numpy.ndarray) – Array of shape $$(n_\mathrm{days}, n_\mathrm{prices/day})$$ representing the second forecast
- norm (int, optional) – Norm used to compute the loss differential series. At the moment, this value must either be 1 (for the norm-1) or 2 (for the norm-2).
- version (str, optional) – Version of the test as defined in here. It can have two values: 'univariate' or 'multivariate'.

Returns: The p-value after performing the test. It is a float in the case of the multivariate test and a numpy array with a p-value per hour for the univariate test.

Return type: float, numpy.ndarray

Example

>>> from epftoolbox.evaluation import DM
>>> from epftoolbox.data import read_data
>>> import pandas as pd
>>>
>>> # Generating forecasts of multiple models
>>>
>>> # Download forecasts of the NP market available in the library repository
>>> # These forecasts accompany the original paper
>>> forecasts = pd.read_csv('https://raw.githubusercontent.com/jeslago/epftoolbox/master/' +
...                       'forecasts/Forecasts_NP_DNN_LEAR_ensembles.csv', index_col=0)
>>>
>>> # Deleting the real price field as it is the actual real price and not a forecast
>>> del forecasts['Real price']
>>>
>>> # Transforming indices to datetime format
>>> forecasts.index = pd.to_datetime(forecasts.index)
>>>
>>> # Extracting the real prices from the market
>>> _, df_test = read_data(path='.', dataset='NP', begin_test_date=forecasts.index[0],
...                        end_test_date=forecasts.index[-1])
Test datasets: 2016-12-27 00:00:00 - 2018-12-24 23:00:00
>>>
>>> real_price = df_test.loc[:, ['Price']]
>>>
>>> # Testing the univariate DM version on an ensemble of DNN models versus an ensemble
>>> # of LEAR models
>>> DM(p_real=real_price.values.reshape(-1, 24),
...     p_pred_1=forecasts.loc[:, 'LEAR Ensemble'].values.reshape(-1, 24),
...     p_pred_2=forecasts.loc[:, 'DNN Ensemble'].values.reshape(-1, 24),
...     norm=1, version='univariate')
array([9.99999944e-01, 9.97562415e-01, 8.10333949e-01, 8.85201928e-01,
9.33505978e-01, 8.78116764e-01, 1.70135981e-02, 2.37961920e-04,
5.52337353e-04, 6.07843340e-05, 1.51249750e-03, 1.70415008e-03,
4.22319907e-03, 2.32808010e-03, 3.55958698e-03, 4.80663621e-03,
1.64841032e-04, 4.55829140e-02, 5.86609688e-02, 1.98878375e-03,
1.04045731e-01, 8.71203187e-02, 2.64266732e-01, 4.06676195e-02])
>>>
>>> # Testing the multivariate DM version
>>> DM(p_real=real_price.values.reshape(-1, 24),
...     p_pred_1=forecasts.loc[:, 'LEAR Ensemble'].values.reshape(-1, 24),
...     p_pred_2=forecasts.loc[:, 'DNN Ensemble'].values.reshape(-1, 24),
...     norm=1, version='multivariate')
0.003005725748326471
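Continuing the univariate example above, the 24 hourly p-values can be compared against a significance level (here 5%, a common but arbitrary choice) to identify the hours for which the DNN ensemble is significantly more accurate than the LEAR ensemble:

```python
import numpy as np

# Hourly p-values returned by the univariate DM test above
p_values = np.array([9.99999944e-01, 9.97562415e-01, 8.10333949e-01, 8.85201928e-01,
                     9.33505978e-01, 8.78116764e-01, 1.70135981e-02, 2.37961920e-04,
                     5.52337353e-04, 6.07843340e-05, 1.51249750e-03, 1.70415008e-03,
                     4.22319907e-03, 2.32808010e-03, 3.55958698e-03, 4.80663621e-03,
                     1.64841032e-04, 4.55829140e-02, 5.86609688e-02, 1.98878375e-03,
                     1.04045731e-01, 8.71203187e-02, 2.64266732e-01, 4.06676195e-02])

# Hours where the null hypothesis is rejected at the 5% level,
# i.e. where the DNN ensemble is significantly more accurate
significant_hours = np.where(p_values < 0.05)[0]
print(significant_hours.tolist())
# [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 23]
```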


## plot_multivariate_DM_test¶

The plot_multivariate_DM_test function provides an easy-to-use interface to plot, as a chessboard-shaped heat map, the results of using the DM test to compare the forecasts of multiple models. An example of the heat map is provided below in the function example.

epftoolbox.evaluation.plot_multivariate_DM_test(real_price, forecasts, norm=1, title='DM test', savefig=False, path='')[source]

Plotting the results of comparing forecasts using the multivariate DM test.

The resulting plot is a heat map in a chessboard shape. It represents the p-value of the null hypothesis of the forecast in the y-axis being significantly more accurate than the forecast in the x-axis. In other words, p-values close to 0 represent cases where the forecast in the x-axis is significantly more accurate than the forecast in the y-axis.
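The chessboard can be reproduced conceptually by running the multivariate DM test for every ordered pair of models. The sketch below uses a simplified stand-in for the DM test and synthetic data so it is self-contained; in practice the matrix would be built with epftoolbox.evaluation.DM and real forecasts:

```python
import numpy as np
from scipy import stats

def dm_pvalue(p_real, p_pred_1, p_pred_2, norm=1):
    """Simplified multivariate one-sided DM p-value (illustrative stand-in)."""
    d = (np.mean(np.abs(p_real - p_pred_1) ** norm, axis=1)
         - np.mean(np.abs(p_real - p_pred_2) ** norm, axis=1))
    z = np.mean(d) / (np.std(d, ddof=1) / np.sqrt(d.size))
    return 1.0 - stats.norm.cdf(z)

# Synthetic prices and three hypothetical models of decreasing accuracy
rng = np.random.default_rng(1)
p_real = rng.normal(50.0, 10.0, size=(100, 24))
models = {name: p_real + rng.normal(0.0, s, size=p_real.shape)
          for name, s in [('M1', 1.0), ('M2', 2.0), ('M3', 4.0)]}

# Cell (y, x) holds the p-value of the test comparing model y (p_pred_1)
# against model x (p_pred_2); small values mean x is significantly better
names = list(models)
pvals = np.ones((len(names), len(names)))
for i, a in enumerate(names):
    for j, b in enumerate(names):
        if i != j:
            pvals[i, j] = dm_pvalue(p_real, models[a], models[b])
print(np.round(pvals, 3))
```

Plotting this matrix as a heat map, with one model per row and per column, yields the chessboard layout described above.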

Parameters:

- real_price (pandas.DataFrame) – Dataframe that contains the real prices
- forecasts (pandas.DataFrame) – Dataframe that contains the forecasts of different models. The column names are the forecast/model names. The number of datapoints should equal the number of datapoints in real_price.
- norm (int, optional) – Norm used to compute the loss differential series. At the moment, this value must either be 1 (for the norm-1) or 2 (for the norm-2).
- title (str, optional) – Title of the generated plot
- savefig (bool, optional) – Boolean that selects whether the figure should be saved in the current folder
- path (str, optional) – Path to save the figure. Only necessary when savefig=True

Example

>>> from epftoolbox.evaluation import DM, plot_multivariate_DM_test
>>> from epftoolbox.data import read_data
>>> import pandas as pd
>>>
>>> # Generating forecasts of multiple models
>>>
>>> # Download forecasts of the NP market available in the library repository
>>> # These forecasts accompany the original paper
>>> forecasts = pd.read_csv('https://raw.githubusercontent.com/jeslago/epftoolbox/master/' +
...                       'forecasts/Forecasts_NP_DNN_LEAR_ensembles.csv', index_col=0)
>>>
>>> # Deleting the real price field as it is the actual real price and not a forecast
>>> del forecasts['Real price']
>>>
>>> # Transforming indices to datetime format
>>> forecasts.index = pd.to_datetime(forecasts.index)
>>>
>>> # Extracting the real prices from the market
>>> _, df_test = read_data(path='.', dataset='NP', begin_test_date=forecasts.index[0],
...                        end_test_date=forecasts.index[-1])
Test datasets: 2016-12-27 00:00:00 - 2018-12-24 23:00:00
>>>
>>> real_price = df_test.loc[:, ['Price']]
>>>
>>> # Generating a plot to compare the models using the multivariate DM test
>>> plot_multivariate_DM_test(real_price=real_price, forecasts=forecasts) 