Dataset extraction
This module provides an easy-to-use interface to download data from multiple day-ahead electricity markets from an online database. The module is built around the function read_data, which can be used to obtain market data for the following day-ahead electricity markets and periods:
Market | Period |
---|---|
Nord Pool | 01.01.2013 – 24.12.2018 |
PJM | 01.01.2013 – 24.12.2018 |
EPEX-France | 09.01.2011 – 31.12.2016 |
EPEX-Belgium | 09.01.2011 – 31.12.2016 |
EPEX-Germany | 09.01.2012 – 31.12.2017 |
Besides the data from these five markets, the module provides an interface to read CSV files from other markets and transform their data to match the naming requirements of the prediction models in the epftoolbox library. It also implements an automatic training/testing split based on the testing period under study.
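As a rough illustration of the custom-market case, the sketch below writes a small CSV file that could then be handed to read_data. The market name MYMARKET, the helper write_custom_dataset, the column headers, and the values are all made up for this example. The layout (a date column first, then the price, then the exogenous inputs) is an assumption about what the function expects, not something stated on this page.

```python
import csv
import os
import tempfile
from datetime import datetime, timedelta


def write_custom_dataset(folder, name, rows):
    """Write rows of (timestamp, price, exog_1, exog_2) to <folder>/<name>.csv."""
    file_path = os.path.join(folder, name + ".csv")
    with open(file_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header names are placeholders; the documentation says read_data
        # renames the data columns to 'Price', 'Exogenous 1', ...
        writer.writerow(["Date", "Price", "Load forecast", "Generation forecast"])
        for timestamp, price, exog_1, exog_2 in rows:
            writer.writerow(
                [timestamp.strftime("%Y-%m-%d %H:%M:%S"), price, exog_1, exog_2]
            )
    return file_path


# Two days of made-up hourly values for a hypothetical market "MYMARKET".
start = datetime(2020, 1, 1)
rows = [
    (start + timedelta(hours=h), 30.0 + h % 24, 1000.0 + h, 500.0 + h)
    for h in range(48)
]

folder = tempfile.mkdtemp()
csv_path = write_custom_dataset(folder, "MYMARKET", rows)
```

With such a file in place, a call along the lines of read_data(path=folder, dataset='MYMARKET') would be the natural next step, provided the assumed layout matches what the library reads.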
epftoolbox.data.read_data(path, dataset='PJM', years_test=2, begin_test_date=None, end_test_date=None)
Function to read and import data from day-ahead electricity markets.
It receives a dataset name and the path of the folder where datasets are saved. It reads the file dataset.csv in the path directory and provides a split between training and testing datasets based on the test dates provided.

It also names the columns of the training and testing datasets to match the requirements of the prediction models of the library. Namely, assuming that there are N exogenous inputs, the columns of the resulting training and testing dataframes are named ['Price', 'Exogenous 1', 'Exogenous 2', ...., 'Exogenous N'].

If dataset is either "PJM", "NP", "BE", "FR", or "DE", the function checks whether dataset.csv exists in path. If it doesn’t exist, it downloads the data from an online database and saves it under the path directory. "PJM" refers to the Pennsylvania-New Jersey-Maryland market, "NP" to the Nord Pool market, and "BE", "FR", and "DE" respectively to the EPEX-Belgium, EPEX-France, and EPEX-Germany day-ahead markets.

Note that the data available online for these five markets is limited to certain periods (see the database for further details).
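The two rules described above (the column naming and the check for a local file before downloading) can be sketched in a few lines. This is only an illustration of the documented behaviour, not the library's code: the helper names column_names and needs_download and the constant STANDARD_DATASETS are hypothetical, and the actual download step is left out because the database location is not given here.

```python
import os

# The five markets that the documentation says can be downloaded automatically.
STANDARD_DATASETS = ("PJM", "NP", "BE", "FR", "DE")


def column_names(n_exogenous):
    """Return ['Price', 'Exogenous 1', ..., 'Exogenous N'] for N exogenous inputs."""
    return ["Price"] + ["Exogenous " + str(i) for i in range(1, n_exogenous + 1)]


def needs_download(path, dataset):
    """True when a standard dataset has no local <dataset>.csv file in path."""
    file_path = os.path.join(path, dataset + ".csv")
    return dataset in STANDARD_DATASETS and not os.path.exists(file_path)


print(column_names(2))  # ['Price', 'Exogenous 1', 'Exogenous 2']
```

For a non-standard name, needs_download returns False even when the file is missing, which mirrors the statement that custom datasets have to be supplied by the user.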
Parameters:
- path (str) – Path where the datasets are stored or, if they do not exist yet, the path where the datasets are to be stored.
- dataset (str, optional) – Name of the dataset/market under study. If it is one of the standard markets, i.e. "PJM", "NP", "BE", "FR", or "DE", the dataset is automatically downloaded. If the name is different, a dataset in CSV format should be placed in the path.
- years_test (int, optional) – Number of years (a year is 364 days) in the test dataset. It is only used if the arguments begin_test_date and end_test_date are not provided.
- begin_test_date (datetime/str, optional) – Optional parameter to select the test dataset. Used in combination with the argument end_test_date. If either of them is not provided, the test dataset is built using the years_test argument. begin_test_date should either be a string with the following format "%d/%m/%Y %H:%M", or a datetime object.
- end_test_date (datetime/str, optional) – Optional parameter to select the test dataset. Used in combination with the argument begin_test_date. If either of them is not provided, the test dataset is built using the years_test argument. end_test_date should either be a string with the following format "%d/%m/%Y %H:%M", or a datetime object.
Returns: Training dataset, testing dataset
Return type: pandas.DataFrame, pandas.DataFrame
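The interplay between years_test, begin_test_date, and end_test_date can be made concrete with a small sketch. It follows the rules stated in the parameter descriptions (explicit dates win only when both are given, and a year counts as 364 days), but it is not the library's implementation. The function names are hypothetical, and the fallback assumes complete hourly data ending at last_timestamp.

```python
from datetime import datetime, timedelta

# Documented string format for begin_test_date and end_test_date.
DATE_FORMAT = "%d/%m/%Y %H:%M"


def to_datetime(value):
    """Accept a datetime object or a string in the documented format."""
    if isinstance(value, datetime):
        return value
    return datetime.strptime(value, DATE_FORMAT)


def select_test_window(last_timestamp, years_test=2,
                       begin_test_date=None, end_test_date=None):
    """Return (begin, end) of the test period following the documented rules."""
    if begin_test_date is None or end_test_date is None:
        # Fallback: the last years_test "years" of 364 days each.
        # Assumption: hourly data, so the window holds 24 * 364 * years_test
        # timestamps and ends at the last available one.
        begin = (last_timestamp - timedelta(days=364 * years_test)
                 + timedelta(hours=1))
        return begin, last_timestamp
    return to_datetime(begin_test_date), to_datetime(end_test_date)


# Default split for data that ends on 24.12.2018 at 23:00.
begin, end = select_test_window(datetime(2018, 12, 24, 23, 0), years_test=2)
```

Using 364 days rather than 365 keeps the test period at a whole number of weeks (52 per year), so it always starts on the same weekday.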
Example
    >>> from epftoolbox.data import read_data
    >>> df_train, df_test = read_data(path='.', dataset='PJM', begin_test_date='01-01-2016',
    ...                               end_test_date='01-02-2016')
    Test datasets: 2016-01-01 00:00:00 - 2016-02-01 23:00:00
    >>> df_train.tail()
                             Price  Exogenous 1  Exogenous 2
    Date
    2015-12-31 19:00:00  29.513832     100700.0      13015.0
    2015-12-31 20:00:00  28.440134      99832.0      12858.0
    2015-12-31 21:00:00  26.701700      97033.0      12626.0
    2015-12-31 22:00:00  23.262253      92022.0      12176.0
    2015-12-31 23:00:00  22.262431      86295.0      11434.0
    >>> df_test.head()
                             Price  Exogenous 1  Exogenous 2
    Date
    2016-01-01 00:00:00  20.341321      76840.0      10406.0
    2016-01-01 01:00:00  19.462741      74819.0      10075.0
    2016-01-01 02:00:00  17.172706      73182.0       9795.0
    2016-01-01 03:00:00  16.963876      72300.0       9632.0
    2016-01-01 04:00:00  17.403722      72535.0       9566.0
    >>> df_test.tail()
                             Price  Exogenous 1  Exogenous 2
    Date
    2016-02-01 19:00:00  28.056729      99400.0      12680.0
    2016-02-01 20:00:00  26.916456      97553.0      12495.0
    2016-02-01 21:00:00  24.041505      93983.0      12267.0
    2016-02-01 22:00:00  22.044896      88535.0      11747.0
    2016-02-01 23:00:00  20.593339      82900.0      10974.0