Machine learning in the energy sector, or not everyone can look into tomorrow

Accurate prediction of future events is a promising and interesting task in many areas, from weather forecasting to fintech (stock quotes, exchange rates). Machine learning today can significantly reduce the time and labor involved in making management decisions.

For about half a year, our Data Science team at NORBIT experimented with various machine learning models to solve classification and regression problems and to optimize B2B business processes. But when a time series forecasting task came along, it turned out that the materials available online on the topic were not enough to put together a quick solution.


The essence of the task was to predict, with maximum accuracy (mean absolute percentage error, MAPE < 3%), the hourly electricity consumption of an entire city for the next three days (72 hours), for the needs of one large electricity generating company.
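For reference, MAPE is simply the mean of the absolute relative errors, expressed in percent. A minimal sketch of how it is usually computed (y_true and y_pred are hypothetical arrays; actual values are assumed to be non-zero):

import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# e.g. mape([100, 110, 120], [103, 108, 118]) is about 2.16%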

A maximum permissible error of 3% is a very demanding accuracy target. Before machine learning was applied, the error ranged from 3% to 23%, which led directly to financial losses: when too little electricity was generated, additional capacity had to be bought on the wholesale electricity market (much more expensive than self-generated power), and when too much was generated, the surplus had to be sold on the market at a lower price than the city would have paid. Another negative effect of the planned load deviating from the actual one is a change in the alternating current frequency in the grid.

Many factors get in the way of high forecasting accuracy. For example, when the temperature suddenly rises, people immediately reach for their air conditioner remotes, and when it drops, they pull heaters down from the mezzanine; during holidays or big football matches, TVs are switched on, and so on. While some of these events are cyclical and potentially predictable, others are not predictable at all. Suppose a steel mill suddenly stops working because of an accident and all forecasts turn into a pumpkin (in our target city there were no large industrial enterprises). There was a rather curious example of a forecasting problem in the energy sector, when one of the spikes in generated electricity was caused by a ship running aground on a lake, whose captain negotiated with the owners of a hydroelectric power plant to reduce the water discharge. Naturally, a machine learning model could not have predicted this event from historical data.

The good news for us was that the terms of reference treated such "special days" separately and allowed an average forecast error of 8% on them.

All the customer provided was three years of hourly electricity consumption history and the name of the city.


The first step in preparing to build a machine learning model is visualization and the search for additional factors that could influence the forecast. How strongly such factors influence the target is conveniently assessed visually with a heat map of feature correlations. A dark square here means a strong inverse relationship between two quantities (the larger one is, the smaller the other, and vice versa), a white one a direct relationship:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


# df is a pandas DataFrame with the consumption history and candidate features
sns.heatmap(df.corr())
plt.show()


As far as I know, the models used to forecast electricity consumption in the Russian Federation rely on two factors: the consumption history and the temperature forecast.


In winter and autumn, energy consumption is higher due to low temperatures and relatively short daylight hours. The "day of the week" factor also affects the amount of energy consumed: on weekends it drops sharply. Consumption peaks fall on Wednesday or Thursday.





Within a single weekday, consumption peaks at around 11-12 o'clock and then gradually declines, with a sharp drop after nine in the evening.


Everything is just as in the textbooks:

Daily winter and summer consumption curves for industrial and cultural centers (source).

Modeling


Prophet


The first idea for solving the task as quickly as possible was to take Prophet from Facebook. I already had experience using it, and I remembered it as "a thing that simply and quickly works out of the box." Prophet expects the pandas data frame to contain the columns "ds" and "y": the date in the YYYY-MM-DD HH:MM:SS format and the target, in our case the hourly consumption (usage examples are in the documentation).

from fbprophet import Prophet  # in newer versions the package is called prophet

df = pd.read_pickle('full.pkl')
pjme = df.loc[:'2018-06-01 23:00'].rename(columns={'hour_value': 'y'})
pjme['ds'] = pjme.index
split_date = '2018-03-01 23:00'
pjme_train = pjme.loc[pjme.index <= split_date]
pjme_test = pjme.loc[pjme.index > split_date]


# mark holidays and weekends as special events for Prophet
playoffs = pd.DataFrame({
  'holiday': 'playoff',
  'ds': df[(df.rest_day == 1)|(df.is_weekend == 1)].index,
  'lower_window': 0,
  'upper_window': 1,
})


model = Prophet(holidays=playoffs)
model.fit(pjme_train.reset_index())
forecast = model.predict(df=pjme_test.reset_index())



The "prophet's" predictions looked plausible, but the error came out at 7-17%, well above the permissible level, so I did not experiment with this framework any further.

SARIMAX


The second attempt at the problem was SARIMAX. The method is quite labor-intensive when it comes to selecting the coefficients, but it made it possible to reduce the forecast error to 6-11%, compared to Prophet.

Unfortunately, only the result for the weekly predictions has survived, but it was these predicted values that I later used as additional features in the boosting models.

For the forecast, you need to select the SARIMA parameters (p, d, q)(P, D, Q, s):

  • p is the autoregressive order (AR), which adds past values of the time series (it can be read as "tomorrow it will probably snow if it has snowed for the last few days");
  • d is the order of integration of the source data (the number of times previous values are subtracted from the current ones, i.e. how often the series is differenced);
  • q is the moving average order (MA), which models the error of the model as a linear combination of previously observed error values;
  • P is p, but seasonal;
  • D is d, but seasonal;
  • Q is q, but seasonal;
  • s is the seasonality period (month, quarter, etc.).

The seasonal decomposition of the weekly series shows the trend and the seasonality, which seasonal_decompose conveniently plots:

import statsmodels.api as sm

# decompose the weekly series into trend, seasonal and residual components
sm.tsa.seasonal_decompose(df_week).plot()
plt.show()


Based on the autocorrelation and partial autocorrelation plots, I chose initial approximations of the parameters for the SARIMAX model: p = 0, P = 1, q = 2, Q = 2.

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
sm.graphics.tsa.plot_acf(df_week.diff_week[52:].values.squeeze(), lags=60, ax=ax1)
sm.graphics.tsa.plot_pacf(df_week.diff_week[52:].values.squeeze(), lags=60, ax=ax2)
plt.show()


The final selected values are (0, 2, 2) x (1, 2, 0, 52). The resulting consumption forecast graph looked like this:
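The article does not include the fitting code itself; below is a minimal sketch of how a SARIMAX model with these orders could be fitted in statsmodels (df_week.hour_value is an assumed name for the weekly consumption series):

import statsmodels.api as sm

# weekly series; the orders are the ones selected above: (0, 2, 2) x (1, 2, 0, 52)
sarimax = sm.tsa.statespace.SARIMAX(
    df_week.hour_value,
    order=(0, 2, 2),
    seasonal_order=(1, 2, 0, 52),
    enforce_stationarity=False,
    enforce_invertibility=False,
).fit(disp=False)

print(sarimax.summary())

# forecast the next four weeks with confidence intervals
forecast = sarimax.get_forecast(steps=4)
print(forecast.predicted_mean)
print(forecast.conf_int())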


Boosting


At this point, it became apparent that without the use of additional external factors, the necessary accuracy could not be achieved. First of all, these are weather factors: temperature, pressure, humidity, wind strength and direction, precipitation. 

The first attempt was to buy a weather archive from Yandex. The guys have an excellent API and technical support, but for our particular city there were quite a lot of gaps in the data. In the end I bought the archive from the openweathermap.org service for $10. It is important to note that although weather is a very useful factor, errors in its forecast will obviously hurt the final MAPE of the model. I have come across the suggestion that the correct way to handle this is to train not on actual historical weather values but on three-day weather forecasts; unfortunately, I did not have that opportunity.

I also added features for the time of day, the day of the week, holidays, the ordinal number of the day within the year, as well as a large number of lags and average values over previous periods.

After scaling (MinMaxScaler) and one-hot encoding (the values of each categorical feature become separate columns of 0s and 1s, so a feature with three values turns into three columns, with the 1 appearing in exactly one of them), I fed all the data into a small competition of three popular boosting models: XGBoost, LightGBM and CatBoost.
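The exact feature set is not shown in the article; below is a minimal sketch of the kind of feature engineering described in the last two paragraphs, assuming df is an hourly frame with a DatetimeIndex and a hypothetical hour_value target column:

import pandas as pd

def add_features(df, target='hour_value'):
    """Add calendar features, lags and rolling means to an hourly consumption frame."""
    out = df.copy()
    # calendar features
    out['hour'] = out.index.hour
    out['weekday'] = out.index.weekday
    out['day_of_year'] = out.index.dayofyear
    out['is_weekend'] = (out.index.weekday >= 5).astype(int)
    # lags: the same hour one day, two days and one week ago
    for lag in (24, 48, 168):
        out[f'lag_{lag}'] = out[target].shift(lag)
    # rolling means over the previous day and week, shifted so they do not leak the target
    for window in (24, 168):
        out[f'rolling_mean_{window}'] = out[target].shift(1).rolling(window).mean()
    return out.dropna()

# one-hot encode the categorical calendar columns
features = pd.get_dummies(add_features(df), columns=['hour', 'weekday'])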

XGBoost


A lot of good has been written about XGBoost. Presented at the SIGKDD conference in 2016, it made a splash and still holds a leading position. There is interesting material for those who have not heard of it.

Let's prepare the data for the model: 

from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.model_selection import train_test_split


def gen_data(
    stop_day = '2018-06-01 23:00', 
    drop_columns = [],
    test_size=0.15,
    is_dummies=True,
    is_target_scale=False):

    df = pd.read_pickle('full.pkl')
    df = df.loc[:stop_day]
    scaler = MinMaxScaler()
    target_scaler = StandardScaler()
    y = df.hour_total
    
    if is_target_scale:        
        y = target_scaler.fit_transform(y.values.reshape(-1, 1)).reshape(1,-1)[0]
        
    X = df.drop(['hour_value', *drop_columns], axis=1)
    num_columns = X.select_dtypes(include=['float64']).columns
    X[num_columns] = scaler.fit_transform(X[num_columns])
    
    if is_dummies:
        X = pd.get_dummies(X)

    # the last 72 hours are held out as the test set
    train_count_hours = len(X) - 72
    valid_count_hours = len(X) - int(len(X) * 0.2)
    X_test  = X[train_count_hours:]
    y_test  = y[train_count_hours:]
    X = X[:train_count_hours]
    y = y[:train_count_hours]

    # note: train_test_split shuffles by default, so the validation set
    # is a random sample of hours rather than a contiguous time interval
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=test_size, random_state=42)
    train_dates = X_train.index
    val_dates = X_val.index
    return X, y, X_train, X_val, y_train, y_val, X_test, y_test, target_scaler


I train the model with the best hyperparameters selected earlier:

# stop_day and drop_columns are assumed here; the exact list of dropped columns is not shown in the article
stop_day = '2018-06-01 23:00'
drop_columns = []

X, y, X_train, X_val, y_train, y_val, X_test, y_test, _ = gen_data(
    stop_day=stop_day, 
    drop_columns=drop_columns,
    test_size=0.1,
    is_dummies=True,
    is_target_scale=False)


params_last = {
    'base_score': 0.5, 
    'booster': 'gbtree', 
    'colsample_bylevel': 1, 
    'colsample_bynode': 1, 
    'colsample_bytree': 0.4, 
    'gamma': 0,
    'max_depth': 2, 
    'min_child_weight': 5, 
    'reg_alpha': 0, 
    'reg_lambda': 1, 
    'seed': 38,
    'subsample': 0.7, 
    'verbosity': 1,
    'learning_rate':0.01
}


import xgboost as xgb

reg = xgb.XGBRegressor(**params_last, n_estimators=2000)
print(X_train.columns)
# note: in recent xgboost releases early_stopping_rounds is passed to the constructor, not to fit()
reg.fit(X_train, y_train,
        eval_set=[(X_val, y_val)],
        early_stopping_rounds=20,
        verbose=False)


y_pred = reg.predict(X_test)


LightGBM


LightGBM is a framework from Microsoft whose main advantage is training speed on really large data sets. Unlike XGBoost, it can also work with categorical features directly and uses less memory. More details are available here.

import lightgbm as lgb
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
from datetime import datetime


stop_day = '2018-06-01 23:00'
start_day_for_iterate = datetime.strptime(stop_day, '%Y-%m-%d %H:%M')
test_size = 0.2

X, y, X_train, X_val, y_train, y_val, X_test, y_test, _ = gen_data(
    stop_day=stop_day, 
    drop_columns=drop_columns,
    test_size=test_size,
    is_dummies=True,
    # drop_weekend=False,  # parameter of a fuller gen_data variant not shown in the article
    is_target_scale=False)


model = LGBMRegressor(boosting_type='gbdt',
                num_leaves=83,
                max_depth=-1,
                learning_rate=0.008,
                n_estimators=15000,
                max_bin=255,
                subsample_for_bin=50000,
                min_split_gain=0,
                min_child_weight=3,
                min_child_samples=10,
                subsample=0.3,
                subsample_freq=1,
                colsample_bytree=0.5,
                reg_alpha=0.1,
                reg_lambda=0,
                seed=38,
                silent=False,
                nthread=-1)


# note: in recent lightgbm releases early stopping and verbosity are configured via callbacks
history = model.fit(X_train, y_train, 
            eval_metric='rmse',
            eval_set=[(X_val, y_val)],
            early_stopping_rounds=40,
            verbose=0)


y_pred = model.predict(X_test, num_iteration=model.best_iteration_)


CatBoost


CatBoost is an open-source gradient boosting library on decision trees from Yandex. Its advantage is convenient handling of datasets with categorical features: you can pass data containing categories to the model without converting them to numbers, it finds the patterns on its own, nothing needs to be tuned by hand, and the prediction quality remains high.

import numpy as np
from catboost import CatBoostRegressor, Pool

cat_features = np.where(X.dtypes == 'category')[0]
eval_dataset = Pool(X_test, y_test)

model = CatBoostRegressor(learning_rate=0.5,
                          eval_metric='RMSE',
                          leaf_estimation_iterations=3,
                          depth=3)

model.fit(X_train, y_train,
          eval_set=(X_val, y_val),
          cat_features=cat_features,
          plot=True,
          verbose=True)

y_pred = model.predict(X_test)


XGBoost vs. LightGBM vs. CatBoost


In order not to repeat numerous articles, I will give a comparative table from here.


To calculate the MAPE error, I took the last known month (28 days) and, sliding the window with a one-hour step, made a forecast for the next 72 hours at each position.
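A minimal sketch of such a sliding evaluation, assuming the model exposes a scikit-learn-style predict() and that X and y are a feature frame and target series aligned by hour (all names are placeholders; in this simplification the features for the forecast window are assumed to be precomputed and the model is not retrained at each step):

import numpy as np

def rolling_mape(model, X, y, horizon=72, window_days=28):
    """Average MAPE of 72-hour-ahead forecasts made hour by hour over the last window_days days."""
    errors = []
    n_starts = window_days * 24
    for shift in range(n_starts):
        start = len(X) - n_starts - horizon + shift
        X_window = X.iloc[start:start + horizon]
        y_true = y.iloc[start:start + horizon].values
        y_pred = model.predict(X_window)
        errors.append(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)
    return float(np.mean(errors))

# usage: rolling_mape(model, X, y)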

And in my impromptu competition, the prize places were distributed as follows:

3rd place: my favorite - XGBoost - with the lowest forecast quality;
2nd place: CatBoost as the most convenient and “all out of the box”;
1st place: LightGBM as the fastest and most accurate.

For a more precise comparison of the models, I used R2 (R-squared, the coefficient of determination, which shows what share of the variance of the real values the model explains) and RMSLE (Root Mean Squared Logarithmic Error, the root of the mean squared difference between the logarithms of the predicted and real values, which penalizes relative rather than absolute deviations).

Metric      LightGBM    CatBoost    XGBoost     Prophet     SARIMAX
R2          0.94137     0.93984     0.92909     0.81435     0.73778
MSLE        0.02468     0.02477     0.01219     0.00829     0.00658
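Both metrics are available in scikit-learn; a minimal sketch of how such a table row could be computed (y_test and the predictions dict are placeholders):

from sklearn.metrics import r2_score, mean_squared_log_error

# predictions is a hypothetical dict: {'LightGBM': y_pred_lgb, 'CatBoost': y_pred_cb, ...}
for name, y_pred in predictions.items():
    r2 = r2_score(y_test, y_pred)
    msle = mean_squared_log_error(y_test, y_pred)  # both arrays must be non-negative
    print(f'{name}: R2 = {r2:.5f}, MSLE = {msle:.5f}, RMSLE = {msle ** 0.5:.5f}')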

But all of this, of course, applies only to my task; on other data and in other hands everything may turn out completely different.

Next time, I will share another of our own stories: forecasting employee attrition for the recruitment planning task in our company.
