LSTM Example

Last updated: May 8th, 2020
In [1]:
# ignore warnings for better clarity (may not be the best thing to do)...
import warnings
warnings.filterwarnings('ignore')
In [2]:
import tensorflow as tf
import keras
from keras.preprocessing import sequence 
from keras.models import Sequential 
from keras.layers import Dense, Embedding 
from keras.layers import LSTM 
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
Using TensorFlow backend.
In [3]:
print("tensorflow version %s (should be at least 0.12.1)" % tf.__version__)
print("keras version %s (should be at least 2.0.7)" % keras.__version__)
tensorflow version 1.14.0 (should be at least 0.12.1)
keras version 2.2.4 (should be at least 2.0.7)

Data exploration

In [4]:
# load dataset
raw_data = pd.read_csv('./pollution.csv', header=0, index_col=0)
values = raw_data.values

This dataset provides hourly weather conditions and pollution levels for five years at the US embassy in Beijing, China.

The complete feature list in the raw data is as follows:

No: row number

year: year of data in this row

month: month of data in this row

day: day of data in this row

hour: hour of data in this row

pm2.5: PM2.5 concentration i.e. pollution level

DEWP: Dew Point

TEMP: Temperature

PRES: Pressure

cbwd: Combined wind direction

Iws: Cumulated wind speed

Is: Cumulated hours of snow

Ir: Cumulated hours of rain

In [5]:
raw_data.head()
Out[5]:
pollution dew temp press wnd_dir wnd_spd snow rain
date
2010-01-02 00:00:00 129.0 -16 -4.0 1020.0 SE 1.79 0 0
2010-01-02 01:00:00 148.0 -15 -4.0 1020.0 SE 2.68 0 0
2010-01-02 02:00:00 159.0 -11 -5.0 1021.0 SE 3.57 0 0
2010-01-02 03:00:00 181.0 -7 -5.0 1022.0 SE 5.36 1 0
2010-01-02 04:00:00 138.0 -7 -5.0 1022.0 SE 6.25 2 0
In [6]:
plt.figure(figsize=(10, 8))
plt.plot(values[:, 1], lw=1, label=raw_data.columns[1])
plt.legend(fontsize=14)
plt.xlabel('time', fontsize=14)
plt.ylabel('Measurements', fontsize=14)
plt.tight_layout()
In [7]:
plt.figure(figsize=(10, 8))
plt.plot(values[:, 2], lw=1, label=raw_data.columns[2])
plt.legend(fontsize=14)
plt.xlabel('time', fontsize=14)
plt.ylabel('Measurements', fontsize=14)
plt.tight_layout()

Questions

Display the pollution level as a function of time.

Use the heatmap function (from seaborn) to display the correlation matrix of the data.

Display boxplots of the pollution values as a function of the other variables' values (this may help to detect features that are useless for prediction). A possible starting point is sketched below.
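
A possible starting point for these questions, assuming seaborn is available and using the column names shown in the head of raw_data above:

import seaborn as sns

# pollution level as a function of time
plt.figure(figsize=(10, 5))
plt.plot(raw_data['pollution'].values, lw=1)
plt.xlabel('time (hours)', fontsize=14)
plt.ylabel('pollution (PM2.5)', fontsize=14)
plt.tight_layout()

# correlation heatmap (non-numeric columns such as wnd_dir are ignored by corr())
plt.figure(figsize=(8, 6))
sns.heatmap(raw_data.corr(), annot=True, cmap='coolwarm')

# boxplot of pollution grouped by cumulated hours of snow
raw_data.boxplot('pollution', by=['snow'], figsize=(15, 7))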

In [8]:
##
# TO DO: plot the pollution level
##
In [9]:
import seaborn as sns
corr = raw_data.corr()
##
# TO DO: display the heatmap here and the head of the variable corr.
##
In [10]:
# some boxplots displaying the distribution of a feature within each category.
# This may help to detect features that are useless for prediction... or a feature strongly impacted by the grouping variable.
raw_data.boxplot('pollution', by=['rain'],figsize=(15,7))
##
# TO DO: display other boxplots as a function of other variables and comment the results
##
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f5dfc005898>

Pre-processing

Pre-processing from https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/

The build_dataset function below decides which data to use as inputs and which as outputs, and it gets rid of rows containing NaN values.

Its details may be skipped on a first read. The only crucial parameter is n_in, which states the number of previous time steps to consider in the input data.

The objective of this project is to predict the pollution level $y_t$ at time $t$ from previous observations. Simple recurrent neural networks and Long Short-Term Memory (LSTM) networks will be considered. At each time step, the estimate $\hat{y}_t$ is obtained using a hidden state $h_{t-1}$ computed on the fly and the input data $x_t$ at time $t$.

The function build_dataset provides the values of $y_t$ in the last column and the values of the input $x_t$ in all the other columns.

If $n_{in} = 1$, $x_t$ is made of all measurements at time $t-1$.

In [11]:
def build_dataset(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = pd.DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
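
As a quick illustration of the column layout produced by this function (a minimal sketch on a toy two-feature series, not part of the original pipeline):

toy = np.arange(10).reshape(5, 2)        # 5 time steps, 2 features
print(build_dataset(toy, n_in=1, n_out=1))
# columns: var1(t-1), var2(t-1), var1(t), var2(t); the first row is dropped (NaN)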
In [12]:
# integer encode direction
encoder = preprocessing.LabelEncoder()
values[:,4] = encoder.fit_transform(values[:,4])
# ensure all data is float
values = values.astype('float32')
# normalize features
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

# parameter controlling how many previous time steps to consider in the input data
time_lag = 1
processed_data = build_dataset(scaled, time_lag, 1)
processed_data.head()
# drop useless columns
# columns indices to drop depend on time_lag: this works for time_lag = 1
processed_data.drop(processed_data.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)
processed_data.head()
Out[12]:
var1(t-1) var2(t-1) var3(t-1) var4(t-1) var5(t-1) var6(t-1) var7(t-1) var8(t-1) var1(t)
1 0.129779 0.352941 0.245902 0.527273 0.666667 0.002290 0.000000 0.0 0.148893
2 0.148893 0.367647 0.245902 0.527273 0.666667 0.003811 0.000000 0.0 0.159960
3 0.159960 0.426471 0.229508 0.545454 0.666667 0.005332 0.000000 0.0 0.182093
4 0.182093 0.485294 0.229508 0.563637 0.666667 0.008391 0.037037 0.0 0.138833
5 0.138833 0.485294 0.229508 0.563637 0.666667 0.009912 0.074074 0.0 0.109658
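
The hard-coded indices [9, ..., 15] only hold for time_lag = 1 (and n_out = 1). A sketch of how the indices to drop could be computed for an arbitrary lag, keeping only var1(t) as the target:

n_features = scaled.shape[1]   # 8 features per time step here
drop_idx = [n_features * time_lag + j for j in range(1, n_features)]
# for time_lag = 1 this yields [9, 10, ..., 15], matching the list used above
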
In [13]:
# split into train and test sets
values = processed_data.values

# number of years to use for training
nb_years = 3
n_train_hours = nb_years*365 * 24

train = values[:n_train_hours, :]
test = values[n_train_hours:, :]

# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
In [14]:
print('x_train shape:', train_X.shape)
print('x_test shape:', test_X.shape)
print('y_train shape:', train_y.shape)
print('y_test shape:', test_y.shape)
x_train shape: (26280, 1, 8)
x_test shape: (17519, 1, 8)
y_train shape: (26280,)
y_test shape: (17519,)
In [15]:
print(train_X.shape[0], 'train samples')
print(test_X.shape[0], 'test samples')
26280 train samples
17519 test samples
In [16]:
input_shape = (train_X.shape[1], train_X.shape[2])
input_shape
Out[16]:
(1, 8)

Feed Forward Neural Network

In [17]:
from random import randint
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import activations
import numpy as np
import matplotlib.pyplot as plt

Question

Define an FFNN model with one hidden layer

In [18]:
model_ffnn = Sequential()

model_ffnn.add(Flatten(input_shape = input_shape))
model_ffnn.add(Dense(32, activation='relu'))
model_ffnn.add(Dense(1, activation='linear'))

model_ffnn.compile(
    loss='mae',
    optimizer=keras.optimizers.Adagrad(),
    metrics=['mean_squared_error']
)

model_ffnn.summary()
WARNING: Logging before flag parsing goes to stderr.
W0508 09:14:34.446182 140043009967104 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0508 09:14:34.467402 140043009967104 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0508 09:14:34.484238 140043009967104 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W0508 09:14:34.566125 140043009967104 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/site-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 8)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                288       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 33        
=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
In [19]:
batch_size = 64
epochs = 50
history = model_ffnn.fit(train_X, train_y,
                         batch_size=batch_size,
                         epochs=epochs,
                         verbose=1,
                         validation_data=(test_X, test_y))
W0508 09:14:34.780587 140043009967104 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

W0508 09:14:34.787564 140043009967104 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

Train on 26280 samples, validate on 17519 samples
Epoch 1/50
26280/26280 [==============================] - 1s 38us/step - loss: 0.0246 - mean_squared_error: 0.0019 - val_loss: 0.0156 - val_mean_squared_error: 7.9951e-04
Epoch 2/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0145 - mean_squared_error: 7.8064e-04 - val_loss: 0.0140 - val_mean_squared_error: 7.3704e-04
Epoch 3/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0140 - mean_squared_error: 7.6217e-04 - val_loss: 0.0146 - val_mean_squared_error: 7.4917e-04
Epoch 4/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0138 - mean_squared_error: 7.5577e-04 - val_loss: 0.0135 - val_mean_squared_error: 7.2091e-04
Epoch 5/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0137 - mean_squared_error: 7.5483e-04 - val_loss: 0.0135 - val_mean_squared_error: 7.1908e-04
Epoch 6/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0136 - mean_squared_error: 7.5219e-04 - val_loss: 0.0137 - val_mean_squared_error: 7.2079e-04
Epoch 7/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0135 - mean_squared_error: 7.5138e-04 - val_loss: 0.0134 - val_mean_squared_error: 7.1635e-04
Epoch 8/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0135 - mean_squared_error: 7.5090e-04 - val_loss: 0.0136 - val_mean_squared_error: 7.1912e-04
Epoch 9/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0134 - mean_squared_error: 7.5034e-04 - val_loss: 0.0134 - val_mean_squared_error: 7.1492e-04
Epoch 10/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0134 - mean_squared_error: 7.4959e-04 - val_loss: 0.0134 - val_mean_squared_error: 7.1612e-04
Epoch 11/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0134 - mean_squared_error: 7.4934e-04 - val_loss: 0.0134 - val_mean_squared_error: 7.2074e-04
Epoch 12/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0134 - mean_squared_error: 7.4841e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1455e-04
Epoch 13/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0134 - mean_squared_error: 7.4869e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1374e-04
Epoch 14/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0133 - mean_squared_error: 7.4817e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1383e-04
Epoch 15/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0133 - mean_squared_error: 7.4738e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1309e-04
Epoch 16/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0133 - mean_squared_error: 7.4781e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1329e-04
Epoch 17/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0133 - mean_squared_error: 7.4792e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1290e-04
Epoch 18/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0133 - mean_squared_error: 7.4751e-04 - val_loss: 0.0134 - val_mean_squared_error: 7.1540e-04
Epoch 19/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0133 - mean_squared_error: 7.4677e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1226e-04
Epoch 20/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0133 - mean_squared_error: 7.4724e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1213e-04
Epoch 21/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0133 - mean_squared_error: 7.4732e-04 - val_loss: 0.0134 - val_mean_squared_error: 7.2124e-04
Epoch 22/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0133 - mean_squared_error: 7.4734e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1372e-04
Epoch 23/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0133 - mean_squared_error: 7.4738e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1231e-04
Epoch 24/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0132 - mean_squared_error: 7.4567e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1247e-04
Epoch 25/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0133 - mean_squared_error: 7.4709e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1287e-04
Epoch 26/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0132 - mean_squared_error: 7.4743e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1320e-04
Epoch 27/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0133 - mean_squared_error: 7.4707e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1186e-04
Epoch 28/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0132 - mean_squared_error: 7.4637e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1124e-04
Epoch 29/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4623e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1257e-04
Epoch 30/50
26280/26280 [==============================] - 1s 29us/step - loss: 0.0132 - mean_squared_error: 7.4612e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1172e-04
Epoch 31/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4598e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1552e-04
Epoch 32/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4709e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1098e-04
Epoch 33/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0132 - mean_squared_error: 7.4630e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1147e-04
Epoch 34/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4561e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1433e-04
Epoch 35/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4623e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1183e-04
Epoch 36/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4578e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1527e-04
Epoch 37/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4571e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1317e-04
Epoch 38/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4575e-04 - val_loss: 0.0134 - val_mean_squared_error: 7.1418e-04
Epoch 39/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4642e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1078e-04
Epoch 40/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4568e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1195e-04
Epoch 41/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4600e-04 - val_loss: 0.0134 - val_mean_squared_error: 7.1471e-04
Epoch 42/50
26280/26280 [==============================] - 1s 30us/step - loss: 0.0132 - mean_squared_error: 7.4533e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1205e-04
Epoch 43/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0132 - mean_squared_error: 7.4534e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1189e-04
Epoch 44/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0132 - mean_squared_error: 7.4541e-04 - val_loss: 0.0133 - val_mean_squared_error: 7.1248e-04
Epoch 45/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0132 - mean_squared_error: 7.4469e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1227e-04
Epoch 46/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0132 - mean_squared_error: 7.4596e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1111e-04
Epoch 47/50
26280/26280 [==============================] - 1s 33us/step - loss: 0.0132 - mean_squared_error: 7.4372e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1156e-04
Epoch 48/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0132 - mean_squared_error: 7.4449e-04 - val_loss: 0.0132 - val_mean_squared_error: 7.1205e-04
Epoch 49/50
26280/26280 [==============================] - 1s 31us/step - loss: 0.0132 - mean_squared_error: 7.4473e-04 - val_loss: 0.0131 - val_mean_squared_error: 7.1303e-04
Epoch 50/50
26280/26280 [==============================] - 1s 32us/step - loss: 0.0132 - mean_squared_error: 7.4557e-04 - val_loss: 0.0131 - val_mean_squared_error: 7.1125e-04
In [20]:
plt.figure(figsize=(7, 5))
plt.plot(history.epoch, history.history['loss'], lw=3, label='Training set')
plt.plot(history.epoch, history.history['val_loss'], lw=3, label='Test set')
plt.legend(fontsize=14)
plt.title('Loss of the FFNN', fontsize=16)
plt.xlabel('Epoch', fontsize=14)
plt.ylabel('MAE', fontsize=14)
plt.tight_layout()

Recurrent Neural Networks

At time $t$ the hidden state of the network is computed as follows:

$h_t = \sigma_h(W_x\,{x_t} + W_h\,{h_{t-1}} + b_h)$, where $\sigma_h$ is a nonlinear activation function, e.g. $\mathrm{tanh}$, $x_t$ is the input at time $t$ and $h_{t-1}$ is the hidden state at the previous time step.

$W_x$, $b_h$ and $W_h$ are the unknown parameters of the state update.

The predicted output is:

$\widehat{y}_t = \sigma_y(W_y\,{h_t} + b_y)\,,$

where $\sigma_y$ is the output activation function and $W_y$ and $b_y$ are the unknown parameters of the prediction step.
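
A minimal NumPy sketch of a single recurrent update, purely to illustrate the equations above (dimensions are arbitrary; Keras' SimpleRNN layer implements this internally):

d_in, d_h = 8, 32
rng = np.random.RandomState(0)
W_x, W_h, b_h = rng.randn(d_h, d_in), rng.randn(d_h, d_h), np.zeros(d_h)
W_y, b_y = rng.randn(1, d_h), np.zeros(1)

x_t = rng.randn(d_in)                            # input at time t
h_prev = np.zeros(d_h)                           # hidden state h_{t-1}
h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b_h)    # state update
y_hat = W_y @ h_t + b_y                          # linear output (sigma_y = identity)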

Question

Define an RNN model with one hidden SimpleRNN layer

In [21]:
from keras.layers.recurrent import SimpleRNN
model_rnn = Sequential()
model_rnn.add(SimpleRNN(32,input_shape=(train_X.shape[1], train_X.shape[2])))  
model_rnn.add(Dense(1, activation='linear'))

model_rnn.compile(loss='mae', optimizer='adam')

model_rnn.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_1 (SimpleRNN)     (None, 32)                1312      
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 33        
=================================================================
Total params: 1,345
Trainable params: 1,345
Non-trainable params: 0
_________________________________________________________________
In [22]:
batch_size = 64
epochs = 50
history = model_rnn.fit(train_X, train_y, epochs = epochs, batch_size = batch_size, validation_data=(test_X, test_y), verbose=1)
Train on 26280 samples, validate on 17519 samples
Epoch 1/50
26280/26280 [==============================] - 2s 59us/step - loss: 0.0216 - val_loss: 0.0136
Epoch 2/50
26280/26280 [==============================] - 1s 46us/step - loss: 0.0142 - val_loss: 0.0138
Epoch 3/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0141 - val_loss: 0.0152
Epoch 4/50
26280/26280 [==============================] - 1s 43us/step - loss: 0.0140 - val_loss: 0.0136
Epoch 5/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0139 - val_loss: 0.0143
Epoch 6/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0141 - val_loss: 0.0135
Epoch 7/50
26280/26280 [==============================] - 1s 43us/step - loss: 0.0140 - val_loss: 0.0133
Epoch 8/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0138 - val_loss: 0.0138
Epoch 9/50
26280/26280 [==============================] - 1s 43us/step - loss: 0.0139 - val_loss: 0.0136
Epoch 10/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0141 - val_loss: 0.0135
Epoch 11/50
26280/26280 [==============================] - 1s 41us/step - loss: 0.0139 - val_loss: 0.0134
Epoch 12/50
26280/26280 [==============================] - 1s 41us/step - loss: 0.0139 - val_loss: 0.0151
Epoch 13/50
26280/26280 [==============================] - 1s 41us/step - loss: 0.0139 - val_loss: 0.0135
Epoch 14/50
26280/26280 [==============================] - 1s 41us/step - loss: 0.0138 - val_loss: 0.0132
Epoch 15/50
26280/26280 [==============================] - 1s 41us/step - loss: 0.0137 - val_loss: 0.0133
Epoch 16/50
26280/26280 [==============================] - 1s 41us/step - loss: 0.0139 - val_loss: 0.0140
Epoch 17/50
26280/26280 [==============================] - 1s 41us/step - loss: 0.0139 - val_loss: 0.0135
Epoch 18/50
26280/26280 [==============================] - 1s 41us/step - loss: 0.0138 - val_loss: 0.0145
Epoch 19/50
26280/26280 [==============================] - 1s 42us/step - loss: 0.0139 - val_loss: 0.0134
Epoch 20/50
26280/26280 [==============================] - 1s 40us/step - loss: 0.0138 - val_loss: 0.0137
Epoch 21/50
26280/26280 [==============================] - 1s 42us/step - loss: 0.0137 - val_loss: 0.0146
Epoch 22/50
26280/26280 [==============================] - 1s 41us/step - loss: 0.0140 - val_loss: 0.0134
Epoch 23/50
26280/26280 [==============================] - 1s 42us/step - loss: 0.0138 - val_loss: 0.0134
Epoch 24/50
26280/26280 [==============================] - 1s 48us/step - loss: 0.0138 - val_loss: 0.0136
Epoch 25/50
26280/26280 [==============================] - 1s 40us/step - loss: 0.0138 - val_loss: 0.0133
Epoch 26/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0138 - val_loss: 0.0145
Epoch 27/50
26280/26280 [==============================] - 1s 43us/step - loss: 0.0138 - val_loss: 0.0134
Epoch 28/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0138 - val_loss: 0.0138
Epoch 29/50
26280/26280 [==============================] - 1s 43us/step - loss: 0.0136 - val_loss: 0.0134
Epoch 30/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0138 - val_loss: 0.0136
Epoch 31/50
26280/26280 [==============================] - 1s 43us/step - loss: 0.0139 - val_loss: 0.0134
Epoch 32/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0137 - val_loss: 0.0139
Epoch 33/50
26280/26280 [==============================] - 1s 43us/step - loss: 0.0137 - val_loss: 0.0132
Epoch 34/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0138 - val_loss: 0.0134
Epoch 35/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0137 - val_loss: 0.0133
Epoch 36/50
26280/26280 [==============================] - 1s 43us/step - loss: 0.0139 - val_loss: 0.0151
Epoch 37/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0139 - val_loss: 0.0136
Epoch 38/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0137 - val_loss: 0.0135
Epoch 39/50
26280/26280 [==============================] - 1s 47us/step - loss: 0.0137 - val_loss: 0.0134
Epoch 40/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0136 - val_loss: 0.0150
Epoch 41/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0138 - val_loss: 0.0138
Epoch 42/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0137 - val_loss: 0.0138
Epoch 43/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0137 - val_loss: 0.0135
Epoch 44/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0137 - val_loss: 0.0138
Epoch 45/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0138 - val_loss: 0.0133
Epoch 46/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0138 - val_loss: 0.0144
Epoch 47/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0137 - val_loss: 0.0136
Epoch 48/50
26280/26280 [==============================] - 1s 44us/step - loss: 0.0137 - val_loss: 0.0138
Epoch 49/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0136 - val_loss: 0.0134
Epoch 50/50
26280/26280 [==============================] - 1s 45us/step - loss: 0.0135 - val_loss: 0.0140
In [23]:
plt.figure(figsize=(7, 5))
plt.plot(history.epoch, history.history['loss'], lw=3, label='Training')
plt.plot(history.epoch, history.history['val_loss'], lw=3, label='Testing')
plt.legend(fontsize=14)
plt.title('Loss of the simple RNN', fontsize=16)
plt.xlabel('Epoch', fontsize=14)
plt.ylabel('MAE', fontsize=14)
plt.tight_layout()

Long Short-Term Memory

The LSTM cell is a more complex recurrent unit. It contains three gates (input, forget and output gates) and a memory cell, i.e. several hidden transformations that process the hidden state and the input.

The first transformations at time $t$ are the gate activations:

$i_t = \sigma ( W_i [h_{t-1}, x_t] + b_i)$, $f_t = \sigma ( W_f [h_{t-1},x_t] + b_f)$, $o_t = \sigma ( W_o [h_{t-1},x_t] + b_o)$,

where $W_i, W_f, W_o$, $b_i$, $b_f$ and $b_o$ are the unknown parameters (applied to the concatenation of $h_{t-1}$ (hidden state vector) and $x_t$ (input vector)).

The previous hidden state $h_{t-1}$ and the current input $x_t$ are used to compute a candidate $g_t$:

$g_t = \mathrm{tanh}( W_g [h_{t-1}, x_t] + b_g)\,.$

The cell memory $c_t$ is updated as:

$c_t = c_{t-1} \circ f_t + g_t \circ i_t\,,$

where $c_{t-1}$ is the previous memory, and $\circ$ refers to element-wise multiplication.

The new hidden state $h_t$ (the output of the cell) is computed as

$h_t = \mathrm{tanh}(c_t) \circ o_t\,.$

The predicted output is:

$\widehat{y}_t = \sigma_y(W_y\,{h_t} + b_y)\,,$

where $\sigma_y$ is the output activation function, and $W_y$ and $b_y$ are the unknown parameters of the prediction step.
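
To make these equations concrete, here is a minimal NumPy sketch of one LSTM step (illustrative only; Keras' LSTM layer implements this, with implementation-specific variations, internally):

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_h = 8, 10
rng = np.random.RandomState(0)
W_i, W_f, W_o, W_g = (rng.randn(d_h, d_h + d_in) for _ in range(4))
b_i = b_f = b_o = b_g = np.zeros(d_h)

x_t = rng.randn(d_in)
h_prev, c_prev = np.zeros(d_h), np.zeros(d_h)
hx = np.concatenate([h_prev, x_t])     # concatenation [h_{t-1}, x_t]

i_t = sigmoid(W_i @ hx + b_i)          # input gate
f_t = sigmoid(W_f @ hx + b_f)          # forget gate
o_t = sigmoid(W_o @ hx + b_o)          # output gate
g_t = np.tanh(W_g @ hx + b_g)          # candidate
c_t = c_prev * f_t + g_t * i_t         # memory update
h_t = np.tanh(c_t) * o_t               # new hidden state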

Question

Define an LSTM model with one hidden layer

In [24]:
model = Sequential()
model.add(LSTM(10, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
In [25]:
batch_size = 64
epochs = 50
history = model.fit(train_X, train_y, epochs = epochs, batch_size = batch_size, validation_data=(test_X, test_y), verbose=1)
W0508 09:16:16.541272 140043009967104 deprecation.py:323] From /usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Train on 26280 samples, validate on 17519 samples
Epoch 1/50
26280/26280 [==============================] - 3s 103us/step - loss: 0.0529 - val_loss: 0.0284
Epoch 2/50
26280/26280 [==============================] - 2s 73us/step - loss: 0.0162 - val_loss: 0.0138
Epoch 3/50
26280/26280 [==============================] - 2s 73us/step - loss: 0.0138 - val_loss: 0.0134
Epoch 4/50
26280/26280 [==============================] - 2s 73us/step - loss: 0.0137 - val_loss: 0.0134
Epoch 5/50
26280/26280 [==============================] - 2s 72us/step - loss: 0.0136 - val_loss: 0.0134
Epoch 6/50
26280/26280 [==============================] - 2s 75us/step - loss: 0.0135 - val_loss: 0.0133
Epoch 7/50
26280/26280 [==============================] - 2s 79us/step - loss: 0.0135 - val_loss: 0.0133
Epoch 8/50
26280/26280 [==============================] - 2s 78us/step - loss: 0.0135 - val_loss: 0.0134
Epoch 9/50
26280/26280 [==============================] - 2s 79us/step - loss: 0.0135 - val_loss: 0.0133
Epoch 10/50
26280/26280 [==============================] - 2s 79us/step - loss: 0.0134 - val_loss: 0.0132
Epoch 11/50
26280/26280 [==============================] - 2s 80us/step - loss: 0.0134 - val_loss: 0.0132
Epoch 12/50
26280/26280 [==============================] - 2s 78us/step - loss: 0.0134 - val_loss: 0.0137
Epoch 13/50
26280/26280 [==============================] - 2s 79us/step - loss: 0.0134 - val_loss: 0.0135
Epoch 14/50
26280/26280 [==============================] - 2s 79us/step - loss: 0.0134 - val_loss: 0.0135
Epoch 15/50
26280/26280 [==============================] - 2s 76us/step - loss: 0.0134 - val_loss: 0.0133
Epoch 16/50
26280/26280 [==============================] - 2s 72us/step - loss: 0.0134 - val_loss: 0.0133
Epoch 17/50
26280/26280 [==============================] - 2s 73us/step - loss: 0.0134 - val_loss: 0.0132
Epoch 18/50
26280/26280 [==============================] - 2s 73us/step - loss: 0.0134 - val_loss: 0.0132
Epoch 19/50
26280/26280 [==============================] - 2s 73us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 20/50
26280/26280 [==============================] - 2s 74us/step - loss: 0.0134 - val_loss: 0.0133
Epoch 21/50
26280/26280 [==============================] - 2s 73us/step - loss: 0.0134 - val_loss: 0.0132
Epoch 22/50
26280/26280 [==============================] - 2s 74us/step - loss: 0.0134 - val_loss: 0.0132
Epoch 23/50
26280/26280 [==============================] - 2s 75us/step - loss: 0.0134 - val_loss: 0.0132
Epoch 24/50
26280/26280 [==============================] - 2s 76us/step - loss: 0.0133 - val_loss: 0.0133
Epoch 25/50
26280/26280 [==============================] - 2s 78us/step - loss: 0.0134 - val_loss: 0.0132
Epoch 26/50
26280/26280 [==============================] - 2s 79us/step - loss: 0.0133 - val_loss: 0.0137
Epoch 27/50
26280/26280 [==============================] - 2s 80us/step - loss: 0.0133 - val_loss: 0.0133
Epoch 28/50
26280/26280 [==============================] - 2s 79us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 29/50
26280/26280 [==============================] - 2s 78us/step - loss: 0.0133 - val_loss: 0.0137
Epoch 30/50
26280/26280 [==============================] - 2s 82us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 31/50
26280/26280 [==============================] - 2s 81us/step - loss: 0.0133 - val_loss: 0.0133
Epoch 32/50
26280/26280 [==============================] - 2s 81us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 33/50
26280/26280 [==============================] - 2s 82us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 34/50
26280/26280 [==============================] - 2s 80us/step - loss: 0.0133 - val_loss: 0.0133
Epoch 35/50
26280/26280 [==============================] - 2s 80us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 36/50
26280/26280 [==============================] - 2s 81us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 37/50
26280/26280 [==============================] - 2s 80us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 38/50
26280/26280 [==============================] - 2s 81us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 39/50
26280/26280 [==============================] - 2s 73us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 40/50
26280/26280 [==============================] - 2s 73us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 41/50
26280/26280 [==============================] - 2s 75us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 42/50
26280/26280 [==============================] - 2s 76us/step - loss: 0.0134 - val_loss: 0.0135
Epoch 43/50
26280/26280 [==============================] - 2s 86us/step - loss: 0.0133 - val_loss: 0.0142
Epoch 44/50
26280/26280 [==============================] - 2s 82us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 45/50
26280/26280 [==============================] - 2s 84us/step - loss: 0.0133 - val_loss: 0.0133
Epoch 46/50
26280/26280 [==============================] - 2s 80us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 47/50
26280/26280 [==============================] - 2s 83us/step - loss: 0.0133 - val_loss: 0.0133
Epoch 48/50
26280/26280 [==============================] - 2s 84us/step - loss: 0.0133 - val_loss: 0.0132
Epoch 49/50
26280/26280 [==============================] - 2s 80us/step - loss: 0.0133 - val_loss: 0.0139
Epoch 50/50
26280/26280 [==============================] - 2s 80us/step - loss: 0.0133 - val_loss: 0.0134
In [26]:
plt.figure(figsize=(7, 5))
plt.plot(history.epoch, history.history['loss'], lw=3, label='Training')
plt.plot(history.epoch, history.history['val_loss'], lw=3, label='Testing')
plt.legend(fontsize=14)
plt.title('Loss of the LSTM', fontsize=16)
plt.xlabel('Epoch', fontsize=14)
plt.ylabel('MAE', fontsize=14)
plt.tight_layout()
In [27]:
# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast
inv_yhat = np.concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = np.concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = np.sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)
Test RMSE: 26.594

Questions

Use cross-validation to select the best LSTM and the best RNN network with respect to the dimension of the hidden state (a sketch for the LSTM case is given after this list of questions).

Analyze the impact of the lag in the input sequence (i.e. when $x_t$ contains the features at times $t-1$, $t-2$, $t-3$) by calling the function build_dataset with a larger n_in parameter.

Analyze the sensitivity of the LSTM with respect to initialization (by training several independent LSTM models).

Use the predict function to predict the observations $y$ associated with the input data in test_X. Display these predictions together with the true pollution levels in test_y. Compute the associated mean squared error.
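
A minimal sketch for the first question, comparing LSTM hidden-state dimensions with a simple held-out validation split (the candidate sizes and number of epochs are arbitrary choices; a proper time-series cross-validation would use expanding windows):

candidate_dims = [5, 10, 20, 50]
val_scores = {}
for dim in candidate_dims:
    m = Sequential()
    m.add(LSTM(dim, input_shape=(train_X.shape[1], train_X.shape[2])))
    m.add(Dense(1))
    m.compile(loss='mae', optimizer='adam')
    # validation_split takes the last 20% of train_X without shuffling
    h = m.fit(train_X, train_y, epochs=10, batch_size=64,
              validation_split=0.2, verbose=0)
    val_scores[dim] = h.history['val_loss'][-1]
print(val_scores)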

Question (bonus)

How would you capture the randomness of the prediction instead of producing a unique prediction at each time step?
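
One possible direction (an assumption on our part, not the notebook's prescribed answer) is Monte Carlo dropout: train an LSTM with dropout and keep it active at prediction time, so that repeated forward passes yield a distribution of predictions rather than a single value:

from keras import backend as K

mc_model = Sequential()
mc_model.add(LSTM(10, dropout=0.2, recurrent_dropout=0.2,
                  input_shape=(train_X.shape[1], train_X.shape[2])))
mc_model.add(Dense(1))
mc_model.compile(loss='mae', optimizer='adam')
mc_model.fit(train_X, train_y, epochs=10, batch_size=64, verbose=0)

# test_X was flattened to 2D in the inverse-scaling cell above; reshape back to 3D
test_X_3d = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

# learning_phase = 1 keeps dropout enabled during prediction
predict_stochastic = K.function([mc_model.input, K.learning_phase()],
                                [mc_model.output])
samples = np.stack([predict_stochastic([test_X_3d, 1])[0] for _ in range(20)])
pred_mean, pred_std = samples.mean(axis=0), samples.std(axis=0)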
