Training loss not decreasing even after increasing the model size

I am not able to get my training loss to decrease on time-series data, even after increasing the size of my TensorFlow model. Can someone please suggest techniques I can apply to reduce the training loss?

I have done the following things in preprocessing:

  1. Resampling
  2. Scaling
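Roughly, those two steps look like the sketch below (simplified; it uses RandomUnderSampler and StandardScaler, which are imported in the model code further down, and it omits the windowing of the data into sequences for the model):

from imblearn.under_sampling import RandomUnderSampler
from sklearn.preprocessing import StandardScaler

def preprocess(X_raw, y_raw):
    # 1. Resampling: undersample the majority class to balance the labels
    X_res, y_res = RandomUnderSampler(random_state=42).fit_resample(X_raw, y_raw)
    # 2. Scaling: zero mean / unit variance per feature
    X_scaled = StandardScaler().fit_transform(X_res)
    return X_scaled, y_res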

I have tried the values [8, 16, 32, 64] for the size variable. Below is my code:

import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))
import pandas as pd
import traceback
import numpy as np
from sklearn.preprocessing import StandardScaler
from pickle import load, dump
import tensorflow as tf
from imblearn.under_sampling import RandomUnderSampler
from tensorflow.keras.layers import LSTM, Dense
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt
from keras.layers import Conv1D, BatchNormalization, GlobalAveragePooling1D, Permute, Dropout
from keras.layers import Input, Bidirectional, CuDNNLSTM, concatenate, Activation
from keras.models import Model, load_model
from tensorflow.keras.callbacks import ModelCheckpoint
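# X_train, y_train, X_test, y_test, size and idx are assumed to be defined
# earlier by the preprocessing and the loop over the size values (not shown).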
ep = 20
ip = Input(shape=X_train.shape[1:])
x = CuDNNLSTM(size, return_sequences=False)(ip)
y = Permute((2, 1))(ip)
y = Conv1D(size, 8, padding='same', kernel_initializer='he_uniform')(y)
y = BatchNormalization()(y)
y = Activation('relu')(y)
y = Conv1D(size * 2, 5, padding='same', kernel_initializer='he_uniform')(y)
y = BatchNormalization()(y)
y = Activation('relu')(y)
y = Conv1D(size, 3, padding='same', kernel_initializer='he_uniform')(y)
y = BatchNormalization()(y)
y = Activation('relu')(y)
y = GlobalAveragePooling1D()(y)
x = concatenate([x, y])
out = Dense(1, activation='sigmoid')(x)
model = Model(ip, out)
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy())
es = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
mc = ModelCheckpoint("lstm_cnn_resample_" + str(size) + "_" + str(idx) + ".h5",
                     monitor='val_loss', mode='min', save_best_only=True)
model.fit(X_train, y_train, epochs=ep, batch_size=256,
          validation_data=(X_test, y_test), callbacks=[mc, es])

Can someone please help me understand what I can do to reduce the training loss?
The test loss is about the same as the training loss.

Epoch 2/20
15355/15355 [==============================] - 274s 18ms/step - loss: 0.5384 - val_loss: 0.5737
Epoch 3/20
15355/15355 [==============================] - 274s 18ms/step - loss: 0.5363 - val_loss: 0.5407
Epoch 4/20
15355/15355 [==============================] - 270s 18ms/step - loss: 0.5351 - val_loss: 0.5592
Epoch 5/20
15355/15355 [==============================] - 278s 18ms/step - loss: 0.5343 - val_loss: 0.5519
Epoch 6/20
15355/15355 [==============================] - 291s 19ms/step - loss: 0.5335 - val_loss: 0.5540
Epoch 7/20
15355/15355 [==============================] - 382s 25ms/step - loss: 0.5331 - val_loss: 0.5734
Epoch 8/20
15355/15355 [==============================] - 479s 31ms/step - loss: 0.5327 - val_loss: 0.5495
Epoch 9/20
15355/15355 [==============================] - 432s 28ms/step - loss: 0.5323 - val_loss: 0.5369
Epoch 10/20
15355/15355 [==============================] - 234s 15ms/step - loss: 0.5319 - val_loss: 0.5354
Epoch 11/20
15355/15355 [==============================] - 245s 16ms/step - loss: 0.5316 - val_loss: 0.5340
Epoch 12/20
15355/15355 [==============================] - 276s 18ms/step - loss: 0.5313 - val_loss: 0.5501
Epoch 13/20
15355/15355 [==============================] - 293s 19ms/step - loss: 0.5311 - val_loss: 0.5364
Epoch 14/20
15355/15355 [==============================] - 287s 19ms/step - loss: 0.5308 - val_loss: 0.5518
Epoch 15/20
15355/15355 [==============================] - 266s 17ms/step - loss: 0.5306 - val_loss: 0.5488
Epoch 16/20
15355/15355 [==============================] - 281s 18ms/step - loss: 0.5304 - val_loss: 0.5515
Epoch 17/20
15355/15355 [==============================] - 261s 17ms/step - loss: 0.5302 - val_loss: 0.5446
Epoch 18/20
15355/15355 [==============================] - 344s 22ms/step - loss: 0.5301 - val_loss: 0.5375
Epoch 19/20
15355/15355 [==============================] - 267s 17ms/step - loss: 0.5299 - val_loss: 0.5204
Epoch 20/20
15355/15355 [==============================] - 256s 17ms/step - loss: 0.5297 - val_loss: 0.5351

Hi @granth_jain,

Welcome back to the TF Forum!

If nothing else has helped, it's time to start tuning the hyperparameters:

  • try different optimizers: SGD trains more slowly but tends to reach a lower generalization error, while Adam trains faster but the test loss tends to stall at a higher value (see the sketch after this list)
  • try decreasing the batch size
  • increase the learning rate initially, and then decay it, or use a cyclic learning rate
  • add layers
  • add hidden units
  • remove regularization gradually (for example, switch off batch norm for a few layers). The training loss should then decrease, but the test loss may increase.
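As a minimal sketch of the first three points, only changing the compile/fit part of your code (the learning-rate, momentum and batch-size numbers below are just starting points to experiment with, not recommended values):

# Swap Adam for SGD with momentum, start the learning rate a bit higher
# and decay it over time, and use a smaller batch size.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,
    decay_steps=10000,
    decay_rate=0.9)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule,
                                                momentum=0.9),
              loss=tf.keras.losses.BinaryCrossentropy())

model.fit(X_train, y_train, epochs=ep, batch_size=64,  # smaller batch size
          validation_data=(X_test, y_test), callbacks=[mc, es])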

Let us know if any of the above reduces the training loss.

Thank you!