I’m doing text classification on the built-in IMDB dataset. A network with an embedding layer, averaging over the temporal dimension with layers.GlobalAveragePooling1D, and a dense layer does fairly well, but if I switch the averaging out for an LSTM layer (which also reduces the sequence to a single vector per example, so the rest of the network can remain unchanged), the accuracy stays very close to 50%! I’m really confused about why this could be. Below is code that anyone should be able to run. If you have any idea why this is happening, please share!
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, losses
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
import tensorflow_datasets as tfds

train_data, test_data = tfds.load(name="imdb_reviews",
                                  split=('train', 'test'),
                                  batch_size=-1, as_supervised=True)
xTrain, yTrain = tfds.as_numpy(train_data)
xTest, yTest = tfds.as_numpy(test_data)

max_features = 5000
max_len = 500
vectorize_layer = TextVectorization(max_tokens=max_features,
                                    output_mode='int',
                                    output_sequence_length=max_len)
vectorize_layer.adapt(xTrain)
xTrainV = vectorize_layer(xTrain)
xTestV = vectorize_layer(xTest)

model = tf.keras.Sequential([
    layers.Embedding(max_features + 1, 16),
    layers.GlobalAveragePooling1D(),
    # layers.LSTM(8),
    layers.Dense(1)])

model.compile(loss=losses.BinaryCrossentropy(from_logits=True),
              optimizer='adam',
              metrics=tf.metrics.BinaryAccuracy(threshold=0))
model.fit(xTrainV, yTrain, validation_data=(xTestV, yTest), epochs=10)
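To be explicit about the failing case, this is the variant I run with the commented line swapped in; a minimal sketch reusing the variables defined above (model_lstm is just a label I've added here to distinguish it):

# Same architecture, but with the average pooling replaced by the LSTM.
model_lstm = tf.keras.Sequential([
    layers.Embedding(max_features + 1, 16),
    layers.LSTM(8),   # like the pooling layer, collapses (batch, time, features) to one vector per example
    layers.Dense(1)])

model_lstm.compile(loss=losses.BinaryCrossentropy(from_logits=True),
                   optimizer='adam',
                   metrics=tf.metrics.BinaryAccuracy(threshold=0))
model_lstm.fit(xTrainV, yTrain, validation_data=(xTestV, yTest), epochs=10)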
The GlobalAveragePooling1D and LSTM layers can be switched freely this way. An example progression of validation accuracies with GlobalAveragePooling1D over 10 epochs is:
0.65, 0.71, 0.67, 0.76, 0.77, 0.79, 0.80, 0.80, 0.81, 0.83.
If the LSTM layer is used instead, both the validation accuracies and the training accuracies sit at 0.49, 0.50, or 0.51 after every epoch. There is no improvement, and the network appears to be predicting randomly. Why?