Using LSTM layer causes accuracy to stay at 50% in text classification

I’m doing text classification on the built-in IMDB dataset. A network with a dense layer and averaging over the temporal dimension with layers.GlobalAveragePooling1D does fairly well, but if I swap the averaging for an LSTM layer (which has the same input and output dimensions, so the rest of the network can remain unchanged), the accuracy stays very close to 50%! I’m really confused as to why this happens. Below is code that anyone should be able to run. If you have any idea why this is happening, please share!

import numpy as np
import tensorflow as tf 
from tensorflow.keras import layers, losses 
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
import tensorflow_datasets as tfds

train_data, test_data = tfds.load(name="imdb_reviews", split=('train', 'test'), batch_size=-1, as_supervised=True)

xTrain, yTrain = tfds.as_numpy(train_data)
xTest, yTest = tfds.as_numpy(test_data)

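max_features = 5000  # vocabulary size
max_len = 500  # every review is padded or truncated to this many tokens
vectorize_layer = TextVectorization(max_tokens=max_features, output_mode='int', output_sequence_length=max_len)
vectorize_layer.adapt(xTrain)  # build the vocabulary from the training text before vectorizing
xTrainV = vectorize_layer(xTrain)
xTestV = vectorize_layer(xTest)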

model = tf.keras.Sequential([
  layers.Embedding(max_features+1, 16),

              metrics=tf.metrics.BinaryAccuracy(threshold=0)),yTrain,validation_data=(xTestV, yTest), epochs=10)

The LSTM and GlobalAveragePooling1D layers can be switched between freely. An example progression of validation accuracies with GlobalAveragePooling1D over 10 epochs is:
0.65, 0.71, 0.67, 0.76, 0.77, 0.79, 0.8, 0.8, 0.81, 0.83.

If the LSTM layer is used instead, both the validation and the training accuracies are 0.49, 0.50, or 0.51 after every epoch. There is no improvement, and the network appears to be predicting randomly. Why?
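The claim that the two layers can be swapped freely can be checked on a dummy batch. The sketch below is only a shape check, not the original model: the batch size of 2, the shortened sequence length of 20, and the 8 LSTM units (the figure mentioned later in this thread) are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Dummy embedded batch: (batch, time steps, embedding dim). The embedding dim of 16
# matches the question; batch size 2 and 20 time steps are arbitrary stand-ins.
dummy = tf.constant(np.random.rand(2, 20, 16), dtype=tf.float32)

gap_out = layers.GlobalAveragePooling1D()(dummy)  # averages over the time axis
lstm_out = layers.LSTM(8)(dummy)                  # returns only the final hidden state

print(gap_out.shape)   # (2, 16)
print(lstm_out.shape)  # (2, 8)
```

Both layers collapse the time axis, so a following Dense layer accepts either output. Only the feature width differs, and the LSTM output depends on the state after the last time step rather than on an average over all steps.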

Check out this tutorial on text classification with an RNN: Text classification with an RNN | TensorFlow

Not to be rude, but did you actually read my question and look at the code, or are you just linking that tutorial because you read LSTM in the title? Again, I’m honestly not trying to be rude; I’m wondering what exactly you think the tutorial can help with. I have read that tutorial and every single tutorial regarding text analysis on the TensorFlow website. My code should resemble the code in those tutorials pretty closely. My question still stands. I hope I didn’t offend you; please let me know if you have an idea about the problem I posed.

Have you tried increasing the number of units in your LSTM layer? I’m not sure how long the sequences are in this dataset, but 8 seems like a low number of units to ‘remember’ anything about the previous sequence steps.
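One way to try this suggestion is to make the unit count a parameter and train the same architecture at two sizes. This is a minimal sketch, not the original poster's model: the single-logit Dense(1) head is assumed from the BinaryAccuracy(threshold=0) metric in the question, and build_lstm_model and the value 64 are hypothetical names and settings chosen for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

max_features = 5000  # vocabulary size from the question


def build_lstm_model(units):
    """Embedding -> LSTM(units) -> one output logit (the head is an assumption)."""
    return tf.keras.Sequential([
        layers.Embedding(max_features + 1, 16),
        layers.LSTM(units),
        layers.Dense(1),
    ])


small = build_lstm_model(8)   # the size discussed in this reply
large = build_lstm_model(64)  # hypothetical larger setting to compare against
```

Each model would then be compiled and fitted exactly as in the question, so that any difference in accuracy can be attributed to the unit count alone.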

I would suggest adding at least one dense layer with an activation just before the final classification layer, as recommended in the tutorial. It should improve the model.
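The suggestion above might look like the following sketch. It is not the original poster's code: the function name, the 64-unit sizes, and the ReLU activation are assumptions. The mask_zero flag is a further assumption borrowed from the linked tutorial, which sets mask_zero=True on its Embedding layer so that the zero padding produced by TextVectorization is skipped by the recurrent layer.

```python
import tensorflow as tf
from tensorflow.keras import layers

max_features = 5000  # vocabulary size from the question


def build_model_with_hidden_dense(units=64, hidden=64, mask_zero=True):
    """Embedding -> LSTM -> Dense(relu) -> one output logit."""
    return tf.keras.Sequential([
        # mask_zero=True marks token id 0 (padding) as masked for later layers
        layers.Embedding(max_features + 1, 16, mask_zero=mask_zero),
        layers.LSTM(units),
        layers.Dense(hidden, activation="relu"),  # the extra hidden layer
        layers.Dense(1),                          # final classification logit
    ])


model = build_model_with_hidden_dense()

# Two right-padded toy sequences, as TextVectorization would produce them.
toy_batch = tf.constant([[5, 3, 2, 0, 0], [7, 1, 0, 0, 0]])
logits = model(toy_batch)
```

Setting mask_zero=False gives the unmasked variant, which makes it possible to check whether the padding or the missing hidden layer matters more for this dataset.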