How can (40, 56) be the input shape when the model has 128 LSTM units?

Overall, I understand the following code from the Keras tutorial:

from tensorflow import keras
from tensorflow.keras import layers

import numpy as np
import random
import io

path = keras.utils.get_file(
    "nietzsche.txt", origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt"
)
with io.open(path, encoding="utf-8") as f:
    text = f.read().lower()
text = text.replace("\n", " ")  # we remove newline chars for nicer display
print("Corpus length:", len(text))

chars = sorted(list(set(text)))
print("Total chars:", len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i : i + maxlen])
    next_chars.append(text[i + maxlen])
print("Number of sequences:", len(sentences))

x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1


model = keras.Sequential(
    [
        keras.Input(shape=(maxlen, len(chars))),
        layers.LSTM(128),
        layers.Dense(len(chars), activation="softmax"),
    ]
)
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss="categorical_crossentropy", optimizer=optimizer)

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


epochs = 40
batch_size = 128

for epoch in range(epochs):
    model.fit(x, y, batch_size=batch_size, epochs=1)
    print()
    print("Generating text after epoch: %d" % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print("...Diversity:", diversity)

        generated = ""
        sentence = text[start_index : start_index + maxlen]
        print('...Generating with seed: "' + sentence + '"')

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.0
            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]
            sentence = sentence[1:] + next_char
            generated += next_char

        print("...Generated: ", generated)
        print()

# maxlen is 40
# number of characters (len(chars)) is 56

We download the dataset, create x and y for training, and then train the model while also sampling from it. During sampling, we run the model 400 times to generate 400 characters.

Here’s the thing I can’t understand. As I learned it, if there are 128 LSTM units, the output of the first unit is passed as input to the second unit, the output of the second unit is passed as input to the third unit, and so on. In that case, shouldn’t the model be generating 127 characters at a time?

But the code here seems to work somewhat differently. The shape of the input is (maxlen, number of characters), and the output has shape (number of characters,), which comes out of the softmax layer.

Can anyone help me understand this? Thanks!

Hi @Seungjun_Lee

Welcome to the TensorFlow Forum!

LSTM is a type of Recurrent Neural Network (RNN) used to process sequential data.

The given input_shape is interpreted as 40 timesteps and 56 features, with n batches, because the LSTM layer expects input of shape [batch, time, features]. The LSTM layer’s 128 units, on the other hand, are the neurons of the layer; they determine the complexity and capacity of the LSTM layer, not the number of characters it generates. The layer reads the 40 timesteps one after another and emits a single 128-dimensional output vector, which the final Dense layer then maps to 56 character probabilities.
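
You can see this concretely by checking the shapes directly. Below is a minimal sketch (the names n_chars, h, and probs are just for illustration; maxlen=40 and 56 characters match the code above):

from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

maxlen, n_chars = 40, 56    # 40 timesteps, 56 one-hot features per step

lstm = layers.LSTM(128)
dense = layers.Dense(n_chars, activation="softmax")

batch = np.zeros((1, maxlen, n_chars), dtype="float32")  # one dummy sequence

h = lstm(batch)      # shape (1, 128): one 128-dim vector per input sequence
probs = dense(h)     # shape (1, 56): one probability per character
print(h.shape, probs.shape)

So the 128 units never produce 128 characters; they produce a 128-dimensional summary of the whole 40-character window.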

You can keep this input_shape while using fewer or more units in the LSTM layer, or you can also add more LSTM layers:

        tf.keras.layers.LSTM(128, input_shape=(maxlen, len(chars))),
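
For example, a stacked variant could look like this (a sketch; the second layer’s 64 units are an arbitrary illustrative size, and return_sequences=True is needed so the first LSTM passes its full 40-step output on to the next layer):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential(
    [
        keras.Input(shape=(maxlen, len(chars))),  # maxlen and chars as defined in your code
        layers.LSTM(128, return_sequences=True),  # output shape (maxlen, 128)
        layers.LSTM(64),                          # output shape (64,)
        layers.Dense(len(chars), activation="softmax"),
    ]
)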

It is always recommended to start with a smaller number of LSTM units and increase it gradually based on model performance. Also make sure the number of units in the final Dense layer matches the number of classes/labels in your dataset so the model can classify correctly.