LSTM model to recommend next best offer


We have a use case where every customer has a sequence of products, ordered from the oldest product taken to the newest. A customer can take one or more products at different times. We want to build a model that is trained on these product sequences and can predict (recommend) a customer’s next best offer/product, so basically a next best offer/next best basket model. The products range from personal loans to travel insurance.

It was suggested that we use an LSTM model, but I have a few questions/concerns about this.

  • Can the inputs be of different lengths? If so, and my input is a sequence of products of varying length, would the model know which product is the most recent from position alone? We might want to weight more recent products (and maybe the time between two products) more heavily. We have the date on which a customer took each product. How would I introduce the date into the input and align each product with its date?

  • I assume I have to encode my products, so we’ll assign a number (an integer, or a float?) to each product. Will the sequence of products then be a sequence of arrays?

The data will basically be a table with the customer ID and every product that customer took, along with the date.
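For what it's worth, here is a minimal sketch of how a table like that could be turned into per-customer sequences ordered by date, with the gap to the previous product attached to each step. The rows, the customer IDs and the choice of days as the gap unit are made up for illustration. Ordering oldest to newest is what lets a sequence model read recency from position.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (customer_id, product, date the product was taken)
rows = [
    ("c1", "B", date(2021, 7, 1)),
    ("c1", "A", date(2021, 1, 1)),
    ("c2", "C", date(2020, 3, 15)),
    ("c1", "B", date(2022, 1, 1)),
]

# Group the events by customer.
by_customer = defaultdict(list)
for customer_id, product, taken_on in rows:
    by_customer[customer_id].append((taken_on, product))

sequences = {}
for customer_id, events in by_customer.items():
    events.sort()  # oldest first
    steps, previous = [], None
    for taken_on, product in events:
        # Gap in days since the previous product; 0 for the first one.
        gap_days = 0 if previous is None else (taken_on - previous).days
        steps.append((product, gap_days))
        previous = taken_on
    sequences[customer_id] = steps

print(sequences["c1"])  # -> [('A', 0), ('B', 181), ('B', 184)]
```

The gaps could then be bucketed (for example into months) and fed to the model alongside the product IDs.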

Any direction or references would be appreciated. Please feel free to suggest how you would approach this. Still very new to this.


I’m curious which overall approach others will recommend, but as for the encoding of products, you can probably use tf.keras.layers.StringLookup if the product IDs are currently strings and you want to map them to integers.

Thanks for the reply.

For the StringLookup layer, it requires a rectangular Python sequence, so I’m assuming I would have to pad the sequences first, correct? That in turn requires converting to integers first and then applying the padding.

The thing is, once I convert my products into integers and pad with 0, the model sees 0 as part of the series and can predict 0. Is there a way to apply padding and at the same time ignore the 0s?
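One common way to pad and still have the model skip the zeros is an Embedding layer with mask_zero=True, which makes downstream recurrent layers ignore the padded timesteps. A minimal sketch, assuming four products encoded as 1..4 with 0 reserved for padding (the data and layer sizes are arbitrary). The labels are shifted down by one so the softmax has no unit for the padding value and therefore cannot predict it.

```python
import numpy as np
import tensorflow as tf

NUM_PRODUCTS = 4  # assumption: products encoded as 1..4, 0 reserved for padding

# Post-padded inputs and their labels (toy data, for illustration only).
X = np.array([[1, 2, 1, 0, 0, 0, 0],
              [2, 3, 0, 0, 0, 0, 0],
              [4, 3, 1, 1, 2, 3, 3]])
y = np.array([3, 2, 1]) - 1  # shift labels to 0..3 so no output unit means "padding"

inputs = tf.keras.Input(shape=(None,), dtype="int32")
# mask_zero=True flags every 0 so the LSTM skips those timesteps.
x = tf.keras.layers.Embedding(NUM_PRODUCTS + 1, 8, mask_zero=True)(inputs)
x = tf.keras.layers.LSTM(32)(x)
outputs = tf.keras.layers.Dense(NUM_PRODUCTS, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=1, verbose=0)

# With masking, the amount of trailing padding should not change the prediction.
short = model(tf.constant([[1, 2, 0]])).numpy()
long = model(tf.constant([[1, 2, 0, 0, 0, 0]])).numpy()
print(np.allclose(short, long, atol=1e-5))
```

To get back to a product ID from a prediction, add 1 to the argmax of the output.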

My product sequences look something like this. The products in bold are what I want to predict.

[A, B, A, **C**],
[B, C, **B**],
[A, B, B, **D**],
[D, C, A, A, B, C, C, **A**]
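To make the setup concrete, here is a small plain-Python sketch that turns the example sequences above into (history, target) pairs, treating the last product of each sequence as the one to predict. The integer IDs and variable names are illustrative. IDs start at 1 so that 0 stays free for padding.

```python
sequences = [
    ["A", "B", "A", "C"],
    ["B", "C", "B"],
    ["A", "B", "B", "D"],
    ["D", "C", "A", "A", "B", "C", "C", "A"],
]

# Map each product to an integer ID, starting at 1 (0 is kept for padding).
vocab = sorted({p for seq in sequences for p in seq})
product_to_id = {p: i for i, p in enumerate(vocab, start=1)}

# Longest history once the last product is held out as the target.
max_len = max(len(seq) for seq in sequences) - 1

X, y = [], []
for seq in sequences:
    ids = [product_to_id[p] for p in seq]
    history, target = ids[:-1], ids[-1]
    X.append(history + [0] * (max_len - len(history)))  # post-padding with 0
    y.append(target)

print(X[0], y[0])  # -> [1, 2, 1, 0, 0, 0, 0] 3
```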

What I did so far:

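import numpy as np
import tensorflow as tf

# prod_series is assumed to hold one product sequence per customer
tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(prod_series)  # build the vocabulary before encoding
encoded_products = tokenizer.texts_to_sequences(prod_series)

# Assumed split: the last product is the label, everything before it is the input
X = [seq[:-1] for seq in encoded_products]
y = [seq[-1] for seq in encoded_products]

MAX_LEN = 16
padded_X = tf.keras.preprocessing.sequence.pad_sequences(X, maxlen=MAX_LEN, padding='post')
reshaped_padded_X = np.reshape(padded_X, (len(padded_X), MAX_LEN, 1))

y_cat = tf.keras.utils.to_categorical(y)  # one-hot labels; column 0 is never set

# Two stacked LSTMs with dropout, then a softmax over the label columns
input_layer = tf.keras.Input(shape=(reshaped_padded_X.shape[1], 1))
lstm = tf.keras.layers.LSTM(64, return_sequences=True)(input_layer)
dropout = tf.keras.layers.Dropout(0.2)(lstm)
lstm = tf.keras.layers.LSTM(32)(dropout)
dropout = tf.keras.layers.Dropout(0.2)(lstm)
dense = tf.keras.layers.Dense(16, activation='relu')(dropout)
output_layer = tf.keras.layers.Dense(y_cat.shape[1], activation='softmax')(dense)
model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)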

A few things:

  • y_cat.shape[1] gives me 6. I converted the labels to categorical, so 1 is 010000, 2 is 001000, etc. So the index of the 1 in each categorical array is the actual target/label. However, as mentioned in my previous post, after adding zeros to pad the sequences, the model trained on inputs 0-5, right? But the labels don’t include a zero. Is it an issue that the inputs contain zeros but zero is never a label? Not sure I’m making sense here.

  • When I want to run the model on test data, do I have to reuse the same tokenizer I defined and fit here to convert the data to sequences, then pad and reshape them?

  • When I predict on a few test samples, the results are floats, 6 per array. So I’m assuming the first probability is for zero, which will always be negligible because we never had that as a label (see the first point). Are the floats just the probability of each label, so that we take the index of the highest probability (argmax)?
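On the last two points: at prediction time the new data has to go through the same fitted tokenizer, padding and reshaping as the training data, otherwise the integer IDs will not line up. The outputs are then one probability per column of the one-hot label, so argmax gives the encoded product ID. A minimal sketch of the decoding step in plain Python. The probabilities and the ID-to-product mapping are invented; in the code above the mapping would come from tokenizer.index_word.

```python
# Hypothetical softmax output for one customer: one probability per column
# of the one-hot label, where column 0 is the unused padding index.
probs = [0.01, 0.10, 0.55, 0.20, 0.09, 0.05]

# Hypothetical ID-to-product mapping. The Keras Tokenizer lowercases text
# by default, which is why the product names are lowercase here.
id_to_product = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}

# Rank the label indices from most to least likely; the first is the argmax.
ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
best_id = ranked[0]
top3 = [id_to_product.get(i, "<padding>") for i in ranked[:3]]

print(best_id, id_to_product[best_id])  # -> 2 b
print(top3)                             # -> ['b', 'c', 'a']
```

Keeping the whole ranking rather than only the argmax is handy when more than one offer should be recommended.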

Just bumping this thread in case anyone can help.

A few updates: our input is now a sequence of products and the time gap between consecutive products.

This is how it looks: [[A, 0], [B, 6], [B, 12], [C, 18], [A, 12]] --> [B, 18]

Basically we want to input the sequence and time gaps, and get as output the next product in the sequence and when the customer might take it. The time gaps fall into five predefined time buckets, so I guess they can be treated as categorical here.

How would I set up multiple inputs (separate inputs, or time as a second feature) and multiple outputs (next product and time gap)?
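One way this is often set up in Keras is the functional API with two integer inputs (product IDs and gap-bucket IDs), one embedding per input, a shared LSTM, and two softmax heads. A rough sketch under assumptions: products A..D map to 1..4, the gap values seen in the example (0, 6, 12, 18) map to bucket IDs 1..4 with the fifth bucket left unspecified, 0 is padding in both inputs, and all layer sizes are arbitrary.

```python
import numpy as np
import tensorflow as tf

NUM_PRODUCTS = 4  # assumption: A..D -> 1..4, 0 = padding
NUM_GAPS = 5      # five time buckets -> 1..5, 0 = padding
product_to_id = {"A": 1, "B": 2, "C": 3, "D": 4}
gap_to_id = {0: 1, 6: 2, 12: 3, 18: 4}  # fifth bucket value is not given in the thread

prod_in = tf.keras.Input(shape=(None,), dtype="int32", name="products")
gap_in = tf.keras.Input(shape=(None,), dtype="int32", name="gaps")

# One embedding per input, concatenated into a single vector per timestep.
prod_emb = tf.keras.layers.Embedding(NUM_PRODUCTS + 1, 8, mask_zero=True)(prod_in)
gap_emb = tf.keras.layers.Embedding(NUM_GAPS + 1, 4, mask_zero=True)(gap_in)
x = tf.keras.layers.Concatenate()([prod_emb, gap_emb])
x = tf.keras.layers.LSTM(32)(x)

# Two heads on the shared LSTM state: which product next, and in which bucket.
next_product = tf.keras.layers.Dense(NUM_PRODUCTS, activation="softmax", name="next_product")(x)
next_gap = tf.keras.layers.Dense(NUM_GAPS, activation="softmax", name="next_gap")(x)

model = tf.keras.Model(inputs=[prod_in, gap_in], outputs=[next_product, next_gap])
model.compile(
    optimizer="adam",
    loss=["sparse_categorical_crossentropy", "sparse_categorical_crossentropy"],
)

# The example from the post: [[A, 0], [B, 6], [B, 12], [C, 18], [A, 12]] --> [B, 18]
steps = [("A", 0), ("B", 6), ("B", 12), ("C", 18), ("A", 12)]
products = np.array([[product_to_id[p] for p, _ in steps]])
gaps = np.array([[gap_to_id[g] for _, g in steps]])
# Targets are shifted to start at 0, matching the softmax units.
y_product = np.array([product_to_id["B"] - 1])
y_gap = np.array([gap_to_id[18] - 1])

model.fit([products, gaps], [y_product, y_gap], epochs=1, verbose=0)
product_probs, gap_probs = model.predict([products, gaps], verbose=0)
print(product_probs.shape, gap_probs.shape)
```

The alternative mentioned in the question, time as a second feature of a single input, would amount to the same thing after the concatenation. Separate inputs just keep the two vocabularies and embeddings apart.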