Optimizing seq2seq decoding script

Hello everyone,

I am a beginner in Keras/TensorFlow. I am fitting a seq2seq Transformer model, which I built following Chollet’s book “Deep Learning with Python”, where he demonstrates a seq2seq model on a machine translation example. At one point (page 362 in the second edition), he shows how to make predictions and how to decode the predicted integer sequence into strings. Here is his code:

import numpy as np

# Map integer token indices back to vocabulary strings.
spa_vocab = target_vectorization.get_vocabulary()
spa_index_lookup = dict(zip(range(len(spa_vocab)), spa_vocab))
max_decoded_sentence_length = 20

def decode_sequence(input_sentence):
    tokenized_input_sentence = source_vectorization([input_sentence])
    decoded_sentence = "[start]"
    for i in range(max_decoded_sentence_length):
        # Re-tokenize the partial output and drop the last position.
        tokenized_target_sentence = target_vectorization([decoded_sentence])[:, :-1]
        predictions = transformer(
            [tokenized_input_sentence, tokenized_target_sentence])
        # Greedy choice: the most probable token at position i.
        sampled_token_index = np.argmax(predictions[0, i, :])
        sampled_token = spa_index_lookup[sampled_token_index]
        decoded_sentence += " " + sampled_token
        # Stop once the end-of-sequence token is produced.
        if sampled_token == "[end]":
            break
    return decoded_sentence

The problem is that it is a loop, and it takes quite long even for a small amount of test data. I guess there is a way to pass the test data as a whole, which would be much faster. But I have trouble understanding what tokenized_target_sentence should be replaced with. In his example, it makes sense that we start with the [start] token, predict the next one, and continue until the [end] token is predicted, but there should be a way to make this faster. Could you give me a hint on how to do the same thing in Keras without this loop?
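
To make the question more concrete, here is a rough sketch of what I imagine batched decoding could look like. It is only my guess, not code from the book: it keeps the loop over output positions (as far as I understand, each token depends on the previous ones, so that loop cannot go away) but processes all sentences at once and works on integer ids instead of strings. The names batched_greedy_decode, ids_to_sentence, predict_fn and toy_predict are hypothetical; predict_fn stands in for the trained model and is assumed to return an array of shape (batch, max_len, vocab_size), and I assume id 0 is the padding token, as in TextVectorization.

```python
import numpy as np

def batched_greedy_decode(predict_fn, source_ids, start_id, end_id, max_len=20):
    """Greedy-decode a whole batch at once, working on integer token ids.

    predict_fn(source_ids, target_ids) is a stand-in for the trained model
    (assumption: it returns an array of shape (batch, max_len, vocab_size)).
    """
    batch_size = source_ids.shape[0]
    # Target buffer padded with 0 (assumed padding id); column 0 is [start].
    target_ids = np.zeros((batch_size, max_len + 1), dtype=np.int64)
    target_ids[:, 0] = start_id
    finished = np.zeros(batch_size, dtype=bool)

    for i in range(max_len):
        # Feed all but the last column, mirroring [:, :-1] in the book's code.
        predictions = np.asarray(predict_fn(source_ids, target_ids[:, :-1]))
        next_ids = predictions[:, i, :].argmax(axis=-1)
        # Rows that already emitted [end] only receive padding from now on.
        next_ids = np.where(finished, 0, next_ids)
        target_ids[:, i + 1] = next_ids
        finished |= next_ids == end_id
        if finished.all():
            break
    return target_ids

def ids_to_sentence(row, index_lookup, end_id):
    """Turn one row of ids into a string, stopping after the end token."""
    tokens = []
    for token_id in row:
        tokens.append(index_lookup[int(token_id)])
        if token_id == end_id:
            break
    return " ".join(tokens)

# Toy stand-in for the model so the sketch runs without Keras:
# it "predicts" previous token id + a per-sentence step, capped at [end].
VOCAB_SIZE = 6

def toy_predict(source_ids, target_ids):
    next_id = np.minimum(target_ids + source_ids[:, :1], VOCAB_SIZE - 1)
    return np.eye(VOCAB_SIZE)[next_id]  # one-hot, shape (batch, len, vocab)

toy_lookup = {0: "", 1: "[UNK]", 2: "[start]", 3: "hola", 4: "mundo", 5: "[end]"}
toy_source = np.array([[1], [3]])  # two "sentences" with different steps

decoded_ids = batched_greedy_decode(
    toy_predict, toy_source, start_id=2, end_id=5, max_len=4)
sentences = [ids_to_sentence(row, toy_lookup, end_id=5) for row in decoded_ids]
print(sentences)
```

With the real model, I would guess the call looks something like batched_greedy_decode(lambda src, tgt: transformer([src, tgt]), source_vectorization(list_of_sentences), spa_vocab.index("[start]"), spa_vocab.index("[end]")), followed by ids_to_sentence with spa_index_lookup, but I have not verified this. That would mean at most 20 model calls for the whole test set instead of up to 20 per sentence.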

Thank you very much for your help!