Calling a TensorFlow model in a loop leaks memory

Source code:
https://keras.io/examples/nlp/neural_machine_translation_with_transformer/

In the `decode_sequence` function, calling the model in a loop leaks memory. The leaking call is:

`transformer([tokenized_input_sentence, tokenized_target_sentence])`

None of the following fixed it:

- `gc.collect()` does not work
- `tf.keras.backend.clear_session()` does not work
- `transformer.predict()` does not work, even combined with `gc.collect()` and `tf.keras.backend.clear_session()`
- `transformer(..., training=False)` does not work

```python
import gc

import numpy as np
import tensorflow as tf
from tqdm import tqdm


def decode_sequence(input_sentence):
    tokenized_input_sentence = input_vectorization([input_sentence])
    decoded_sentence = START_TOKEN
    for i in tf.range(max_decoded_sentence_length):
        tokenized_target_sentence = output_vectorization([decoded_sentence])  # [:, :-1]
        predictions = transformer([tokenized_input_sentence, tokenized_target_sentence])
        sampled_token_index = np.argmax(predictions[0, i, :])
        sampled_token = output_index_lookup[sampled_token_index]
        decoded_sentence += sampled_token
        if sampled_token == END_TOKEN:
            break
    gc.collect()
    return decoded_sentence


def overall_accuracy(pairs):
    corrects = 0
    inputs = pairs[2739:]
    progress = tqdm(inputs)
    for i, pair in enumerate(progress):
        input_text = pair[0]
        target = pair[1]
        predicted = decode_sequence(input_text)
        # guess = '✓' if predicted == target else '✗'
        # print('Sample Number:', i, 'Predicted:', predicted, 'Real:', target, guess)
        if predicted == target:
            corrects += 1
        progress.set_postfix(corrects=corrects, accuracy=corrects / (i + 1))
    return corrects / len(inputs)


print("Overall Accuracy:", overall_accuracy(test_pairs))
```

Hey @ALER_EM,

Memory leaks in TensorFlow can be tricky to diagnose and fix. Here are some general steps and specific suggestions:

1. Use TensorFlow’s Profiler

TensorFlow 2.x has a built-in memory profiler that can help you identify where the memory is being used. This can be a good starting point.
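A minimal sketch of wrapping a few inference calls in a profiling session (the log directory path is an arbitrary choice; the "run a few calls" placeholder stands in for your `decode_sequence` loop):

```python
import tensorflow as tf

# Any writable path works; TensorBoard reads the traces from here.
LOGDIR = "/tmp/tf_profile"

tf.profiler.experimental.start(LOGDIR)

# ... run a few decode_sequence() calls here ...

tf.profiler.experimental.stop()
```

Afterwards, open the "Memory Profile" tab with `tensorboard --logdir /tmp/tf_profile` to see which ops hold on to memory across iterations.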

2. Avoid Global Variables

Ensure that there are no global variables that are accumulating data over time. This is a common source of memory leaks.
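The typical shape of this leak, as a sketch with a hypothetical `history` global (not taken from the code above): anything appended to a module-level container stays reachable, so `gc.collect()` can never free it.

```python
import gc

history = []  # module-level list: grows on every call, nothing is ever freed


def leaky_step(result):
    history.append(result)  # keeps a reference alive across calls


def fixed_step(result):
    # keep only what you need; the big object is freed when it goes out of scope
    return len(result)


for _ in range(1000):
    leaky_step([0] * 1000)

gc.collect()
print(len(history))  # 1000 — still reachable, so the collector cannot touch them
```

In the loop above, the memory stays allocated even after `gc.collect()`, because the garbage collector only frees *unreachable* objects.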

3. Functional Approach

Instead of using a model object that persists across calls, consider using a functional approach where you create, use, and discard objects within the scope of a function. This ensures that there are no lingering references.

4. Reduce Scope

Ensure that large objects are limited in scope so they can be garbage collected once out of scope.

5. Explicitly Delete Objects

After using large objects, explicitly delete them.

```python
del large_object
gc.collect()
```
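You can verify that this actually releases memory with the standard-library `tracemalloc` module (a self-contained sketch; the million-element list is just a stand-in for any large object):

```python
import gc
import tracemalloc

tracemalloc.start()

large_object = [0.0] * 1_000_000  # roughly 8 MB of pointers
before, _ = tracemalloc.get_traced_memory()

del large_object
gc.collect()

after, _ = tracemalloc.get_traced_memory()
print(after < before)  # True: the allocation was released
```

If the `after` number does not drop for one of your objects, something else still holds a reference to it.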

6. Use tf.function

Using tf.function can sometimes help in optimizing the execution and memory usage.
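In particular, calling a Keras model eagerly in a loop can accumulate graph objects through repeated tracing. One pattern worth trying (a sketch only — the tiny `Sequential` model is a hypothetical stand-in for your `transformer`, and the shapes are placeholders) is to wrap the forward pass in a `tf.function` with a fixed `input_signature`, so every call reuses a single compiled trace:

```python
import tensorflow as tf

# Hypothetical toy model, just to make the sketch self-contained;
# substitute your real `transformer` here.
model = tf.keras.Sequential([tf.keras.layers.Dense(4)])


# A fixed input_signature means calls with matching shape/dtype reuse ONE
# trace, instead of piling up new graph objects on every loop iteration.
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 8], dtype=tf.float32)])
def predict_step(inputs):
    return model(inputs, training=False)


for _ in range(3):
    out = predict_step(tf.random.uniform([2, 8]))

print(out.shape)  # (2, 4)
```

For your code, the signature would take the two integer token tensors (`encoder_inputs` and `decoder_inputs`) instead of one float tensor.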

7. Avoid Using NumPy Inside the TensorFlow Loop

Converting between TensorFlow tensors and NumPy arrays on every iteration is expensive, and the intermediate arrays can accumulate if references to them are kept.

For example, in your `decode_sequence` function, replace:

```python
sampled_token_index = np.argmax(predictions[0, i, :])
```

with:

```python
sampled_token_index = tf.argmax(predictions[0, i, :], axis=-1).numpy()
```

8. Check for Bugs in External Libraries

Sometimes, the memory leak might not be in your code but in the libraries you’re using. Ensure you’re using the latest version of TensorFlow and other libraries, as memory leak bugs might have been fixed in newer versions.

9. Use a Different Environment

Sometimes, specific environments or platforms can have issues. If possible, test your code in a different environment or machine to see if the issue persists.

10. Monitor System Memory

Use tools like htop (on Linux) or Task Manager (on Windows) to monitor the system’s memory usage. This can give you insights into when and how the memory is being consumed.
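Alongside system tools, the standard-library `tracemalloc` module can show whether Python-level allocations grow across iterations (a minimal sketch, with a hypothetical `step()` standing in for one `decode_sequence` call):

```python
import tracemalloc


def step():
    # hypothetical stand-in for one decode_sequence() call
    return sum(range(10_000))


tracemalloc.start()
for i in range(5):
    step()
    current, peak = tracemalloc.get_traced_memory()
    print(f"iter {i}: current={current} B, peak={peak} B")
tracemalloc.stop()
```

A leak shows up as `current` climbing steadily from one iteration to the next; a flat line means the growth is happening outside the Python heap (e.g. in TensorFlow's own allocator).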

If after trying the above suggestions the issue persists, consider creating a minimal reproducible example and reporting it as a bug to the TensorFlow team. They might be able to provide more specific guidance or fixes.