How to use GPU Memory Oversubscription in TensorFlow 2 (Keras API)?

Hi guys, I’m using Keras to train my model. The model itself is large, and my GPU has only 16 GiB of memory.

How can I oversubscribe GPU memory in Keras?
I tried the following, but it doesn’t work.

import tensorflow as tf

# Logical device configuration must be set before the GPUs are
# initialized, i.e. before any tensors or models are created.
physical_gpus = tf.config.list_physical_devices('GPU')

for physical_gpu in physical_gpus:
    print("Memory growth for {} before: {}".format(physical_gpu,
                                                   tf.config.experimental.get_memory_growth(physical_gpu)))
    tf.config.experimental.set_memory_growth(physical_gpu, False)
    print("Memory growth for {} after: {}".format(physical_gpu,
                                                  tf.config.experimental.get_memory_growth(physical_gpu)))
    print("Logical device configuration for {} before: {}".format(
        physical_gpu, tf.config.get_logical_device_configuration(physical_gpu))
    )
    # Request a 30 GiB limit, i.e. more than the card's 16 GiB.
    tf.config.set_logical_device_configuration(
        physical_gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=30*1024)]
    )
    print("Logical device configuration for {} after: {}".format(
        physical_gpu, tf.config.get_logical_device_configuration(physical_gpu))
    )

Hi @billpeng

Welcome to the TensorFlow Forum!

You can try managing memory usage by enabling GPU memory growth with `tf.config.experimental.set_memory_growth(gpu, True)`, which lets TensorFlow allocate only as much GPU memory as it actually needs instead of pre-allocating the whole device. You can also lower the `batch_size`, or use fewer layers and parameters to reduce the model's memory footprint.
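A minimal sketch of the memory-growth approach (assuming TensorFlow 2.x; on a CPU-only machine the device list is simply empty and the loop is a no-op). Note this must run before any tensors or models are created:

```python
import tensorflow as tf

# Enable on-demand allocation for every visible GPU so TensorFlow
# grows its memory pool as needed rather than grabbing it all upfront.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

print("GPUs with memory growth enabled:", len(gpus))
```

Calling `set_memory_growth` after the runtime has initialized the GPUs raises a `RuntimeError`, which is why this usually goes at the very top of the training script.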

One more option is a distribution strategy such as `tf.distribute.MirroredStrategy`, which uses multiple GPUs to spread the memory load and enables training larger models. Thank you.
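A minimal sketch of the distribution-strategy approach (assuming TensorFlow 2.x; the model here is a hypothetical toy, and on a machine without GPUs the strategy falls back to a single CPU replica):

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and splits
# each batch across the replicas, lowering per-GPU memory pressure.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored on every replica.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')
```

After this, `model.fit` works as usual; the global batch is divided among the replicas, so you can often raise the global `batch_size` proportionally to the number of GPUs.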