Tips for estimating the RAM usage of a tf.data.Dataset object

Hi

I have a generator function that yields a grayscale image of shape (640, 360, 1) as x, and the center point of an object in the image together with its rotation as y. When I create the dataset as defined below and fit the model on it, I run into an out-of-RAM crash.

Code:

import tensorflow as tf

# create_monkey_images is the generator; tensorSpec_x / tensorSpec_y are
# tf.TensorSpec objects (defined elsewhere) matching its outputs.

BATCH_SIZE = 32
STEPS_PER_EPOCH = 219

df = (
    tf.data.Dataset.from_generator(
        create_monkey_images,
        output_signature=(tensorSpec_x, tensorSpec_y),
    )
    .cache()
    .shuffle(BATCH_SIZE * STEPS_PER_EPOCH)
    .batch(BATCH_SIZE, num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(BATCH_SIZE)
)

model.fit(df, steps_per_epoch=STEPS_PER_EPOCH, epochs=10)

I also tried caching to a file using cache(filename="cache"), but it still crashed.
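
For clarity, the only change in the file-backed attempt was the cache call; everything else in the pipeline stayed the same (a sketch of what I ran — "cache" is just the filename prefix, and tf.data writes its cache files to disk under that prefix):

df = (
    tf.data.Dataset.from_generator(
        create_monkey_images,
        output_signature=(tensorSpec_x, tensorSpec_y),
    )
    .cache(filename="cache")  # file-backed cache instead of in-memory
    .shuffle(BATCH_SIZE * STEPS_PER_EPOCH)
    .batch(BATCH_SIZE, num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(BATCH_SIZE)
)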

How do you estimate the RAM usage of a tf.data.Dataset object to avoid RAM crashes?

I tried to estimate the RAM usage under the basic worst-case assumption that the prefetch, batch, and shuffle operations each create deep copies of the images. I also disregarded y, since it is much smaller than x. However, my estimate is completely off.

tf.data.Dataset RAM usage calculation:

ONE_IMAGE_SIZE = 640 * 360 * 1 B = 230,400 B ≈ 230 KB (assuming 1 byte per pixel)

ONE_EPOCH_SIZE = (BATCH_SIZE * STEPS_PER_EPOCH) * ONE_IMAGE_SIZE = (32 * 219) * 230,400 B ≈ 1,614 MB

TOTAL_DATASET_SIZE = (CACHE + SHUFFLE + BATCH + PREFETCH) * ONE_EPOCH_SIZE = 4 * 1,614 MB = 6,456 MB

So in the worst case the dataset should take at most about 7 GB. However, when I run my code on Kaggle in the GPU P100 accelerator environment, which comes with 13 GB of RAM, it crashes while fitting the model. The environment's initial RAM usage is only about 500 MB.
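
For reference, here is the same worst-case estimate as a small script. The per-pixel byte size is the key assumption: the 230 KB figure holds for uint8 pixels, and float32 images would quadruple every number below.

BATCH_SIZE = 32
STEPS_PER_EPOCH = 219
BYTES_PER_PIXEL = 1  # uint8 assumption; use 4 for float32

one_image = 640 * 360 * 1 * BYTES_PER_PIXEL           # bytes per element
one_epoch = BATCH_SIZE * STEPS_PER_EPOCH * one_image  # bytes per full pass
stages = 4  # cache + shuffle + batch + prefetch, each assumed to deep-copy

print(f"per image:  {one_image / 1e3:.0f} KB")            # ~230 KB
print(f"per epoch:  {one_epoch / 1e6:.0f} MB")            # ~1615 MB
print(f"worst case: {stages * one_epoch / 1e9:.2f} GB")   # ~6.46 GB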

Extra information, the model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (BatchNormalization, Convolution2D, Dense,
                                     Dropout, Flatten, LeakyReLU, MaxPool2D,
                                     Reshape)

model = Sequential()

model.add(Convolution2D(32, (3, 3), padding='same', use_bias=False, input_shape=(640, 360, 1)))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())

model.add(Convolution2D(32, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Convolution2D(64, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())

model.add(Convolution2D(64, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Convolution2D(96, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())

model.add(Convolution2D(96, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Convolution2D(128, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())

model.add(Convolution2D(128, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Convolution2D(256, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())

model.add(Convolution2D(256, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Convolution2D(512, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())

model.add(Convolution2D(512, (3, 3), padding='same', use_bias=False))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())

model.add(Flatten())
model.add(Dense(1, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(3))
model.add(Reshape((3, 1)))
model.summary()


model.compile(optimizer='adam', 
              loss='mean_squared_error',
              metrics=['mae'])
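
The dataset is not the only thing in RAM, so for context here is a rough sketch of the model's own weight footprint (float32, 4 bytes per parameter; training-time activations and Adam's per-parameter slots come on top of this and are not counted here):

# Rough footprint of the model's weights alone (float32 = 4 bytes each).
n_params = model.count_params()
print(f"weights: ~{n_params * 4 / 1e6:.1f} MB for {n_params:,} parameters")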

Extra information, the generator:

def create_monkey_images():
    ...  # uninteresting code
    while True:
        ...  # uninteresting code
        x = ...  # x.shape = (640, 360, 1)
        y = ...  # y.shape = (3,)

        yield x, y
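
A quick check on the estimate's key assumption is the actual dtype the generator yields, since a float32 image is four times the uint8 figure used above (a sketch):

import numpy as np

# Pull one element and inspect its real in-memory size and dtype.
x, y = next(create_monkey_images())
x = np.asarray(x)
print(x.dtype, x.nbytes)  # 230400 bytes if uint8, 921600 if float32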

Hi @Asmail, you can use tf.config.experimental.get_memory_info(device) to get the memory info for the chosen device. In the output we get current and peak:

  • current: The current memory used by the device, in bytes.
  • peak: The peak memory used by the device across the run of the program, in bytes.

For example,

import tensorflow as tf

tf.debugging.set_log_device_placement(True)  # enables the "Executing op ..." log lines below

print("Memory usage before computation:")
print(tf.config.experimental.get_memory_info('GPU:0'))

a = tf.constant([[1.0, 2.0, 3.0, 7.0], [4.0, 5.0, 6.0, 8.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0,8.0]])
c = tf.matmul(a, b)

print("Memory usage after computation:")
print(tf.config.experimental.get_memory_info('GPU:0'))

output:
Memory usage before computation:
{'current': 2048, 'peak': 2048}
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0

Memory usage after computation:
{'current': 2816, 'peak': 2816}
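
Note that get_memory_info only reports accelerator (GPU/TPU) memory, not host RAM. Since the crash here is in host RAM, you can additionally watch the process's resident memory with psutil (a sketch using the psutil package, which is not part of TensorFlow):

import os
import psutil

# Host-side resident memory of the current Python process, in MB.
process = psutil.Process(os.getpid())
print(f"RSS: {process.memory_info().rss / 1e6:.0f} MB")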

Thank You.