Memory Management in TFDS

Hello TensorFlow Community,

While using the TFDS module, I became confused about its memory management. I have the following small code block:

import tensorflow_datasets as tfds

num_epochs = 10     # example value
batch_size = 128    # example value

train_ds = tfds.load("cifar10", split="train")
test_ds = tfds.load("cifar10", split="test")

# repeat for several epochs, shuffle with a 1024-element buffer, then batch and prefetch one batch ahead
train_ds = train_ds.repeat(num_epochs).shuffle(1024)
train_ds = train_ds.batch(batch_size, drop_remainder=True).prefetch(1)

for sample in tfds.as_numpy(train_ds):
    image, label = sample['image'], sample['label']
    print(image.shape)

As far as I know, when we call the tfds.load() function, it creates a builder, downloads the data, prepares it, and returns it as a tf.data.Dataset. What I am wondering is whether the samples (images and labels) are also loaded into RAM when we call tfds.load(). If not, when will they be loaded into RAM? Is it during batching and prefetching, or during iteration?
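If it helps, my understanding is that the call is roughly equivalent to the explicit builder workflow sketched below (not exactly what I run, just my mental model):

import tensorflow_datasets as tfds

builder = tfds.builder("cifar10")             # create the DatasetBuilder
builder.download_and_prepare()                # download and write TFRecord files to disk
train_ds = builder.as_dataset(split="train")  # wrap those files in a tf.data.Dataset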

@Goktug_Guvercin,

Welcome to the TensorFlow Forum!

No, the samples (images and labels) are not loaded into memory during tfds.load(). That call only prepares the data on disk (as TFRecord files) and builds a tf.data pipeline; the samples are read and loaded into memory lazily, during iteration over train_ds.
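You can see this lazy behaviour with a small timing sketch (assuming CIFAR-10 has already been downloaded and prepared, otherwise the first call also spends time downloading): tfds.load returns almost immediately, while samples are only decoded as the loop pulls batches.

import time
import tensorflow_datasets as tfds

# Building the pipeline is cheap: the images stay in TFRecord files on disk.
start = time.time()
train_ds = tfds.load("cifar10", split="train")
print(f"tfds.load returned after {time.time() - start:.2f}s")

train_ds = train_ds.batch(32).prefetch(1)

# Samples are read, decoded, and placed in RAM batch by batch as the loop pulls them.
start = time.time()
for batch in train_ds.take(1):
    print(batch["image"].shape)   # (32, 32, 32, 3)
print(f"first batch materialized after {time.time() - start:.2f}s")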

The prefetch() transformation overlaps data preparation with model execution: while the model processes the current batch, the input pipeline loads and prepares the next one in the background.
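Here is a rough illustration of that overlap with a toy pipeline (the sleep calls stand in for disk I/O and a training step; the timings are only approximate): without prefetch the producer and consumer run serially, with prefetch(1) the next element is prepared while the "step" runs.

import time
import tensorflow as tf

def slow_item(i):
    time.sleep(0.05)               # pretend this is disk read + decode
    return i

# The same slow producer, consumed with and without prefetch.
ds = tf.data.Dataset.range(20).map(
    lambda i: tf.py_function(slow_item, [i], tf.int64))

def consume(dataset):
    start = time.time()
    for _ in dataset:
        time.sleep(0.05)           # pretend this is one training step
    return time.time() - start

print("without prefetch:", consume(ds))              # roughly 2.0 s: load and step alternate
print("with prefetch(1):", consume(ds.prefetch(1)))  # roughly 1.0 s: load overlaps the step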

Thank you!