TFRecordDataset auto cache

I have an input pipeline whose input data I need to update regularly, so I use TFRecordDataset and assumed that updating the file would be enough to update the pipeline. However, the pipeline seems to cache the dataset automatically, even though I never call the cache() method. Can anyone help me figure out what is making my pipeline cache the dataset automatically?
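
For reference, here is a minimal sketch of the behavior I expected (the file path is hypothetical, and this is plain eager TF2 rather than my distributed setup): re-iterating a TFRecordDataset should re-read the file from disk, so replacing the file should change what comes out.

import tensorflow as tf

path = "/tmp/sample.tfrecord"  # hypothetical path, just for illustration

# Write one record, build the dataset, and read it back.
with tf.io.TFRecordWriter(path) as w:
    w.write(b"old record")

ds = tf.data.TFRecordDataset(path)
print(list(ds.as_numpy_iterator()))  # [b'old record']

# Overwrite the file and iterate the same dataset object again.
with tf.io.TFRecordWriter(path) as w:
    w.write(b"new record")

# With no cache() in the pipeline, I expected the new contents here.
print(list(ds.as_numpy_iterator()))  # expected: [b'new record']
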
Below is my pipeline:

ds = tf.data.TFRecordDataset(os.path.join(self.data_path, file_name))

# Parse the raw records; decode_fn(is_train) returns the parsing function.
ds = ds.map(self.decode_fn(is_train),
            num_parallel_calls=tf.data.experimental.AUTOTUNE)

# Disable auto-sharding for the distribution strategy.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = (
    tf.data.experimental.AutoShardPolicy.OFF)
train_dataflow = ds.with_options(options)

train_ds = train_dataflow.repeat().batch(
    self.batch_size, drop_remainder=True
).map(
    autoaug_batch_process_map_fn,
    num_parallel_calls=tf.data.experimental.AUTOTUNE
).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

train_input_iterator = (
    self.strategy.experimental_distribute_dataset(
        train_ds).make_initializable_iterator())
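
One thing I am considering trying is to resolve the file list lazily instead of baking in a single filename, so the pipeline re-globs the directory when the iterator is (re)initialized. This is only a sketch, and the *.tfrecord glob pattern and the make_dataset wrapper are my own assumptions, not part of my actual code:

import os
import tensorflow as tf

def make_dataset(data_path, decode_fn, shuffle_files=False):
    # Hypothetical naming pattern; adjust to the real file names.
    pattern = os.path.join(data_path, "*.tfrecord")
    # list_files evaluates the glob when the dataset is iterated,
    # not when this function is called.
    files = tf.data.Dataset.list_files(pattern, shuffle=shuffle_files)
    ds = files.interleave(
        tf.data.TFRecordDataset,
        cycle_length=4,
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return ds.map(decode_fn,
                  num_parallel_calls=tf.data.experimental.AUTOTUNE)

Even with this, I suspect I would still need to re-run the iterator's initializer after swapping files, since prefetch() with AUTOTUNE can buffer batches that were read before the update.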


Did you resolve this issue? If so, how? I’m seeing a similar issue with my code…