Sharding in Parameter Server Strategy

Hello,
I need to understand how ParameterServerStrategy distributes a dataset of TFRecords across the workers, so I wrote a script and ran it under a profiler. I noticed that every worker was reading the full dataset. The workers do seem to process different data, since execution time decreases as I add workers, but in that case shouldn't each worker only read the data it is actually going to use?
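
For context, this is roughly how my script is set up (simplified: the cluster comes from TF_CONFIG, and the file pattern, batch size, and dataset_fn body stand in for my real ones):

  import tensorflow as tf

  cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
  strategy = tf.distribute.ParameterServerStrategy(cluster_resolver)
  coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)

  def dataset_fn(input_context):
      # Placeholder pipeline: read TFRecord files, batch, and prefetch.
      dataset = tf.data.TFRecordDataset(tf.io.gfile.glob("data/*.tfrecord"))
      return dataset.batch(32).prefetch(tf.data.AUTOTUNE)

  @tf.function
  def per_worker_dataset_fn():
      return strategy.distribute_datasets_from_function(dataset_fn)

  per_worker_dataset = coordinator.create_per_worker_dataset(per_worker_dataset_fn)
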
I have tried two ways of sharding. First, auto-sharding through dataset options:

  # AUTO first attempts FILE-based sharding and falls back to DATA-based
  # sharding (each worker reads everything and discards the rest).
  options = tf.data.Options()
  options.experimental_distribute.auto_shard_policy = (
      tf.data.experimental.AutoShardPolicy.AUTO)
  dataset = dataset.with_options(options)
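
In my script these options are applied inside dataset_fn, before batching (same placeholder pipeline as in the sketch above):

  def dataset_fn(input_context):
      dataset = tf.data.TFRecordDataset(tf.io.gfile.glob("data/*.tfrecord"))
      options = tf.data.Options()
      options.experimental_distribute.auto_shard_policy = (
          tf.data.experimental.AutoShardPolicy.AUTO)
      dataset = dataset.with_options(options)
      return dataset.batch(32).prefetch(tf.data.AUTOTUNE)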

Second, explicit sharding with the tf.distribute.InputContext passed to my dataset_fn:

  # Keep only this worker's 1/num_input_pipelines slice of the elements.
  dataset = dataset.shard(
      input_context.num_input_pipelines, input_context.input_pipeline_id)
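
Again, in context inside dataset_fn (same placeholders as before):

  def dataset_fn(input_context):
      dataset = tf.data.TFRecordDataset(tf.io.gfile.glob("data/*.tfrecord"))
      # Keep only this worker's 1/num_input_pipelines slice of the elements.
      dataset = dataset.shard(
          input_context.num_input_pipelines, input_context.input_pipeline_id)
      return dataset.batch(32).prefetch(tf.data.AUTOTUNE)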

But with either approach I can't get the workers to read/prefetch only the data they need to train the model.
Is the way I'm doing this wrong? If so, what should I do so that each worker doesn't read the entire dataset?