What is the difference between tf.data.Dataset and data generators in Keras?

I am handling variable-length data, and sometimes the input length is excessively large. I am looking for a way to manage GPU memory. One solution is a custom data generator with Keras, described here: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly.

Second, I went through tf.data.Dataset (tf.data.Dataset | TensorFlow Core v2.8.0). Now I am not sure what the difference between the two is, or which one is better for handling large data.

Take a look at this answer; it might address part of your question.


It means we can take advantage of tf.data instead of using the custom generator, right?

Yes. The tf.data API should run faster than a custom data generator. (That doesn't mean the custom data generator becomes redundant; you can still use it if you want.)
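For variable-length inputs, here is a minimal sketch of how the two approaches can be combined: a plain Python generator is wrapped with tf.data.Dataset.from_generator, and padded_batch pads each batch only to the longest sequence in that batch. The generator name, the toy sequences, and the batch size of 2 are assumptions made up for illustration, not something from the linked posts.

```python
import numpy as np
import tensorflow as tf


def sequence_generator():
    # Hypothetical toy data: six integer sequences of different lengths,
    # each paired with a binary label. Values start at 1 so that 0 can
    # serve unambiguously as the padding value.
    lengths = [3, 5, 2, 7, 4, 6]
    for i, n in enumerate(lengths):
        yield np.arange(1, n + 1, dtype=np.int32), np.int32(i % 2)


# Wrap the Python generator; shape=(None,) marks the variable-length axis.
dataset = tf.data.Dataset.from_generator(
    sequence_generator,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# Pad each batch to the longest sequence in that batch (default pad value 0),
# and prefetch so the next batch is prepared while the current one is used.
dataset = (
    dataset
    .padded_batch(2, padded_shapes=([None], []))
    .prefetch(tf.data.AUTOTUNE)
)

# Collect the shape of the inputs in every batch.
batch_shapes = [tuple(x.numpy().shape) for x, _ in dataset]
print(batch_shapes)
```

Because padding is applied per batch, a single very long sequence only enlarges the batch it lands in rather than every batch. A dataset built this way can be passed directly to model.fit; for extremely long inputs you would probably still need to truncate them or reduce the batch size.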
