I am looking for a way to serialize/deserialize a tf.data.Dataset object in a way that captures the state without computing the pipeline.
A straightforward way to serialize a tf.data.Dataset would be to call its save method and then deserialize with load, but saving like this is not really serialization. Calling save forces the pipeline to run, so every map/filter/etc. function in it is executed and the resulting elements are what get written to disk. What I'd like instead is to store the state of the Dataset to disk so that another process can load it later and end up with a state identical to the one at the time of serialization.
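To make the "save forces a compute" point concrete, here is a minimal sketch, assuming TensorFlow 2.10 or later (where save and load exist as Dataset methods). The calls counter and the record/traced_map helpers are hypothetical and exist only to count how often the map function actually executes.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical counter: how many times the map function really executes.
calls = {"n": 0}

def record(x):
    calls["n"] += 1
    return x * 2

def traced_map(x):
    # tf.py_function runs `record` eagerly for each element the pipeline produces,
    # so the counter reflects actual executions rather than a single trace.
    y = tf.py_function(record, [x], tf.int64)
    y.set_shape([])
    return y

ds = tf.data.Dataset.range(5).map(traced_map)

path = os.path.join(tempfile.mkdtemp(), "saved_ds")
ds.save(path)  # iterates the whole pipeline and writes the computed elements
after_save = calls["n"]

restored = tf.data.Dataset.load(path)  # reads stored elements; map is not re-run
values = sorted(int(v) for v in restored)  # shard read order is not guaranteed
after_load = calls["n"]

print("map calls during save:", after_save)
print("values after load:", values)
print("map calls after load:", after_load)
```

The saved files hold the already-mapped elements, so load never calls the map function again. The pipeline's results are persisted, not its definition or iteration state.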
Maybe iterator checkpointing is the approach I should try?
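For reference, iterator checkpointing would look roughly like the minimal sketch below, using tf.train.Checkpoint to track an iterator created with iter(). The pipeline (range plus a doubling map) and the checkpoint location are illustrative assumptions. Note that the checkpoint records the iterator's position (and any buffered elements, e.g. a shuffle buffer), not the pipeline definition, so the restoring process has to rebuild the same pipeline in code.

```python
import os
import tempfile

import tensorflow as tf

def make_pipeline():
    # Illustrative pipeline; the restoring process must build the same one.
    return tf.data.Dataset.range(10).map(lambda x: x * 2)

it = iter(make_pipeline())

# Consume a few elements so the iterator has non-trivial state.
first = [int(next(it)) for _ in range(3)]

# tf.train.Checkpoint can track a tf.data iterator and save its position.
ckpt = tf.train.Checkpoint(iterator=it)
prefix = ckpt.save(os.path.join(tempfile.mkdtemp(), "ckpt"))

# Later (possibly in another process): rebuild the pipeline, create a fresh
# iterator, and restore its state from the checkpoint.
it2 = iter(make_pipeline())
tf.train.Checkpoint(iterator=it2).restore(prefix)
rest = [int(x) for x in it2]

print(first, rest)
```

The restored iterator picks up after the third element instead of starting over, which is closer to "same state in another process", but only for iteration position.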
I found this GitHub issue on the topic, but it doesn't offer a solution to the original problem.
Thanks for any help.
Dennis