Proper way to reinitialize dataset

What is the proper way to reinitialize a tf.data.Dataset with an initializable iterator? I have tried many approaches and they all result in a memory leak. Should I use gc.collect or tf.reset_default_graph()? How do I use tf.reset_default_graph() if it always fails with “AssertionError: Do not use tf.reset_default_graph() to clear nested graphs. If you need a cleared graph, exit the nesting and create a new graph”? I just want to swap in a new training dataset and continue training on the new data.
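For context, the usual TF 1.x / tf.compat.v1 pattern is to feed the dataset from a placeholder and re-run the iterator's initializer whenever the data changes, instead of resetting the graph. A minimal sketch, assuming graph mode via tf.compat.v1 (the name `data_ph` is illustrative):

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# A placeholder lets us swap the underlying data without adding graph nodes.
data_ph = tf.placeholder(tf.float32, shape=[None])   # illustrative name
dataset = tf.data.Dataset.from_tensor_slices(data_ph).batch(2)
iterator = tf.data.make_initializable_iterator(dataset)
next_batch = iterator.get_next()

batches = []
with tf.Session() as sess:
    for epoch_data in (np.arange(4.0), np.arange(4.0) + 10):
        # Re-running the initializer points the same iterator at the new
        # array; no new graph nodes are created, so memory should stay flat.
        sess.run(iterator.initializer, feed_dict={data_ph: epoch_data})
        while True:
            try:
                batches.append(sess.run(next_batch).tolist())
            except tf.errors.OutOfRangeError:
                break
print(batches)  # batches from the old data, then from the new data
```

Since the graph is built once and only the initializer is re-run, there is no need for tf.reset_default_graph() at all in this pattern.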

What is your specific use case? Why do you need to reset the iterator?

I’m working on a project that requires me to update the training data regularly, which is why I need to reset the iterator each time I update the dataset. I tried using a tf.placeholder with feed_dict to update the training set, but memory usage grows as training progresses (not a serious problem as long as it stays affordable). Sometimes, though, at a certain training step the code stalls while reinitializing the iterator (oddly, the step before that still works fine despite the growing memory usage), and from that point memory usage grows much faster than before.

On my local machine I don’t see this problem, but it appears once I send the code to a remote GPU cluster to train. I tried using tcmalloc, but I’m not sure the system loaded it correctly, since nothing changed: it still goes OOM after a while.

Can you use a generator?  |  TensorFlow Core v2.8.0
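For reference, Dataset.from_generator re-invokes the Python generator each time the dataset is iterated, so a generator that reads from a mutable source picks up updated data on the next pass without any explicit reinitialization. A minimal sketch in eager mode (the `samples` list is illustrative):

```python
import tensorflow as tf

samples = [0, 1, 2]  # illustrative mutable training data


def gen():
    # Called anew on every iteration over the dataset,
    # so it always reads the current contents of `samples`.
    yield from samples


ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(), dtype=tf.int32))

first = [int(x) for x in ds]   # values from the original list
samples[:] = [10, 11, 12]      # update the training data in place
second = [int(x) for x in ds]  # the next pass sees the updated values
```

Because nothing about the dataset object changes between passes, this avoids rebuilding graphs (and the tf.reset_default_graph() question) entirely.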


I can, but the dataset somehow seems to cache data from the previous run, even though I don’t use .cache() at all.

Do you have a minimal runnable example with a dummy generator?

Ah, sorry, it doesn’t look like caching after all. I changed around 1000 samples and ran the model on the new data immediately afterwards, and around 127 samples had different labels. I guessed it might be due to prefetch or something similar, with the pipeline pulling some of the old samples into its buffer, but the behavior stayed the same even after I removed prefetch. Is there any way to flush the old dataset’s buffer in the pipeline?

As for a runnable example, that’s a bit hard, since it looks like I’d need to upload the whole model to reproduce the problem I’m describing.

Check the dataset and dataset optimization options:
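For example, the pipeline's optimization behavior can be inspected and tuned through tf.data.Options applied with with_options. A minimal sketch (the specific settings are illustrative, not a known fix for the leak):

```python
import tensorflow as tf

options = tf.data.Options()
# Autotune dynamically sizes internal buffers; toggling it (or disabling the
# default graph optimizations below) can help isolate where memory is held.
options.autotune.enabled = True
options.experimental_optimization.apply_default_optimizations = False

ds = tf.data.Dataset.range(8).batch(4).with_options(options)
batches = [b.tolist() for b in ds.as_numpy_iterator()]
print(batches)
```

Ruling these options in or out one at a time is a cheap way to check whether an internal optimization (rather than your own prefetch call) is buffering the stale samples.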