Massive performance impact with "run_eagerly=True"

Yaume · July 1, 2021, 4:40pm

Hi,

I work on a model based on this tutorial:

I want to turn this VAE into an annealed beta-VAE, so I have defined a “tf.keras.backend.variable” that is updated every epoch in a custom callback. This variable is then applied as a weighting factor to the latent loss in the train_step function.
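For readers who want to picture the setup, here is a minimal sketch of such a callback. The class name `BetaAnnealing`, the parameters `max_beta` and `warmup_epochs`, and the linear warm-up schedule are all assumptions made for illustration; the post does not say which schedule was used. A plain `tf.Variable` stands in for the “tf.keras.backend.variable”.

```python
import tensorflow as tf


class BetaAnnealing(tf.keras.callbacks.Callback):
    """Hypothetical callback: linearly ramps beta from 0 to max_beta."""

    def __init__(self, beta, max_beta=1.0, warmup_epochs=10):
        super().__init__()
        self.beta = beta                  # non-trainable tf.Variable shared with the model
        self.max_beta = max_beta          # final weight of the latent loss (assumed)
        self.warmup_epochs = warmup_epochs

    def on_epoch_begin(self, epoch, logs=None):
        # Fraction of the warm-up completed, clipped to 1.0 afterwards.
        fraction = min(1.0, epoch / self.warmup_epochs)
        # assign() changes the variable in place, so any code holding
        # a reference to the same variable can see the new value.
        self.beta.assign(self.max_beta * fraction)


# Shared variable; the model's train_step would multiply the latent loss by it.
beta = tf.Variable(0.0, trainable=False, dtype=tf.float32)
annealing = BetaAnnealing(beta, max_beta=1.0, warmup_epochs=10)
```

The callback would then be passed to `model.fit(..., callbacks=[annealing])`, with the same `beta` variable referenced inside train_step.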

My first concern is that if I don’t force eager mode, the value of the variable is never updated inside the train_step() function.
My second concern is that if I force eager mode in .compile() with run_eagerly=True, the value is correctly updated in the train_step() function, but the impact on runtime is huge: each epoch takes twice as long.

Do you have any idea what is going on here?

Thanks

Yaume · July 4, 2021, 8:51pm

I’ve tested a lot of things, still no luck :-/ Can anybody give me some insight?

SmacznaKawusia · July 5, 2021, 11:05am

Twice the time is really nice. For me, it is more like ten times.

https://tensorflow-prod.ospodiscourse.com/t/eager-execution-most-of-the-ops-are-placed-on-host-instead-of-device/2536

Yaume · July 5, 2021, 3:08pm

Finally, I found the solution.

In the train_step() function, my custom variable must be converted to a tensor with tf.convert_to_tensor(my_variable). Now the variable is correctly updated by the callback at each epoch, even in non-eager mode!
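A minimal sketch of that conversion, outside of any real model: the function `total_loss` and the constants below are hypothetical stand-ins, and `@tf.function` plays the role of the graph that Keras builds around train_step when run_eagerly is not set. The thread does not explain why the original code saw a stale value, so this only illustrates the fix, not the cause.

```python
import tensorflow as tf

# Hypothetical latent-loss weight, updated from outside the compiled function
# (for example by a callback at the start of each epoch).
beta = tf.Variable(0.0, trainable=False, dtype=tf.float32)


@tf.function  # stands in for the graph-compiled train_step
def total_loss(reconstruction_loss, latent_loss):
    # Conversion described in the post: read the variable as a tensor,
    # so its current value is fetched each time the graph runs.
    beta_t = tf.convert_to_tensor(beta)
    return reconstruction_loss + beta_t * latent_loss


reconstruction = tf.constant(2.0)
latent = tf.constant(4.0)

loss_before = float(total_loss(reconstruction, latent))  # beta is 0.0 here
beta.assign(0.5)                                         # what the callback would do
loss_after = float(total_loss(reconstruction, latent))   # same graph, new beta
```

In a real train_step, the same `tf.convert_to_tensor(...)` call would sit next to the loss computation inside the `tf.GradientTape` block, and the model can stay in graph mode.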