innat
July 27, 2023, 12:03pm
#1
I am trying to translate a PyTorch implementation into TensorFlow and ran into some gradient-level issues. I already opened an issue with reproducible code here:
GitHub issue, opened 11:35AM - 15 Jul 23 UTC — labels: type:bug, comp:ops, comp:core, TF 2.12
### Issue type
Bug (reproduced with TensorFlow Nightly)
### TensorFlow version
tf 2.12
### Custom code
Yes
_(OS, mobile device, Python, Bazel, GCC, CUDA/cuDNN, and GPU details: no response.)_
### Current behavior?
I have PyTorch code that computes the gradient of a gradient w.r.t. some computation, and it works fine. When I translate it into TensorFlow, however, I get errors.
## Standalone code to reproduce the issue
Here is the reproducible code: [Gist](https://colab.research.google.com/drive/1GPhctZNrXynrCQ0qNbLyMDmuixQtC0fw?usp=sharing).
The Colab is small and quickly reproduces both runs. PyTorch runs as expected but TensorFlow doesn't. Below is the main spot to look at:
**Main Part**
In PyTorch,
```python
rand_model = Random()
model = Model()
rand_optim = torch.optim.SGD(
    rand_model.parameters(), lr=0.01
)
# Materialize the generator so the parameters can be reused below.
model_params = list(model.parameters())

loss_mod = model.forward(x)
loss_rand = model.forward(y)

model_grad = torch.autograd.grad(loss_mod, model_params)
rand_grad = torch.autograd.grad(
    loss_rand,
    model_params,
    create_graph=True,
)

loss = some_method(model_grad, rand_grad)
rand_model.zero_grad()
loss.backward()
rand_optim.step()
```
In PyTorch, the `create_graph=True` above is crucial.
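To illustrate why `create_graph=True` matters, here is a minimal scalar sketch (independent of the models above): the first `grad` call records its own backward pass, so the result can itself be differentiated again.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3  # dy/dx = 3x^2, d2y/dx2 = 6x

# create_graph=True records the backward pass itself, so `g` stays
# connected to `x`; without it, the second grad call below raises
# a "does not require grad" error.
(g,) = torch.autograd.grad(y, x, create_graph=True)
(gg,) = torch.autograd.grad(g, x)

print(float(g), float(gg))  # 27.0 18.0
```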
In TensorFlow, I tried
```python
ran_model = Random()
ran_optim = tf.keras.optimizers.SGD()
model = Model()
model.build(input_shape=(1, 784))
optim = tf.keras.optimizers.SGD(0.01)
model_params = model.trainable_variables

with tf.GradientTape(persistent=True) as tape:
    tape.watch(ran_model.trainable_variables)
    loss_mod = tf.reduce_mean(tf.math.log(model(x)[:, i]))
    loss_rand = tf.reduce_mean(tf.math.log(model(y)[:, i]))

grads_mod = tape.gradient(loss_mod, model_params)
grads_rand = tape.gradient(loss_rand, model_params)

loss = some_method(grads_mod, grads_rand)
ran_model_grads = tape.gradient(loss, ran_model.trainable_variables)
ran_optim.apply_gradients(
    zip(ran_model_grads, ran_model.trainable_variables)
)
```
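A likely cause, sketched on a toy scalar (nothing here is taken from the code above): a result of `tape.gradient` computed outside any recording tape is just a constant tensor, so differentiating through it yields `None`. Taking the inner gradient *inside* an outer tape keeps the connection, which is TensorFlow's analogue of `create_graph=True`.

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Broken: the inner gradient is computed outside any outer tape,
# so `g` is a constant as far as further differentiation goes.
with tf.GradientTape() as inner:
    y = x ** 3
g = inner.gradient(y, x)          # 3*x^2 = 27
with tf.GradientTape() as outer:
    loss = g * 2.0                # built only from an untracked constant
print(outer.gradient(loss, x))    # None: loss is disconnected from x

# Working: take the inner gradient *inside* the outer tape so the
# grad-of-grad path is recorded.
with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = x ** 3
    g = inner.gradient(y, x)      # recorded by `outer`
    loss = g * 2.0                # loss = 6*x^2
print(outer.gradient(loss, x))    # d(6x^2)/dx = 12x = 36
```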
The `tf` code gives the following error.
```text
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-01562609cda8> in <cell line: 33>()
44 loss += tf.reduce_sum(tf.stack([a, b], axis=0))
45 ran_model_grads = tape.gradient(loss, ran_model.trainable_variables)
---> 46 ran_optim.apply_gradients(zip(ran_model_grads, ran_model.trainable_variables))
47
48
3 frames
/usr/local/lib/python3.10/dist-packages/keras/optimizers/utils.py in filter_empty_gradients(grads_and_vars)
75 if not filtered:
76 variable = ([v.name for _, v in grads_and_vars],)
---> 77 raise ValueError(
78 f"No gradients provided for any variable: {variable}. "
79 f"Provided `grads_and_vars` is {grads_and_vars}."
ValueError: No gradients provided for any variable: (['Variable:0'],). Provided `grads_and_vars` is ((None, <tf.Variable 'Variable:0' shape=(10, 1, 784) dtype=float32, numpy=
```
- This is probably because `loss` and `ran_model.trainable_variables` are not connected. As mentioned in the [autodiff guide](https://www.tensorflow.org/guide/autodiff),
> When a **target** is not connected to a **source**, the gradient will return `None`
- In PyTorch, `create_graph=True` makes it possible to compute the gradient of a gradient later on. I tried the nested-tape approach from the [grad-of-grad example](https://www.tensorflow.org/guide/advanced_autodiff#example_input_gradient_regularization), but it didn't work either (shown below). The reason is probably the same: source and target are not connected.
```python
for i in range(5):
    with tf.GradientTape() as tape1:
        loss_mod = tf.reduce_mean(tf.math.log(model(x)[:, i]))
    grads_mod = tape1.gradient(loss_mod, model_params)

    with tf.GradientTape() as tape3:
        with tf.GradientTape() as tape2:
            loss_rand = tf.reduce_mean(tf.math.log(model(y)[:, i]))
        grads_rand = tape2.gradient(loss_rand, model_params)
        loss = 0
        for a, b in zip(grads_mod, grads_rand):
            loss += tf.reduce_sum(tf.stack([a, b], axis=0))

    # ISSUE: this returns None for every variable.
    ran_model_grads = tape3.gradient(loss, ran_model.trainable_variables)
    ran_optim.apply_gradients(zip(ran_model_grads, ran_model.trainable_variables))
```
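For reference, here is a self-contained variant with tiny stand-in models (the layer sizes, and the assumption that `y` is produced by `ran_model`, are mine and may not match my actual setup) in which the nested tapes do produce non-`None` gradients. The two changes are that `y = ran_model(z)` is computed inside the outer tape, so a path from the loss to `ran_model`'s variables exists at all, and that the inner `tape.gradient` call also happens inside it:

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical stand-ins for Model() and Random(); shapes are arbitrary.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
model.build((None, 8))
ran_model = tf.keras.Sequential([tf.keras.layers.Dense(8)])
ran_model.build((None, 8))
ran_optim = tf.keras.optimizers.SGD(0.01)

x = tf.random.uniform((4, 8))
z = tf.random.uniform((4, 8))
model_params = model.trainable_variables

# First-order gradients of the model loss (no second derivative needed).
with tf.GradientTape() as tape1:
    loss_mod = tf.reduce_mean(tf.math.log(model(x)[:, 0]))
grads_mod = tape1.gradient(loss_mod, model_params)

with tf.GradientTape() as tape_outer:
    y = ran_model(z)  # loss_rand must depend on ran_model's variables
    with tf.GradientTape() as tape_inner:
        loss_rand = tf.reduce_mean(tf.math.log(model(y)[:, 0]))
    # Taken *inside* tape_outer, so the grad-of-grad path is recorded.
    grads_rand = tape_inner.gradient(loss_rand, model_params)
    loss = tf.add_n([tf.reduce_sum(tf.stack([a, b], axis=0))
                     for a, b in zip(grads_mod, grads_rand)])

ran_model_grads = tape_outer.gradient(loss, ran_model.trainable_variables)
assert all(g is not None for g in ran_model_grads)  # connected now
ran_optim.apply_gradients(zip(ran_model_grads, ran_model.trainable_variables))
```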
How should this be structured in TensorFlow? Any suggestions or feedback would be highly appreciated. Thank you.