I had a question regarding the behavior of tf.gradients() as opposed tf.gradientTape.gradient() in graph mode.
Given a differentiable function y = f(x), where x and y are each single tensorflow tensors, is there any difference between the behavior of tf.gradient(y, x) vs tape.gradient(y, x) where tape is an instance of tf.gradientTape (assuming the use of graph mode) ?
Not sure why tensorflow has two different gradient methods which can be used with graph mode - maybe there are some subtle differences in the implementations? I’ve looked at the documentation for gradientTape and tf.gradients but it’s not clear whether there is any difference between the behavior of these methods for a single (x, y) pair, or whether it’s just that tf.gradients() can be used in this case for a speedup when using graph mode.
Thank you so much for your help!