Question about how gradients are computed

I have realized that I can take the gradient of a vector with respect to an input. In other words, I can perform:

import tensorflow as tf

w = tf.Variable([[1., 2.], [3., 4.]], name='w')
x = tf.Variable([[1., 2.]], name='x')

with tf.GradientTape(persistent=True) as tape:
  y = x @ w
  loss = y

grad = tape.gradient(loss, [x])

Here y is a vector, so I would expect the gradient to be a matrix, since I am asking for the derivative of each coordinate of y with respect to each coordinate of x. In other words, I am expecting the Jacobian. But I am getting a vector instead. What is happening under the hood?


Welcome to the TensorFlow Forum!

When the target passed to tape.gradient is not a scalar, TensorFlow does not compute the Jacobian. Instead, it sums the elements of the target and returns the gradient of that sum, which is equivalent to a vector-Jacobian product with a vector of ones. That is why the result has the same shape as x (a 1x2 matrix) rather than the shape of a Jacobian.
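The summing behaviour is easy to verify. A minimal sketch reusing the variables from the question, comparing the gradient of the vector target against the gradient of an explicit tf.reduce_sum:

```python
import tensorflow as tf

w = tf.Variable([[1., 2.], [3., 4.]], name='w')
x = tf.Variable([[1., 2.]], name='x')

with tf.GradientTape(persistent=True) as tape:
    y = x @ w                 # y has shape (1, 2): a vector target
    s = tf.reduce_sum(y)      # explicit scalar: sum of y's elements

grad_y = tape.gradient(y, x)  # non-scalar target: TF sums it first
grad_s = tape.gradient(s, x)  # scalar target: ordinary gradient

# Both are [[3., 7.]], the row sums of w, since d(sum_j y_j)/dx_i = sum_j w[i, j]
print(grad_y.numpy())
print(grad_s.numpy())
```

Since y_j = sum_i x_i * w[i, j], the gradient of the summed output with respect to x_i is the i-th row sum of w: [1+2, 3+4] = [3, 7].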

To get the full Jacobian of y with respect to x, use the jacobian method of the tape instead of gradient:

jac = tape.jacobian(y, x)

The result has shape (1, 2, 1, 2), i.e. y.shape + x.shape: one partial derivative for every pair of an element of y and an element of x. (Passing [x, w] to tape.gradient does not produce a Jacobian; it just returns the summed-target gradient with respect to each variable in the list.)
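For the element-by-element derivatives themselves, GradientTape also offers tape.jacobian. A minimal sketch reusing the question's setup:

```python
import tensorflow as tf

w = tf.Variable([[1., 2.], [3., 4.]], name='w')
x = tf.Variable([[1., 2.]], name='x')

with tf.GradientTape() as tape:
    y = x @ w                   # y has shape (1, 2)

jac = tape.jacobian(y, x)       # shape (1, 2, 1, 2): y.shape + x.shape
print(jac.shape)

# For y = x @ w, dy[0, j]/dx[0, i] = w[i, j], so the squeezed 2x2
# Jacobian is the transpose of w.
print(tf.squeeze(jac).numpy())  # [[1., 3.], [2., 4.]]
```

Squeezing away the size-1 batch dimensions leaves the 2x2 matrix of partials the question was expecting.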

Thank you!