Hi

I’m trying to do sensitivity analysis (forward mode autodiff) on a matrix and was hoping to parallelise the computations using Tensorflow. Here’s the code that I’m using to test if something like this is possible in TF:

```
def _forward(X, dX, W1, W2):
    # Primal and tangent (forward-mode) passes, interleaved layer by layer.
    Z1 = tf.matmul(X, tf.transpose(W1))
    dZ1 = tf.matmul(dX, tf.transpose(W1))   # dX carries the input tangents
    A1 = tf.tanh(Z1)
    dA1 = tf.multiply(tf.expand_dims(1 - tf.square(A1), axis=1), dZ1)  # tanh' = 1 - tanh^2
    Z2 = tf.matmul(A1, tf.transpose(W2))
    dZ2 = tf.matmul(dA1, tf.transpose(W2))
    return Z2, tf.squeeze(dZ2, axis=-1)
```
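To check that the tangent propagation itself is correct, here is the same computation in plain NumPy (a minimal sketch; the shapes, e.g. `dX` holding one unit tangent per input dimension so the second output is the Jacobian of `Z2` w.r.t. `X`, are my assumption):

```
import numpy as np

rng = np.random.default_rng(0)
N, d, h = 4, 3, 5
X = rng.normal(size=(N, d))
dX = np.repeat(np.eye(d)[None], N, axis=0)   # (N, d, d): unit tangents per sample
W1 = rng.normal(size=(h, d))
W2 = rng.normal(size=(1, h))

def forward(X, dX, W1, W2):
    Z1 = X @ W1.T                            # (N, h)
    dZ1 = dX @ W1.T                          # (N, d, h)
    A1 = np.tanh(Z1)
    dA1 = (1 - A1**2)[:, None, :] * dZ1      # tanh' = 1 - tanh^2
    Z2 = A1 @ W2.T                           # (N, 1)
    dZ2 = dA1 @ W2.T                         # (N, d, 1)
    return Z2, dZ2.squeeze(-1)               # (N, d): rows of the Jacobian

Z2, J = forward(X, dX, W1, W2)
```

A finite-difference check (perturb one input dimension by a small `eps` and compare the resulting change in `Z2` against the corresponding column of `J`) agrees with this to numerical precision.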

In the code above, the evaluations of `Z1` and `dZ1` are independent of each other (and likewise `A1` and `dA1`, etc.), so I was hoping to run each pair of statements in parallel. I wrapped this function in `tf.function` and expected a speedup over the standard way of computing gradients (forward pass plus backprop), since half of the calculations would now run in parallel. However, I don't see any speedup: both versions take the same time to execute.
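For comparison, TensorFlow also exposes forward-mode differentiation directly through `tf.autodiff.ForwardAccumulator`, which computes the same Jacobian-vector product without writing out the tangent pass by hand; a minimal sketch (the shapes here are illustrative, not from my actual model):

```
import tensorflow as tf

x = tf.random.normal((4, 3))          # batch of inputs
v = tf.ones_like(x)                   # one tangent direction per input row
W1 = tf.random.normal((5, 3))
W2 = tf.random.normal((1, 5))

# The accumulator watches x with tangent v and pushes the tangent
# forward through every op executed inside the context.
with tf.autodiff.ForwardAccumulator(primals=x, tangents=v) as acc:
    z2 = tf.matmul(tf.tanh(tf.matmul(x, tf.transpose(W1))), tf.transpose(W2))

jvp = acc.jvp(z2)                     # directional derivative of z2 along v
```

Timing this against my hand-written version might at least tell me whether the lack of speedup is specific to my code or inherent to how the ops get scheduled.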

I don’t know if it’s possible to do what I’m trying here. Any help would be appreciated.

Thanks