# How to implement inverting gradients [PDQN, MPDQN] in TensorFlow 2.7

I am trying to reimplement inverting gradients with `tf.GradientTape` in TensorFlow 2.7. In this example I use the Pendulum domain, which has an observation size of 3, an action size of 1, and no discrete actions.
The technique is described in this paper: https://arxiv.org/pdf/1511.04143.pdf

Someone solved it for TensorFlow 1.0 in the Stack Overflow question "How to implement inverting gradient in Tensorflow?".

But I am struggling to reimplement it for TensorFlow 2.

As far as I understand, we need the derivative of Q(s, a(w, s)) with respect to w, which by the chain rule is (dq/da)*(da/dw), with w being the weights of the policy network.
This is needed to update the weights of the policy network.
We can access the derivative dq/da via:

``````dq_das = tf.Variable(tape.gradient(loss, actions))
``````

Now we can calculate the inverting gradients. The shape matches that of actions, and this calculation causes no problems:

``````upper = 1
lower = -1

for i in range(dq_das.shape[0]):
    dq_da = dq_das[i]
    action = actions[i]
    if dq_da >= 0.0:
        dq_das[i].assign(dq_da * (upper - action) / (upper - lower))
    else:
        dq_das[i].assign(dq_da * (action - lower) / (upper - lower))
``````
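For comparison, the same rule can be written without a Python loop or a `tf.Variable`. This is only a minimal sketch: the function name `invert_gradients` and the sample values are made up for illustration, and it assumes the incoming gradient is dQ/da itself (positive means "increase the action"), not the gradient of a negated loss.

```python
import tensorflow as tf

def invert_gradients(grads, actions, lower=-1.0, upper=1.0):
    """Scale per-sample action gradients by the remaining room in the action range.

    Assumes `grads` points in the direction that increases Q (dQ/da).
    """
    width = upper - lower
    # If the gradient asks to increase the action, scale by the distance to the
    # upper bound; otherwise scale by the distance to the lower bound.
    scale = tf.where(grads > 0.0,
                     (upper - actions) / width,
                     (actions - lower) / width)
    return grads * scale

# Illustrative values only: the third action lies above the upper bound.
actions = tf.constant([[0.5], [-0.5], [1.5]])
grads = tf.constant([[2.0], [-2.0], [1.0]])
inverted = invert_gradients(grads, actions)
print(inverted.numpy())
```

In the third row the action is already outside the range, so the scale factor is negative and the gradient changes sign, which is the "inverting" part of the rule.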

The derivative da_dw we can access via:

``````da_dw = tape.gradient(actions, self.policy_Net.trainable_variables)
``````

The problem now is that the shapes don't fit if I want to calculate dq_da*da_dw.

For dq_da i get:
<tf.Variable 'Variable:0' shape=(124, 1) dtype=float32>

which makes sense, since the batch size is 124 and there is one action. For da_dw I get:

``````[<tf.Tensor 'gradient_tape/policy__network/dense/MatMul_1:0' shape=(3, 400) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense/BiasAdd/BiasAddGrad_1:0' shape=(400,) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense_1/MatMul_3:0' shape=(403, 300) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense_1/BiasAdd/BiasAddGrad_1:0' shape=(300,) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense_2/MatMul_3:0' shape=(703, 1) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense_2/BiasAdd/BiasAddGrad_1:0' shape=(1,) dtype=float32>]
``````
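A note on these shapes: `tape.gradient(actions, variables)` returns one tensor per variable, with the contributions of all batch entries already summed, so there is no batch axis left to multiply with a `(124, 1)` tensor. One possible way around this is the `output_gradients` argument of `tape.gradient`, which weights each action by a given upstream gradient before summing (a vector-Jacobian product). The sketch below uses a single `Dense` layer as a stand-in for the policy network and random placeholder values for `dq_da`. It also checks the result against an equivalent scalar surrogate objective.

```python
import tensorflow as tf

tf.random.set_seed(0)
policy = tf.keras.layers.Dense(1)       # stand-in for the policy network
states = tf.random.normal((124, 3))     # batch of 124 observations of size 3
dq_da = tf.random.normal((124, 1))      # placeholder for the per-sample action gradients

with tf.GradientTape(persistent=True) as tape:
    actions = policy(states)
    # Equivalent scalar objective: dq_da is treated as a constant weight.
    surrogate = tf.reduce_sum(actions * tf.stop_gradient(dq_da))

# For every variable w: sum_i dq_da[i] * d(actions[i]) / dw
grads = tape.gradient(actions, policy.trainable_variables,
                      output_gradients=dq_da)
check = tape.gradient(surrogate, policy.trainable_variables)
del tape  # release the persistent tape

for g, v in zip(grads, policy.trainable_variables):
    print(v.name, g.shape)  # each gradient has the shape of its variable
```

Both routes give gradients with the same shapes as the variables, which is the form an optimizer's `apply_gradients` expects.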

Where is my mistake? Thanks a lot!
My code looks like this so far:

``````@tf.function
def update_policy(self, states):  # placeholder name; the original signature is missing from the post
    # states = tf.Variable(states)

    with tf.GradientTape(persistent=True) as tape:
        actions = self.policy_Net(states)
        q, _, _ = self.value_Net(states, actions)
        loss = -tf.reduce_sum(q, axis=1, keepdims=True)
        loss = tf.math.reduce_mean(loss)

    dq_das = tf.Variable(tape.gradient(loss, actions))
    da_dw = tape.gradient(actions, self.policy_Net.trainable_variables)

    upper = 1
    lower = -1

    for i in range(dq_das.shape[0]):
        dq_da = dq_das[i]
        action = actions[i]
        if dq_da >= 0.0:
            dq_das[i].assign(dq_da * (upper - action) / (upper - lower))
        else:
            dq_das[i].assign(dq_da * (action - lower) / (upper - lower))

    print(dq_das)
    print(da_dw)
    print(dq_das * da_dw)
    exit()

    return 0
``````
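For reference, here is one way the whole update step might be put together. It is a sketch under several assumptions, not a verified port of the TensorFlow 1 solution: `policy_net`, `q_value`, `policy_update`, and the layer sizes are hypothetical stand-ins for the networks above, the value network returns a single tensor, and the step runs eagerly (no `@tf.function`). It differentiates Q directly instead of the negated loss, so that a positive gradient means "increase the action" as in the paper, and it negates the result only when handing it to the minimizing optimizer.

```python
import tensorflow as tf

LOWER, UPPER = -1.0, 1.0

# Hypothetical stand-ins for the policy and value networks.
policy_net = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
q_hidden = tf.keras.layers.Dense(16, activation="relu")
q_out = tf.keras.layers.Dense(1)
optimizer = tf.keras.optimizers.Adam(1e-3)

def q_value(states, actions):
    # Q(s, a) computed from the concatenated state and action.
    return q_out(q_hidden(tf.concat([states, actions], axis=1)))

def policy_update(states):
    batch_size = tf.cast(tf.shape(states)[0], tf.float32)
    with tf.GradientTape(persistent=True) as tape:
        actions = policy_net(states)
        q = q_value(states, actions)

    # Per-sample dQ/da with shape (batch, 1); each Q value depends only on
    # its own action, so summing over the targets does not mix samples.
    dq_da = tape.gradient(q, actions)

    # Inverting-gradients rule, vectorized over the batch.
    width = UPPER - LOWER
    scale = tf.where(dq_da > 0.0,
                     (UPPER - actions) / width,
                     (actions - LOWER) / width)
    inverted = dq_da * scale

    # Chain rule through the policy. Negate and average so that a minimizing
    # optimizer performs gradient ascent on Q.
    grads = tape.gradient(actions, policy_net.trainable_variables,
                          output_gradients=-inverted / batch_size)
    del tape  # release the persistent tape
    optimizer.apply_gradients(zip(grads, policy_net.trainable_variables))
    return grads

states = tf.random.normal((124, 3))
grads = policy_update(states)
print([tuple(g.shape) for g in grads])
```

Only the policy variables are updated here; the value network's weights are read but never passed to the optimizer.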