Hello,
I implemented TD3, and the agent works in the Pendulum-v1 environment. However, if I add normalization layers in between the network layers, the agent doesn't work at all. What might be the reason?
Here is the code of the critic's call function:
def call(self, state, parameters):
    state_ = tf.convert_to_tensor(state, dtype=tf.float32)
    parameters_ = tf.convert_to_tensor(parameters, dtype=tf.float32)
    state_ = tf.concat([state_, parameters_], 1, name="concatene_State_Parameters")
    if self.use_Skip_Layers:
        x = state_
        x_ = state_
        for i in range(len(self.network_Layers) - 1):
            x = self.network_Layers[i](x)
            x = tf.concat([x, x_], axis=1)
            if self.normalize:
                x = self.norm_Layers[i](x, training=True)
            x_ = x
        q = self.network_Layers[len(self.network_Layers) - 1](x)
    else:
        x = state_
        x_ = state_
        for i in range(len(self.network_Layers) - 1):
            x = self.network_Layers[i](x)
            if self.normalize:
                x = self.norm_Layers[i](x, training=True)
        q = self.network_Layers[len(self.network_Layers) - 1](x)
    max_Action = tf.math.argmax(q, axis=1)
    max_Q_Val = tf.math.reduce_max(q, axis=1)
    max_Action = tf.reshape(max_Action, [max_Action.shape[0], 1])
    max_Q_Val = tf.reshape(max_Q_Val, [max_Action.shape[0], 1])
    return q, max_Action, max_Q_Val
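For reference, here is a minimal plain-Python sketch of what the hard-coded `training=True` would mean if `norm_Layers` were batch normalization layers. That is an assumption, since the layer type is not shown above; layer normalization does not depend on this flag. The helper `batch_norm` and all numbers are hypothetical and only illustrate the two modes: batch statistics versus stored moving statistics.

```python
import math

def batch_norm(batch, moving_mean, moving_var, training, eps=1e-3):
    """Normalize one feature column (a list of floats), batch-norm style.

    training=True  -> use the mean/variance of the current batch.
    training=False -> use the stored moving statistics instead.
    (Hypothetical helper; the learned scale and offset are omitted.)
    """
    if training:
        mean = sum(batch) / len(batch)
        var = sum((v - mean) ** 2 for v in batch) / len(batch)
    else:
        mean, var = moving_mean, moving_var
    return [(v - mean) / math.sqrt(var + eps) for v in batch]

# Hypothetical moving statistics, as if accumulated over earlier batches.
moving_mean, moving_var = 2.0, 4.0

# A batch of size 1, as when a single state is passed in to pick an action.
single_state = [5.0]

# Batch statistics of a single sample: mean equals the sample, variance is 0,
# so the input value is erased by the normalization.
train_mode = batch_norm(single_state, moving_mean, moving_var, training=True)

# Moving statistics: the value is rescaled but keeps its information.
infer_mode = batch_norm(single_state, moving_mean, moving_var, training=False)

print(train_mode, infer_mode)
```

Under this assumption, a forward pass on a single state with `training=True` would give the same normalized activations regardless of the input, whereas `training=False` would not.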
The actor has the same code, except for a squashing function at the end and NoisyDense layers. Also, can normalization be used if prioritized experience replay is implemented?