CategoricalDqnAgent not working with a mask?

Hello, I'm currently training an RL agent with tf_agents to play a card game. Since a plain DQN agent worked well, I tried to improve my results by switching to a categorical DQN agent (CategoricalDqnAgent).
This card game includes valid and invalid actions, so I provide a mask to the agent via an observation_and_action_constraint_splitter.

My observation spec looks like this (and so does the input_tensor_spec of my categorical_q_network):
{'observation': BoundedTensorSpec(shape=(85,), dtype=tf.int32, name='observation', minimum=array(-100), maximum=array(100)), 'valid_actions': TensorSpec(shape=(91,), dtype=tf.bool, name='valid_actions')}
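
For context, my splitter follows the usual pattern, simply splitting this dict into the raw observation and the mask (a simplified sketch of my actual code):

    def observation_and_action_constraint_splitter(obs):
      # The plain observation goes to the q-network; the boolean mask
      # is used to constrain which actions the policy may pick.
      return obs['observation'], obs['valid_actions']

I then pass it to the agent like this (time_step_spec, action_spec, categorical_q_net and optimizer are my existing variables):

    from tf_agents.agents.categorical_dqn import categorical_dqn_agent

    agent = categorical_dqn_agent.CategoricalDqnAgent(
        time_step_spec,
        action_spec,
        categorical_q_network=categorical_q_net,
        optimizer=optimizer,
        observation_and_action_constraint_splitter=(
            observation_and_action_constraint_splitter))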

However, when the agent starts training and takes the first time step as input, the distribution method of categorical_q_policy.py is called, in particular this block of code:

    network_observation = time_step.observation
    observation_and_action_constraint_splitter = (
        self.observation_and_action_constraint_splitter)

    if observation_and_action_constraint_splitter is not None:
      network_observation, mask = (
          observation_and_action_constraint_splitter(network_observation))

    q_logits, policy_state = self._q_network(
        network_observation, step_type=time_step.step_type,
        network_state=policy_state)

Since I have a mask, I provide an observation_and_action_constraint_splitter, so after the if block my network_observation is no longer the dict but just the plain observation tensor:

tf.Tensor(
[[  1   1   1  30  30   5   4   0   0   0   1   0   0   0   0   0   0   1
    0   0   7   9   5   7   7   7   2   3   2   7   7   7   2   3   2 -99
  -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99
  -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99
  -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99]], shape=(1, 85), dtype=int32)

The problem is that when the q_logits are computed, there is a structure error: inputs (i.e. network_observation) is now a single tensor, while the network's input_tensor_spec still expects a dict with {'observation': …, 'valid_actions': …}, raising the following error:

ValueError: <tf_agents.networks.categorical_q_network.CategoricalQNetwork object at 0x0000021D34EE2D90>: `inputs` and `input_tensor_spec` do not have matching structures:
  .
vs.
  {'observation': ., 'valid_actions': .}
Values:
  [[  1   1   1  30  30   5   4   0   0   0   1   0   0   0   0   0   0   1
    0   0   7   9   5   7   7   7   2   3   2   7   7   7   2   3   2 -99
  -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99
  -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99
  -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 -99]]
vs.
  {'observation': BoundedTensorSpec(shape=(85,), dtype=tf.int32, name='observation', minimum=array(-100), maximum=array(100)), 'valid_actions': TensorSpec(shape=(91,), dtype=tf.bool, name='valid_actions')}.

I'm not that experienced in RL, so I don't want to imply that there is an issue in the source code, but I don't really see how to get rid of this error, since categorical_q_policy.py purposely strips the mask off the observation before calling the network, while my network was built with the full dict spec.
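
My current guess is that I should build the CategoricalQNetwork on only the 'observation' entry of the spec rather than on the full dict, so that it matches what the policy passes in after the split. Something like this (a sketch; num_atoms and the layer sizes here are placeholders, not my real values):

    from tf_agents.networks import categorical_q_network

    # Build the network on the unmasked sub-spec only: the policy applies
    # the splitter before calling the network, so the network never sees
    # the 'valid_actions' entry.
    categorical_q_net = categorical_q_network.CategoricalQNetwork(
        observation_spec['observation'],  # not the full dict spec
        action_spec,
        num_atoms=51,
        fc_layer_params=(128, 64))

Is that the intended usage, or am I missing something else?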

Thank you in advance for your help!