I’d like to program an environment with a changing action_spec, for a game that alternates between two phases. The decisions to be made in the two phases are quite different. How would one go about that?
Some additional info and my own thoughts on that:
- the observation contains 0 or 1 to signify the phase and, in phase 1, also the dice result
- if the phase is 0, action 0 is “pass and take your share” and action 1 is “go on”
- if the phase is 1, each of the actions 0-13 is interpreted as the corresponding action in that game
I could easily treat actions 2-13 as “no-action” in phase 0, or treat all of 1-13 as “go on”. But I expect that would make convergence much slower and less likely, since the DQN would have to learn these additional, unnecessary relations.
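For illustration, a different technique from the two remappings above is to keep one fixed action space of size 14 and mask out the actions that are invalid in the current phase, so the network never has to learn that they are useless. The following is only a minimal NumPy sketch under that assumption. The names `NUM_ACTIONS`, `valid_action_mask` and `masked_greedy_action` are hypothetical and not part of any library.

```python
import numpy as np

NUM_ACTIONS = 14  # actions 0-13, the size of the larger phase (phase 1)


def valid_action_mask(phase: int) -> np.ndarray:
    """Return a boolean mask of the actions that are legal in this phase."""
    mask = np.zeros(NUM_ACTIONS, dtype=bool)
    if phase == 0:
        mask[:2] = True  # only "pass and take your share" (0) and "go on" (1)
    else:
        mask[:] = True   # all 14 game actions are legal in phase 1
    return mask


def masked_greedy_action(q_values: np.ndarray, phase: int) -> int:
    """Pick the greedy action among the legal ones only."""
    # Illegal actions get -inf so argmax can never select them.
    masked_q = np.where(valid_action_mask(phase), q_values, -np.inf)
    return int(np.argmax(masked_q))
```

With such a mask the phase still has to be part of the observation, but the agent is never allowed to pick actions 2-13 in phase 0, during exploration or greedy selection, provided the same mask is also applied when sampling random actions.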
I realize now that my question does not strictly need answering for this game. I can just roll the dice and present them as the observation, then ask for one of the 14 possible actions (0-13) to apply to them, each with deterministic consequences, AND, as a 15th action, whether to “pass” or “go on” afterwards.
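One possible way to encode such a combined decision is sketched below. It assumes that the “pass”/“go on” choice is folded in as an extra binary component next to the game action, which gives a flat space of 14 × 2 = 28 actions. That encoding is an assumption made for illustration; the post does not spell out the exact layout, and the names `encode_action` and `decode_action` are hypothetical.

```python
NUM_GAME_ACTIONS = 14  # actions 0-13 applied to the dice
NUM_FOLLOW_UPS = 2     # assumed encoding: 0 = "pass", 1 = "go on"
NUM_FLAT_ACTIONS = NUM_GAME_ACTIONS * NUM_FOLLOW_UPS


def encode_action(game_action: int, follow_up: int) -> int:
    """Combine a game action and the follow-up choice into one flat index."""
    if not 0 <= game_action < NUM_GAME_ACTIONS:
        raise ValueError("game_action out of range")
    if not 0 <= follow_up < NUM_FOLLOW_UPS:
        raise ValueError("follow_up out of range")
    return game_action * NUM_FOLLOW_UPS + follow_up


def decode_action(flat_action: int) -> tuple[int, int]:
    """Split a flat index back into (game_action, follow_up)."""
    if not 0 <= flat_action < NUM_FLAT_ACTIONS:
        raise ValueError("flat_action out of range")
    return divmod(flat_action, NUM_FOLLOW_UPS)
```

The environment would then expose a single discrete action with 28 values and call `decode_action` inside its step function, so the action_spec itself never changes.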
Nevertheless, there definitely are games with different, repeating phases that have stochastic elements between them, such as dice rolls or the moves of other players, so the actions for the different phases cannot simply be put together in one vector like that. So the question still stands, even though I can continue my project now.