Support for contextual bandits with a different number of actions to choose from at each time step

I am new to working with contextual bandits in TensorFlow, and I wanted to ask whether TensorFlow supports a set of available actions whose size changes from one time step to the next. In my application, the number of actions can be anywhere from 2 to 100 at different time steps. We currently use Vowpal Wabbit, which does not give us much flexibility, so we would like to port our setup to TensorFlow.
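
To make the setup concrete, here is a minimal sketch of the behaviour I have in mind, written in plain NumPy rather than against any TensorFlow API. It assumes the action set is padded to a fixed maximum of 100 slots and that only the first `num_valid` slots are available at a given step. The names `MAX_ACTIONS` and `masked_greedy_action` are hypothetical and only there for illustration; the scores stand in for whatever a reward model would predict.

```python
import numpy as np

MAX_ACTIONS = 100  # assumed upper bound on actions per time step (from my use case)


def masked_greedy_action(scores, num_valid):
    """Return the index of the best-scoring action among the first `num_valid` slots.

    `scores` has one entry per padded action slot; slots at index >= num_valid
    are treated as unavailable at this time step.
    """
    scores = np.asarray(scores, dtype=float)
    if not 1 <= num_valid <= scores.shape[0]:
        raise ValueError("num_valid must be between 1 and the number of slots")
    # Boolean mask: True for available actions, False for padding.
    mask = np.arange(scores.shape[0]) < num_valid
    # Unavailable slots get -inf so argmax can never select them.
    masked_scores = np.where(mask, scores, -np.inf)
    return int(np.argmax(masked_scores))


# Example: three time steps, each with a different number of available actions.
rng = np.random.default_rng(0)
for num_valid in (2, 37, 100):
    scores = rng.normal(size=MAX_ACTIONS)  # placeholder for model predictions
    action = masked_greedy_action(scores, num_valid)
    print(f"available={num_valid:3d} chosen={action}")
```

Padding plus masking is only one possible way to represent a varying action count; I would be happy with any supported mechanism that achieves the same effect.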