Hello, I am pretty new to TF-Agents and confused about the metrics used in the replay buffer, the dynamic driver, and agent training.
I would really appreciate it if someone could give me a brief explanation of the following terms:
I have created 4 environments, put them all together in a BatchedPyEnvironment, and then converted that to a TFPyEnvironment, so the batch_size of this environment is 4.
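To make sure I'm describing the batching right, here is my mental model of it as a pure-Python toy (the class and method names are illustrative only, not the real TF-Agents classes):

```python
# Toy stand-in for 4 environments stepped in lockstep; my understanding
# is that BatchedPyEnvironment stacks each env's (25,)-shaped observation
# into one (4, 25) batch. Illustrative names, not the TF-Agents API.
class ToyEnv:
    def step(self):
        return [0] * 25  # fake integer observation of length 25

class ToyBatchedEnv:
    def __init__(self, envs):
        self.envs = envs
        self.batch_size = len(envs)

    def step(self):
        # one call steps every env and stacks the results
        return [env.step() for env in self.envs]

batched = ToyBatchedEnv([ToyEnv() for _ in range(4)])
obs = batched.step()
print(batched.batch_size)     # 4
print(len(obs), len(obs[0]))  # 4 25
```

Is that the right picture of what batch_size = 4 means here?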
Then I created a TFUniformReplayBuffer. What are 1. batch_size and 2. max_length? I understand batch_size is how many elements are stored in the batch, but when I change the max_length value nothing really happens, unless it is 1, in which case it gives an error.
My observation is a (1, 25) vector of integers.
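My current guess is that the buffer's batch_size must match the environment's batch size (4 rows), and that max_length is the capacity of each row, i.e. a circular buffer that silently drops the oldest entries once full. A toy model of one row (pure Python, not the real API):

```python
from collections import deque

# Toy model of one replay-buffer row: a ring buffer keeping only the
# last max_length items. (My guess at the semantics, illustrative only.)
max_length = 3
row = deque(maxlen=max_length)
for t in range(5):
    row.append(t)   # once full, the oldest item is silently dropped
print(list(row))    # [2, 3, 4]
```

If that's right, it would explain why changing max_length seems to do nothing until the buffer actually overflows, but please correct me if I'm wrong.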
Then, to read the replay buffer, I create a dataset outside the training loop through replay_buffer.as_dataset. Should this dataset be created at each training iteration? Also, what is the difference between the batch_size in the TFUniformReplayBuffer and 3. sample_batch_size?
When I change num_steps from 2 to 1 it also gives an error, so what does this 4. num_steps mean?
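My guess is that num_steps controls how many consecutive time steps each sampled item spans (so 2 would give adjacent-step pairs, which some agents need for their loss). Roughly, in a pure-Python sketch with made-up names:

```python
import random

# Toy version of what I think num_steps does when sampling: each
# sampled item is a window of num_steps *consecutive* entries from
# one buffer row. (My guess at the semantics, illustrative only.)
row = [10, 11, 12, 13, 14]   # time steps stored in one buffer row
num_steps = 2

def sample(row, num_steps):
    start = random.randrange(len(row) - num_steps + 1)
    return row[start:start + num_steps]

window = sample(row, num_steps)
print(len(window))  # 2
```

Is the error with num_steps=1 because my agent's training step requires pairs of consecutive steps?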
Then I create an iterator = iter(dataset), also outside the training loop.
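The pattern I think I'm supposed to follow is: build the dataset and iterator once, then pull one batch per iteration with next(iterator) inside the loop. As a plain-Python stand-in (a generator instead of the real as_dataset):

```python
import itertools

# Stand-in for an endlessly re-sampling dataset: created once outside
# the training loop, consumed with next() inside it. (Illustrative.)
def endless_batches():
    for i in itertools.count():
        yield f"batch-{i}"

iterator = iter(endless_batches())  # built once, outside the loop
first = next(iterator)              # inside the loop: next(iterator)
second = next(iterator)
print(first, second)  # batch-0 batch-1
```

Is that the intended usage, or should the dataset be rebuilt each iteration?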
Also, when I look at the 5. number of episodes and 6. number of steps via num_episodes.result().numpy() after the training loop, it shows different numbers every time I run the training, and I can't control them.
To collect experience inside the loop I create a dynamic_step_driver.DynamicStepDriver, but I'm also not sure what the 7. num_steps in it really represents. In the training loop I can control the 8. num_iterations, and I thought this would match the number of episodes I get from num_episodes.result().numpy() and the number of steps I get from env_steps.result().numpy(), but they are all different.
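Part of my confusion may be that I expected steps and episodes to line up, but if episode lengths vary, the episode count would differ from run to run even for a fixed number of collected steps. A toy illustration of that (made-up numbers, not my actual run):

```python
import random

random.seed(0)
# Toy run: a fixed budget of env steps split into episodes of random
# length between 3 and 7. The episode count then depends on the
# (stochastic) episode lengths, which might be why NumberOfEpisodes
# differs between runs even when the step count is controlled.
total_steps, steps, episodes = 400, 0, 0
while steps < total_steps:
    steps += random.randint(3, 7)  # length of this episode
    episodes += 1
print(episodes)
```

Is this roughly why I can't control the episode count, and how does the driver's num_steps factor into the env_steps total per iteration?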
Any hints would be very helpful. Thanks!