TF agent 10x slower than Python agent

Hello,

I’m trying out TF-Agents, but I’m getting terrible performance compared to the Python equivalents.
Here’s a simple example:

from tf_agents.drivers import dynamic_step_driver
from tf_agents.drivers import py_driver
from tf_agents.environments import suite_gym
from tf_agents.environments import tf_py_environment
from tf_agents.policies import random_py_policy
from tf_agents.policies import random_tf_policy
import time

py_env = suite_gym.load('LunarLander-v2')
tf_env = tf_py_environment.TFPyEnvironment(suite_gym.load('LunarLander-v2'))

py_policy = random_py_policy.RandomPyPolicy(
  py_env.time_step_spec(), py_env.action_spec())
tf_policy = random_tf_policy.RandomTFPolicy(
  tf_env.time_step_spec(), tf_env.action_spec())

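# max_steps=1: each pydriver.run() call advances the environment by one step
pydriver = py_driver.PyDriver(
  py_env,
  py_policy,
  observers=[],
  max_steps=1,
)
# DynamicStepDriver defaults to num_steps=1, so each tf_driver.run() call
# also advances the environment by one step
tf_driver = dynamic_step_driver.DynamicStepDriver(
  tf_env,
  tf_policy
)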

n = 4000

py_step = py_env.reset()
policy_state = py_policy.get_initial_state(1)
start_time = time.time()
for i in range(n):
  py_step, _ = pydriver.run(time_step=py_step)
elapsed = time.time() - start_time
print(f'py step/sec: {n/elapsed}')

tf_step = tf_env.reset()
policy_state = tf_policy.get_initial_state(1)
start_time = time.time()
for i in range(n):
  tf_driver.run()
elapsed = time.time() - start_time
print(f'tf step/sec:  {n/elapsed}')

I get ~3000 step/sec with the python variant but only ~300 with the tensorflow one.
What am I doing wrong?
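Turning the two reported throughputs into time per step makes the gap concrete. This small calculation uses only the two figures quoted above (~3000 and ~300 step/sec):

```python
# Reported throughputs from the benchmark above (steps per second)
py_steps_per_sec = 3000
tf_steps_per_sec = 300

# Time per step in milliseconds
py_ms_per_step = 1000 / py_steps_per_sec  # ~0.33 ms
tf_ms_per_step = 1000 / tf_steps_per_sec  # ~3.33 ms

# Extra time the TF loop spends on each step
extra_ms = tf_ms_per_step - py_ms_per_step
print(f'py: {py_ms_per_step:.2f} ms/step, tf: {tf_ms_per_step:.2f} ms/step, '
      f'extra: {extra_ms:.2f} ms/step')
```

So each TF step costs roughly 3 ms more than the Python step.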

Thanks!

Hi @lorenzos.
I ran your snippet above several times in a notebook on a PC (CPU only), and the TensorFlow version ran about 5x faster each time, e.g.

py step/sec: 1707.961050397284
tf step/sec: 257.4732501827857

Are you running your code on a GPU? Maybe the problem stems from the configuration/optimization.

Here I’m measuring step/sec, so a higher number is faster.
I also tried to wrap everything in a function decorated with @tf.function, but that didn’t help either.

Hi lorenzos, sorry, I completely misread. Looking at it again, isn’t the issue that the loop is in the wrong place? My understanding is that you run the whole driver from scratch 4000 times, one random-policy step per call, instead of running 4000 steps of the random policy in a single call. It makes sense that looping like this is slower with the TensorFlow environment, since every call to tf_driver.run() pays its own fixed overhead.
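To see why the loop placement matters, here is a simple cost model. It is a sketch under stated assumptions, not a measurement: every driver run() call is assumed to pay a fixed overhead c, plus a cost s per environment step. The helper steps_per_sec and the values of c and s are hypothetical, chosen only for illustration.

```python
def steps_per_sec(n_steps, n_calls, call_overhead, step_cost):
    """Throughput when n_steps are spread over n_calls driver calls.

    Hypothetical model: total time = n_calls * call_overhead + n_steps * step_cost.
    """
    total_time = n_calls * call_overhead + n_steps * step_cost
    return n_steps / total_time

n = 4000
c = 3e-3      # hypothetical fixed cost per run() call, in seconds
s = 1 / 3000  # hypothetical cost of one environment step, in seconds

# Original benchmark: run() called n times, one step per call
looped = steps_per_sec(n, n_calls=n, call_overhead=c, step_cost=s)
# Restructured benchmark: run() called once for all n steps
single = steps_per_sec(n, n_calls=1, call_overhead=c, step_cost=s)
print(f'looped: {looped:.0f} step/sec, single call: {single:.0f} step/sec')
```

Under this model the looped version is capped at 1/(c + s) step/sec no matter how large n is, while a single call amortizes c over all n steps.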

Based on this TensorFlow documentation, running 4000 steps of the random policy is faster with the TensorFlow environment (I replaced dynamic_step_driver.DynamicStepDriver with tf_driver.TFDriver to make the Python and TensorFlow setups more “similar”):

Python:

import time

from tf_agents.drivers import py_driver
from tf_agents.environments import suite_gym
from tf_agents.policies import random_py_policy

py_env = suite_gym.load('LunarLander-v2')

py_policy = random_py_policy.RandomPyPolicy(
  py_env.time_step_spec(),
  py_env.action_spec())

n = 4000

# One driver that runs all n steps in a single run() call
# (named `driver` so it does not shadow the py_driver module)
driver = py_driver.PyDriver(
  py_env,
  py_policy,
  observers=[],
  max_steps=n,
)

py_step = py_env.reset()
policy_state = py_policy.get_initial_state(1)
start_time = time.time()
py_step, _ = driver.run(py_step)
elapsed = time.time() - start_time
print(f'py step/sec: {n/elapsed}')
#py step/sec: 1696.337375997939

Tensorflow:

import time

from tf_agents.drivers import tf_driver
from tf_agents.environments import suite_gym
from tf_agents.environments import tf_py_environment
from tf_agents.policies import random_tf_policy

tf_env = tf_py_environment.TFPyEnvironment(suite_gym.load('LunarLander-v2'))

tf_policy = random_tf_policy.RandomTFPolicy(
  tf_env.time_step_spec(),
  tf_env.action_spec())

n = 4000

# As in the Python version: one driver, all n steps in a single run() call
# (named `driver` so it does not shadow the tf_driver module)
driver = tf_driver.TFDriver(
  tf_env,
  tf_policy,
  observers=[],
  max_steps=n,
)

tf_step = tf_env.reset()
policy_state = tf_policy.get_initial_state(1)
start_time = time.time()
tf_step, _ = driver.run(tf_step)
elapsed = time.time() - start_time
print(f'tf step/sec: {n/elapsed}')
#tf step/sec: 2333.3396381972393
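One caveat when comparing these numbers: assuming the TF driver compiles its run() as a tf.function on the first call (TFDriver does this unless disable_tf_function=True), a single timed call also includes one-time tracing cost. A generic timing helper that makes one untimed warm-up call and uses time.perf_counter() could look like the sketch below. time_steps_per_sec and fake_run are hypothetical names for illustration; fake_run only simulates a slow first call.

```python
import time

def time_steps_per_sec(run_fn, n_steps, warmup=True):
    """Return n_steps divided by the wall-clock time of one run_fn() call.

    If warmup is True, run_fn() is first called once untimed, so one-time
    costs (e.g. tracing/compilation) are excluded from the measurement.
    """
    if warmup:
        run_fn()
    start = time.perf_counter()  # monotonic, high-resolution timer
    run_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

# Hypothetical stand-in for driver.run: the first call is slow
# (simulating one-time setup), later calls are fast.
calls = []

def fake_run():
    calls.append(None)
    time.sleep(0.2 if len(calls) == 1 else 0.01)

rate = time_steps_per_sec(fake_run, n_steps=100)
print(f'{rate:.0f} step/sec after {len(calls)} calls')
```

With the real driver you would pass a zero-argument wrapper such as lambda: driver.run(tf_env.reset()); note that the warm-up call then runs all n steps once more before timing starts.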