I expected tf.map_fn to execute the mapped function in parallel. Unfortunately, it seems to execute it sequentially. Here is a brief example I created to verify my concern:
```python
import tensorflow as tf


@tf.function
def test_mapfn():
    @tf.function
    def body(vals):
        res = tf.constant(1, dtype=tf.float32)
        for i in tf.range(5, dtype=tf.float32):
            tf.print(i)
            # Doing some random calculation below. Not important.
            res = res + tf.pow(
                tf.reduce_sum(tf.where(vals > 0.5, vals, tf.constant(0, dtype=tf.float32)))
                + tf.pow(tf.reduce_sum(vals), 0.5)
                - tf.pow(tf.reduce_sum(vals), tf.constant(2, dtype=tf.float32)),
                -i)
        return res

    tensor = tf.random.uniform(shape=(2, 100000000))
    res = tf.map_fn(body, tensor, parallel_iterations=2)


test_mapfn()
```
The printed values are
```
0
1
2
3
4
0
1
2
3
4
```
If the two calls to body ran in parallel, the printed values from their loops should interleave, but they are printed strictly in sequence. I know this is a pretty basic test, but I noticed the same behavior with functions that take much longer to execute: the CPU is not fully utilized, and the function appears to be called sequentially, once per unrolled input. I know the recommendation is to use tf.vectorized_map, but unfortunately it is not applicable in my use case, since the code can't be rewritten to comply with its requirements. Also, changing the number of parallel_iterations doesn't seem to affect the speed.
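To back up that last point, here is the kind of timing check I used: the same tf.map_fn call measured with different parallel_iterations values. The body function below is my own simplified stand-in, not the original one, and the timings only confirm the observation (no speedup) rather than explain it:

```python
import time

import tensorflow as tf


def body(row):
    # Stand-in per-row work: a reduction plus an elementwise tf.where,
    # roughly mimicking the shape of the original computation.
    return tf.pow(tf.reduce_sum(row), 0.5) - tf.reduce_sum(
        tf.where(row > 0.5, row, tf.zeros_like(row)))


tensor = tf.random.uniform(shape=(8, 1_000_000))

for p in (1, 2, 8):
    start = time.perf_counter()
    res = tf.map_fn(body, tensor, parallel_iterations=p)
    elapsed = time.perf_counter() - start
    # Elapsed times come out essentially identical for every value of p.
    print(f"parallel_iterations={p}: {elapsed:.3f}s, result shape {res.shape}")
```

Note that this runs eagerly; as far as I understand, parallel_iterations is primarily a graph-mode (tf.while_loop) setting, but wrapping the call in tf.function made no difference for me either.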