TensorFlow: map_fn and parallel execution

I expected map_fn to execute the mapped function in parallel across the elements of its input. Instead, it appears to execute it sequentially. Here is a short example I wrote to check this:

import tensorflow as tf

@tf.function
def test_mapfn():
    @tf.function
    def body(vals):
        res = tf.constant(1, dtype=tf.float32)
        for i in tf.range(5, dtype=tf.float32):
            # Print the loop index so the order of execution is visible.
            tf.print(i)
            # Arbitrary computation to keep each iteration busy; the exact formula is not important.
            total = tf.reduce_sum(vals)
            masked = tf.reduce_sum(tf.where(vals > 0.5, vals, tf.constant(0, dtype=tf.float32)))
            res = res + tf.pow(masked + tf.pow(total, 0.5) - tf.pow(total, tf.constant(2, dtype=tf.float32)), -i)
        return res

    # Two rows, so map_fn calls body twice (once per row).
    tensor = tf.random.uniform(shape=(2, 100000000))
    res = tf.map_fn(body, tensor, parallel_iterations=2)
    return res

test_mapfn()

The printed values are:

0
1
2
3
4
0
1
2
3
4

If the two calls to body ran in parallel, I would expect their printed values to interleave, but they are printed strictly in sequence. I know this is a very basic test, but I see the same behavior with functions that take longer to execute: the CPU is not fully utilized, and the function appears to be called sequentially on the unstacked inputs. I am aware of the recommendation to use vectorized_map, but it is not applicable in my use case because the code cannot be rewritten to meet its requirements. Changing parallel_iterations also does not seem to affect the speed.
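To make the speed observation easier to reproduce, here is a rough sketch of how the wall time could be compared for several values of parallel_iterations. The helper names make_mapper and time_mapper and the stand-in workload are made up for illustration (they are not the computation from my real code), and the tensor is kept small so the script finishes quickly. It also prints the configured thread counts; as far as I understand, a value of 0 there means TensorFlow picks the number itself.

```python
import time

import tensorflow as tf


def make_mapper(parallel_iterations):
    """Build a graph-mode function that maps `body` over the rows of its input."""

    @tf.function
    def run(tensor):
        def body(vals):
            # Stand-in workload (an assumption, not the original computation).
            return tf.reduce_sum(tf.sqrt(vals))

        return tf.map_fn(body, tensor, parallel_iterations=parallel_iterations)

    return run


def time_mapper(parallel_iterations, tensor, repeats=3):
    """Return (average seconds per call, last result) for one setting."""
    run = make_mapper(parallel_iterations)
    run(tensor)  # warm-up call so that tracing is not included in the timing
    start = time.perf_counter()
    for _ in range(repeats):
        result = run(tensor)
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed, result


# Thread pool settings; 0 means the value is chosen by TensorFlow.
print("inter-op threads:", tf.config.threading.get_inter_op_parallelism_threads())
print("intra-op threads:", tf.config.threading.get_intra_op_parallelism_threads())

tensor = tf.random.uniform(shape=(4, 200_000))
timings = {}
for p in (1, 2, 4):
    elapsed, result = time_mapper(p, tensor)
    timings[p] = elapsed
    print(f"parallel_iterations={p}: {elapsed:.4f} s per call")
```

With a tensor this small the differences are probably dominated by overhead, so the shape would need to be increased to get meaningful numbers.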
