Based on its documentation, I expected `tf.map_fn` to execute the mapped function in parallel across the inputs. Unfortunately, it seems to execute it sequentially. Here is a brief example I put together to verify my concern.

```python
import tensorflow as tf

@tf.function
def test_mapfn():
    @tf.function
    def body(vals):
        res = tf.constant(1, dtype=tf.float32)
        for i in tf.range(5, dtype=tf.float32):
            tf.print(i)
            # Doing some random calculation below. Not important.
            res = res + tf.pow(
                tf.reduce_sum(tf.where(vals > 0.5, vals, tf.constant(0, dtype=tf.float32)))
                + tf.pow(tf.reduce_sum(vals), 0.5)
                - tf.pow(tf.reduce_sum(vals), tf.constant(2, dtype=tf.float32)),
                -i)
        return res
    tensor = tf.random.uniform(shape=(2, 100000000))
    res = tf.map_fn(body, tensor, parallel_iterations=2)
    return res

test_mapfn()
```

The printed values are

```
0
1
2
3
4
0
1
2
3
4
```

If executed in parallel, the code above should interleave the printed values from the two loop instances, but they are printed strictly one row after the other. I know this is a fairly basic test, but I have noticed the same behavior with functions that take much longer to execute: the CPU is never fully utilized, and the function appears to be called sequentially on the unstacked inputs. I am aware of the recommendation to use `tf.vectorized_map`, but unfortunately it is not applicable in my case, since the code cannot be rewritten to satisfy the requirements of `vectorized_map`. Also, changing the value of `parallel_iterations` does not seem to affect the speed.
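For contrast, here is a plain-Python sketch (not TensorFlow; the names `body` and `timeline` are my own, and the `sleep` just simulates per-step work) of the kind of interleaving I would expect if the two rows were genuinely processed in parallel:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

timeline = []          # global record of (row_id, step) events
lock = threading.Lock()

def body(row_id):
    # Stand-in for the per-row computation in the question.
    for i in range(5):
        with lock:
            timeline.append((row_id, i))
        time.sleep(0.01)  # simulate per-step work so the rows overlap in time

# Two workers, analogous to parallel_iterations=2: both rows run
# concurrently, so their step events mix rather than forming two
# back-to-back runs of 0..4.
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(body, range(2)))

print(timeline)  # steps from row 0 and row 1 interleave
```

With `tf.map_fn` I see the opposite: the full `0..4` sequence for one row finishes before the other row starts.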