Running the Nelder-Mead optimizer in parallel

I want to run TensorFlow Probability's Nelder-Mead algorithm in parallel. According to the documentation,
it can be run in parallel by specifying the argument parallel_iterations=<num_threads>. To test this, I have written the following test case.

import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
from datetime import datetime
import time

def objec_function(x):
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    print("Current Time =", current_time)
    time.sleep(10)  # simulate an expensive objective evaluation
    return tf.reduce_sum(x)

start = tf.constant([6.0, -21.0])
optim_results = tfp.optimizer.nelder_mead_minimize(
    objec_function,
    initial_vertex=start,
    func_tolerance=1e-8,
    batch_evaluate_objective=False,
    parallel_iterations=2)


I want to minimize this test objective function, which simply returns the sum of the elements of x. It also prints the time at which it is called and then sleeps for 10 seconds. I asked the algorithm to run 2 iterations in parallel, expecting to see two print statements with very little time difference between them, indicating that two parallel iterations had started. But the output I get is as follows, showing exactly 10 seconds between calls, which indicates that only 1 thread is running, not 2.

user@machine:> python3.10 
2022-02-03 15:27:48.833355: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2022-02-03 15:27:48.891642: E tensorflow/stream_executor/cuda/] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-02-03 15:27:48.891722: I tensorflow/stream_executor/cuda/] kernel driver does not appear to be running on this host (strand-fe4): /proc/driver/nvidia/version does not exist
2022-02-03 15:27:48.892823: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-02-03 15:27:48.945504: I tensorflow/core/platform/profile_utils/] CPU Frequency: 2100000000 Hz
2022-02-03 15:27:48.954983: I tensorflow/compiler/xla/service/] XLA service 0x4c138d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-02-03 15:27:48.955023: I tensorflow/compiler/xla/service/]   StreamExecutor device (0): Host, Default Version
Current Time = 15:27:49
Current Time = 15:27:59
Current Time = 15:28:09
Current Time = 15:28:19
Current Time = 15:28:29
Current Time = 15:28:39
Current Time = 15:28:49
Current Time = 15:28:59
Current Time = 15:29:09
Current Time = 15:29:19
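For comparison, here is a minimal stdlib-only sketch (no TensorFlow involved) of the behavior I expected: two concurrent evaluations whose start times differ by milliseconds rather than by the full sleep duration. The sleep is shortened from 10 seconds to 1 second for the illustration.

```python
import concurrent.futures
import time
from datetime import datetime

def timed_call(_):
    # Record the start time, then simulate a slow objective evaluation.
    started = datetime.now()
    time.sleep(1)  # shortened from 10 s for the illustration
    return started

# If two evaluations really run concurrently, their start times should
# differ by milliseconds, not by the full sleep duration.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    t0, t1 = pool.map(timed_call, [0, 1])

gap = abs((t1 - t0).total_seconds())
print(f"start-time gap: {gap:.3f} s")  # close to 0 for concurrent calls
```

With the TensorFlow run above, the analogous gap between consecutive "Current Time" prints is the full 10 seconds, which is what suggests the evaluations are sequential.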

I also tried exporting OMP_NUM_THREADS=2, but to no avail. Kindly help.
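For completeness, this is how the environment variable was set before launching the script (the script file name below is a placeholder, not the actual name):

```shell
# Set the OpenMP thread count before launching the test script
# (script name is a placeholder).
export OMP_NUM_THREADS=2
python3.10 test_nelder_mead.py
```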