Goal
What I ultimately want is something that behaves like the following:
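import time
from multiprocessing import Process

def test_parallel(*args, **kwargs):
    print("here1")
    time.sleep(5)
    print("here2")

# Placeholders for whatever arguments the real function takes.
args, kwargs = (), {}

if __name__ == "__main__":
    processes = []
    for _ in range(3):
        p = Process(target=test_parallel, args=args, kwargs=kwargs)
        p.start()
        processes.append(p)
    for process in processes:
        process.join()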
with the expected output:
here1
here1
here1
here2
here2
here2
Issue
Both multiprocessing and joblib seem to have issues in combination with TensorFlow, so I’m looking for a TensorFlow alternative or another solution.
More extensive explanation
I have a few objects of a class that has a TensorFlow model as the attribute “self.model”. Say I want to call a method, “simulate”, on each object. This would look something like:
Using joblib
This raises an error because obj is not picklable and is passed in the args:
Parallel(n_jobs=-1)(delayed(obj.simulate)(obj, args, kwargs) for obj in list_of_objects)
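One way around the pickling error might be to send only picklable data (a config, a path, plain arrays) to the workers and build the object inside each worker. The sketch below is not the original code: `Simulator`, `simulate_from_config` and the `scale` config key are hypothetical stand-ins, and a lambda plays the role of the unpicklable model. It uses `ProcessPoolExecutor` from the standard library instead of joblib; the same idea should carry over to `Parallel(n_jobs=-1)(delayed(simulate_from_config)(cfg, steps) for cfg in configs)`.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


class Simulator:
    """Hypothetical stand-in for a class that owns a TensorFlow model."""

    def __init__(self, config):
        self.config = config
        # Assumption: the real class would build its model here from `config`.
        # The lambda makes the instance unpicklable, like the real object.
        self.model = lambda x: config["scale"] * x

    def simulate(self, steps):
        return [self.model(i) for i in range(steps)]


def simulate_from_config(config, steps):
    # Runs in the worker: only `config` and `steps` were pickled,
    # the object and its model are created here.
    return Simulator(config).simulate(steps)


def main():
    configs = [{"scale": s} for s in (1, 2, 3)]
    # "spawn" starts each worker as a fresh interpreter.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=3, mp_context=ctx) as pool:
        results = list(pool.map(simulate_from_config, configs, [4] * len(configs)))
    print(results)


if __name__ == "__main__":
    main()
```

The cost of this design is that every worker rebuilds its model, so it pays off mainly when a simulation takes much longer than the construction.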
Using multiprocessing
I run into deadlocks when using TensorFlow functionality in more than one process. A more extensive explanation of the issue is here:
opened 05:20PM - 31 May 23 UTC
type:bug
TF 2.11
<details><summary>Click to expand!</summary>
### Issue Type
Bug
### Have you reproduced the bug with TF nightly?
No
### Source
binary
### Tensorflow Version
v2.11.0-rc2-17-gd5b57ca93e5 2.11.0
### Custom Code
Yes
### OS Platform and Distribution
Linux Ubuntu 20.04
### Mobile device
_No response_
### Python version
3.9
### Bazel version
_No response_
### GCC/Compiler version
_No response_
### CUDA/cuDNN version
_No response_
### GPU model and memory
_No response_
### Current Behaviour?
In a project where I wanted to combine multiprocessing with a TensorFlow function, the processes kept getting stuck. After some debugging I was able to create a minimal example, shown below and also provided as a .txt file.
[minimal_example_MPlock.txt](https://github.com/tensorflow/tensorflow/files/11616923/minimal_example_MPlock.txt)
In words: when I do some random transpose operation in the process, there is no problem at all. However, if I afterwards use any functionality from TensorFlow and repeat the code that used to work, it suddenly gets stuck on the last matrix inverse (which is quite a big one, but shouldn't be a problem).
What you can see, and what is probably part of the issue, is that each time the function is called there are a number of TensorFlow warnings. So far I could always just ignore them in my code, but to be sure I added them to the logs here as well.
Thanks in advance!
### Standalone code to reproduce the issue
```python
from multiprocessing import Process
import numpy as np
import tensorflow as tf
import time

def test_inverse():
    Pxk = tf.eye(2)
    Pwk = tf.eye(259)
    print("here1")
    Pxk = tf.transpose(Pxk)
    print("here2")
    Pwk = np.transpose(Pwk)
    print("here3")
    Pwk = tf.transpose(Pwk)  # the second run never gets past this call
    print("here4")

# First run: no TensorFlow op has been executed in the parent yet.
processes = []
for _ in range(3):
    p = Process(target=test_inverse, args=[], kwargs={})
    time.sleep(1)
    p.start()
    processes.append(p)
for process in processes:
    process.join()
### works perfectly fine

time.sleep(5)
print("using ANY tensorflow function")
a = tf.math.add(2, 5)

# Second run: same code, but now it hangs before printing "here4".
processes = []
for _ in range(3):
    p = Process(target=test_inverse, args=[], kwargs={})
    p.start()
    processes.append(p)
for process in processes:
    process.join()
```
### Relevant log output
```shell
2023-05-31 19:20:48.218419: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-31 19:20:48.286362: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-05-31 19:20:48.620505: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-05-31 19:20:48.620539: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-05-31 19:20:48.620543: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-05-31 19:20:50.183101: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-05-31 19:20:50.183138: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cedric-Z590-UD-AC): /proc/driver/nvidia/version does not exist
2023-05-31 19:20:50.183724: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
here1
here2
here3
here4
2023-05-31 19:20:51.221681: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-05-31 19:20:51.221727: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cedric-Z590-UD-AC): /proc/driver/nvidia/version does not exist
2023-05-31 19:20:51.222470: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
here1
here2
here3
here4
2023-05-31 19:20:52.223862: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-05-31 19:20:52.223908: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cedric-Z590-UD-AC): /proc/driver/nvidia/version does not exist
2023-05-31 19:20:52.224710: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
here1
here2
here3
here4
using ANY tensorflow function
2023-05-31 19:20:57.355866: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-05-31 19:20:57.355928: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cedric-Z590-UD-AC): /proc/driver/nvidia/version does not exist
2023-05-31 19:20:57.356675: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
here1
here1
here2
here3
here2
here3
here1
here2
here3
```
</details>
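A commonly reported cause of hangs like this is forking a process after TensorFlow has already been used in the parent, since `fork` is the default start method on Linux for this Python version. The following is a minimal sketch of the usual workaround, under the assumption that this is what happens here: it uses the `spawn` start method, so children start as fresh interpreters instead of copies of the parent. The name `worker` and the `delay` parameter are illustrative, and `time.sleep` stands in for the TensorFlow work.

```python
import multiprocessing as mp
import time


def worker(worker_id, delay=1.0):
    # Assumption: TensorFlow would be imported and used only here, inside the
    # child (e.g. `import tensorflow as tf`), so no parent state is inherited.
    print(f"here1 (worker {worker_id})", flush=True)
    time.sleep(delay)  # placeholder for the TensorFlow computation
    print(f"here2 (worker {worker_id})", flush=True)
    return worker_id


def main():
    # "spawn" launches a new interpreter per process instead of forking.
    ctx = mp.get_context("spawn")
    processes = [ctx.Process(target=worker, args=(i,)) for i in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()


if __name__ == "__main__":
    main()
```

With `spawn`, the `if __name__ == "__main__":` guard is required, and the target function and its arguments must be picklable, so this does not by itself solve the joblib problem described above.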
Context / First attempt
I have a workflow in which I use TensorFlow functions, for example @tf.function, to speed up my matrix computations. Now I want to extend this workflow to parallel computation. A small example and (bad) first attempt, using the “test_parallel” defined earlier:
strategy = tf.distribute.MirroredStrategy()
idx = list(range(3))
dataset = tf.data.Dataset.from_tensor_slices(idx)
dist_dataset = strategy.experimental_distribute_dataset(dataset)
with strategy.scope():
    for x in dist_dataset:
        strategy.run(test_parallel, args=(x, args, kwargs))
However, the output here is of the form
here1
here2
here1
here2
here1
here2
instead of
here1
here1
here1
here2
here2
here2
and hence my attempt clearly failed.
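The loop above calls `strategy.run` once per dataset element, so the calls run one after another. If the aim is only to have several calls in flight at once within one process, a thread pool might be an option, since it needs neither pickling nor forking. This is a sketch under assumptions, not a tested TensorFlow solution: `worker` and `run_in_threads` are illustrative names, `time.sleep` stands in for the TensorFlow computation, and any real speed-up depends on how much time is spent inside TensorFlow kernels rather than in Python code, which the GIL still serialises.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def worker(worker_id, delay=1.0):
    print(f"here1 (worker {worker_id})", flush=True)
    time.sleep(delay)  # placeholder for the TensorFlow computation
    print(f"here2 (worker {worker_id})", flush=True)
    return worker_id


def run_in_threads(n_workers=3, delay=1.0):
    # All workers share the parent's memory, so objects holding a model
    # can be passed directly without being pickled.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(worker, i, delay) for i in range(n_workers)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    run_in_threads()
```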
Conclusion
In general, I want to run tasks that use TensorFlow functionality in parallel processes. I was wondering whether TensorFlow offers something like this that I overlooked. Another solution would be a working version with joblib or the Python multiprocessing library.
Thanks in advance!
Cedric