CUDA 12.3 and Python 3.9, with any TF, GPU not found

Is there a way to get the GPUs to work? Python 3.11 works flawlessly. This is on RHEL 9.


[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import os

>>>

>>> # Set TensorFlow log level to suppress warnings and info messages

>>> os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

>>>

>>> # Now you can import and use TensorFlow

>>> import tensorflow as tf


>>> print(tf.sysconfig.get_build_info())

OrderedDict([('cpu_compiler', '/usr/lib/llvm-17/bin/clang'), ('cuda_compute_capabilities', ['sm_50', 'sm_60', 'sm_70', 'sm_80', 'compute_90']), ('cuda_version', '12.3'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

>>>

>>> physical_devices = tf.config.list_physical_devices('GPU')

>>> print(physical_devices)

[]

>>>


pip show tf-nightly

Name: tf-nightly

Version: 2.16.0

Summary: TensorFlow is an open source machine learning framework for everyone.

Home-page: https://www.tensorflow.org/

Author: Google Inc.

Author-email: packages@tensorflow.org

License: Apache 2.0

Location: /usr/local/lib64/python3.9/site-packages

Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras-nightly, libclang, ml-dtypes, numpy, opt-einsum, packaging, protobuf, requests, setuptools, six, tb-nightly, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wrapt

Required-by:

Hi @RtheK, I was able to detect gpu with tensorflow 2.16-nightly.

Could you please let us know in which environment you are facing this issue? Also, I can see that you are using CUDA 12.3; could you please try CUDA 12.2 instead? Thank you.

RHEL 9, Python 3.9, cuDNN 8.9, and just upgraded CUDA to 12.4. Oddly, I have 2 GPU nodes that work just fine. The fix there was to set CUDNN_PATH and add $CUDNN_PATH/lib to $LD_LIBRARY_PATH. This one node has the same packages, but clearly there is some difference that I can't find:
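For reference, the working nodes were set up roughly like this. The prefix `/usr/local/cudnn` below is a placeholder, not the actual path from my nodes; point it at wherever cuDNN is really installed (for a pip install of `tensorflow[and-cuda]`, that would be the `nvidia-cudnn-cu12` wheel's directory):

```shell
# CUDNN_PATH below is a placeholder -- substitute the node's real cuDNN install prefix
export CUDNN_PATH=/usr/local/cudnn
# Prepend the cuDNN lib directory so TensorFlow's dlopen() can locate libcudnn.so
export LD_LIBRARY_PATH="${CUDNN_PATH}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "${LD_LIBRARY_PATH}"
```

The `${LD_LIBRARY_PATH:+:…}` expansion avoids leaving a stray leading colon when the variable was previously unset.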

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2024-02-26 14:35:16.580905: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

2024-02-26 14:35:16.581143: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.

2024-02-26 14:35:16.584441: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.

2024-02-26 14:35:16.625948: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

2024-02-26 14:35:17.760304: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.

Skipping registering GPU devices...

[]

Notice that the log line:

Could not find cuda drivers on your machine, GPU will not be used.

appears twice. That indicates 2 devices, and indeed there are 2 GPUs. There has to be an environment variable I am missing.
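Since the packages match, one way to narrow this down is to dump the loader-related environment on a working node and on the broken node and diff the two outputs. The variable list below is just a guess at the usual suspects, and the snippet assumes bash (it uses `${!v}` indirect expansion):

```shell
# Dump GPU-loader-related variables; run on both nodes, save, and diff
for v in CUDA_HOME CUDA_PATH CUDNN_PATH LD_LIBRARY_PATH; do
  # ${!v} is bash indirect expansion: the value of the variable whose name is in $v
  printf '%s=%s\n' "$v" "${!v:-<unset>}"
done
```

Any variable that is set on the working nodes but `<unset>` on the broken one is a prime candidate.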