CUDA 12.3 and Python 3.9, with any TF, GPU not found

Is there a way to get the GPUs to work? Python 3.11 works flawlessly. This is on RHEL 9.


[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import os

>>>

>>> # Set TensorFlow log level to suppress warnings and info messages

>>> os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

>>>

>>> # Now you can import and use TensorFlow

>>> import tensorflow as tf


>>> print(tf.sysconfig.get_build_info())

OrderedDict([('cpu_compiler', '/usr/lib/llvm-17/bin/clang'), ('cuda_compute_capabilities', ['sm_50', 'sm_60', 'sm_70', 'sm_80', 'compute_90']), ('cuda_version', '12.3'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

>>>

>>> physical_devices = tf.config.list_physical_devices('GPU')

>>> print(physical_devices)

[]

>>>


pip show tf-nightly

Name: tf-nightly

Version: 2.16.0

Summary: TensorFlow is an open source machine learning framework for everyone.

Home-page: https://www.tensorflow.org/

Author: Google Inc.

Author-email: packages@tensorflow.org

License: Apache 2.0

Location: /usr/local/lib64/python3.9/site-packages

Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras-nightly, libclang, ml-dtypes, numpy, opt-einsum, packaging, protobuf, requests, setuptools, six, tb-nightly, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wrapt

Required-by:

Hi @RtheK, I was able to detect gpu with tensorflow 2.16-nightly.

Could you please let us know in which environment you are facing this issue? Also, I can see that you are using CUDA 12.3; could you please try CUDA 12.2 instead? Thank you.

RHEL 9, Python 3.9, cuDNN 8.9, and just upgraded CUDA to 12.4. Oddly, I have 2 GPU nodes that work just fine. The fix there was to set CUDNN_PATH and add $CUDNN_PATH/lib to $LD_LIBRARY_PATH. This one node has the same packages, but clearly there is some difference that I can't find:
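For reference, the working nodes were set up roughly like this. The prefix `/usr/local/cudnn` below is a placeholder, not the actual path from my nodes; point it at wherever cuDNN is really installed (for a pip install of `tensorflow[and-cuda]`, that would be the `nvidia-cudnn-cu12` wheel's directory):

```shell
# CUDNN_PATH below is a placeholder -- substitute the node's real cuDNN install prefix
export CUDNN_PATH=/usr/local/cudnn
# Prepend the cuDNN lib directory so TensorFlow's dlopen() can locate libcudnn.so
export LD_LIBRARY_PATH="${CUDNN_PATH}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "${LD_LIBRARY_PATH}"
```

The `${LD_LIBRARY_PATH:+:…}` expansion avoids leaving a stray leading colon when the variable was previously unset.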

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2024-02-26 14:35:16.580905: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

2024-02-26 14:35:16.581143: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.

2024-02-26 14:35:16.584441: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.

2024-02-26 14:35:16.625948: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

2024-02-26 14:35:17.760304: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.

Skipping registering GPU devices...

[]

Notice that the log line:

Could not find cuda drivers on your machine, GPU will not be used.

appears twice. That indicates 2 devices, and indeed there are 2 GPUs. There has to be an environment variable I am missing.
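Since the packages match, one way to narrow this down is to dump the loader-related environment on a working node and on the broken node and diff the two outputs. The variable list below is just a guess at the usual suspects, and the snippet assumes bash (it uses `${!v}` indirect expansion):

```shell
# Dump GPU-loader-related variables; run on both nodes, save, and diff
for v in CUDA_HOME CUDA_PATH CUDNN_PATH LD_LIBRARY_PATH; do
  # ${!v} is bash indirect expansion: the value of the variable whose name is in $v
  printf '%s=%s\n' "$v" "${!v:-<unset>}"
done
```

Any variable that is set on the working nodes but `<unset>` on the broken one is a prime candidate.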