What versions of CUDA and CuDNN are required for Tensorflow 2.16?

NVIDIA driver: 545.29.06
OS: Zorin 17 (based on Ubuntu 22.04)
Python: 3.11.7 (via pyenv)

According to this table: https://www.tensorflow.org/install/source#gpu
TensorFlow 2.16.1 requires CUDA 12.3 and cuDNN 8.9, but can someone confirm this?
(The previous two times I installed CUDA, it ended up breaking my NVIDIA drivers.)
Moreover, do I need Clang and Bazel, as the table mentions?


Welcome @sagnik_t to the TensorFlow Community

You can try the following:

  1. Create a fresh conda virtual environment and activate it.
  2. Run pip install --upgrade pip.
  3. Run pip install tensorflow[and-cuda].
  4. Set the environment variables:

Locate the directory of the conda environment by running the following in your terminal:

echo $CONDA_PREFIX

Enter that directory and create these subdirectories and files:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh

# Store the original LD_LIBRARY_PATH and PATH so they can be restored on deactivation
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
export ORIGINAL_PATH="${PATH}"

# Get the CUDNN directory 
CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))

# Prepend the cuDNN library directories to LD_LIBRARY_PATH
# (the find output already ends with ':', so the previous value is appended directly)
export LD_LIBRARY_PATH=$(find ${CUDNN_DIR}/*/lib/ -type d -printf "%p:")${LD_LIBRARY_PATH}

# Get the ptxas directory  
PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))

# Prepend the directory containing ptxas to PATH
# (again, the find output already ends with ':')
export PATH=$(find ${PTXAS_DIR}/*/bin/ -type d -printf "%p:")${PATH}

Edit ./etc/conda/deactivate.d/env_vars.sh as follows:

#!/bin/sh

# Restore the original LD_LIBRARY_PATH and PATH
export LD_LIBRARY_PATH="${ORIGINAL_LD_LIBRARY_PATH}"
export PATH="${ORIGINAL_PATH}"

# Clean up the helper variables set on activation
unset ORIGINAL_LD_LIBRARY_PATH
unset ORIGINAL_PATH
unset CUDNN_DIR
unset PTXAS_DIR

  5. Verify the GPU setup:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
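
Optionally, you can also check which CUDA and cuDNN versions the installed TensorFlow wheel was built against (a quick extra check, not part of the official guide; it uses the build info that TensorFlow exposes):

# Optional: print the CUDA/cuDNN versions the installed TensorFlow wheel was built with,
# plus the GPUs it can see.
import tensorflow as tf

info = tf.sysconfig.get_build_info()
print("TensorFlow:", tf.__version__)
print("CUDA build version:", info.get("cuda_version"))
print("cuDNN build version:", info.get("cudnn_version"))
print("GPUs visible:", tf.config.list_physical_devices("GPU"))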

I have submitted a pull request to update the official TensorFlow installation guide.

I hope it helps!


Thanks for the detailed post @sotiris.gkouzias on using TensorFlow with GPU in conda.

I have a couple of questions and need your input:

  1. Does pip install tensorflow[and-cuda] take care of installing the CUDA-related packages, or do I need to install the CUDA driver / CUDA toolkit / cuDNN on my host machine [Ubuntu 22.04] before running the pip command?

  2. Whenever I create a new environment in conda, do I need to reinstall all these NVIDIA packages to use the underlying GPU? Say I am using TensorFlow 2.15 in env1 and create env2 for TensorFlow 2.16: will all the related CUDA packages get installed in env2, given that the dependencies of 2.15 and 2.16 are different?

Regards,


@ACodingfreak welcome to the TensorFlow Forum!

Indeed, when you run pip install tensorflow[and-cuda], all the packages needed to utilize your GPU locally are installed as well. However, note that a compatible NVIDIA driver must already be installed on the host. That's why you should first check by running nvidia-smi and then proceed with the installation.

If you wish to install TensorFlow 2.15.1 in a different conda environment, you can try running pip install tensorflow[and-cuda]==2.15.1; again, all the packages needed to utilize your GPU locally should be installed as well.
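
(Optional sanity check, just an idea: you can list the NVIDIA wheels that ended up inside whichever environment is currently active, to see that each conda environment carries its own copies.)

# Optional: list the nvidia-* wheels installed in the currently active environment.
# Run it once per conda env to confirm that each env has its own CUDA/cuDNN copies.
from importlib.metadata import distributions

for dist in sorted(distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
    name = dist.metadata["Name"] or ""
    if name.lower().startswith("nvidia-"):
        print(f"{name}=={dist.version}")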

I hope it helps.


Thanks for the detailed reply @sotiris.gkouzias

Well, I have tried the exact instructions, but it looks like I am not 100% successful.

(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-06-11 18:51:41.128921: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-11 18:51:41.181513: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-11 18:51:41.181551: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-11 18:51:41.182877: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-11 18:51:41.190802: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-11 18:51:42.117957: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-11 18:51:42.872312: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-11 18:51:42.920830: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-11 18:51:42.926008: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
$ conda list 
# packages in environment at /home/codingfreak/anaconda3/envs/tf-gpu:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
absl-py                   2.1.0                    pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
bzip2                     1.0.8                h5eee18b_6  
ca-certificates           2024.3.11            h06a4308_0  
cachetools                5.3.3                    pypi_0    pypi
certifi                   2024.6.2                 pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
flatbuffers               24.3.25                  pypi_0    pypi
gast                      0.5.4                    pypi_0    pypi
google-auth               2.30.0                   pypi_0    pypi
google-auth-oauthlib      1.2.0                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.64.1                   pypi_0    pypi
h5py                      3.11.0                   pypi_0    pypi
idna                      3.7                      pypi_0    pypi
keras                     2.15.0                   pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libclang                  18.1.1                   pypi_0    pypi
libffi                    3.4.4                h6a678d5_1  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
markdown                  3.6                      pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
ml-dtypes                 0.3.2                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
numpy                     1.26.4                   pypi_0    pypi
nvidia-cublas-cu12        12.2.5.6                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.2.142                 pypi_0    pypi
nvidia-cuda-nvcc-cu12     12.2.140                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.2.140                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.2.140                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.4.25                 pypi_0    pypi
nvidia-cufft-cu12         11.0.8.103               pypi_0    pypi
nvidia-curand-cu12        10.3.3.141               pypi_0    pypi
nvidia-cusolver-cu12      11.5.2.141               pypi_0    pypi
nvidia-cusparse-cu12      12.1.2.141               pypi_0    pypi
nvidia-nccl-cu12          2.16.5                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.2.140                 pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
openssl                   3.0.13               h7f8727e_2  
opt-einsum                3.3.0                    pypi_0    pypi
packaging                 24.1                     pypi_0    pypi
pip                       24.0            py311h06a4308_0  
protobuf                  4.25.3                   pypi_0    pypi
pyasn1                    0.6.0                    pypi_0    pypi
pyasn1-modules            0.4.0                    pypi_0    pypi
python                    3.11.9               h955ad1f_0  
readline                  8.2                  h5eee18b_0  
requests                  2.32.3                   pypi_0    pypi
requests-oauthlib         2.0.0                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
setuptools                69.5.1          py311h06a4308_0  
six                       1.16.0                   pypi_0    pypi
sqlite                    3.45.3               h5eee18b_0  
tensorboard               2.15.2                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
tensorflow                2.15.1                   pypi_0    pypi
tensorflow-estimator      2.15.0                   pypi_0    pypi
tensorflow-io-gcs-filesystem 0.37.0                   pypi_0    pypi
termcolor                 2.4.0                    pypi_0    pypi
tk                        8.6.14               h39e8969_0  
typing-extensions         4.12.2                   pypi_0    pypi
tzdata                    2024a                h04d1e81_0  
urllib3                   2.2.1                    pypi_0    pypi
werkzeug                  3.0.3                    pypi_0    pypi
wheel                     0.43.0          py311h06a4308_0  
wrapt                     1.14.1                   pypi_0    pypi
xz                        5.4.6                h5eee18b_1  
zlib                      1.2.13               h5eee18b_1
(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$ pip list
Package                      Version
---------------------------- ----------
absl-py                      2.1.0
astunparse                   1.6.3
cachetools                   5.3.3
certifi                      2024.6.2
charset-normalizer           3.3.2
flatbuffers                  24.3.25
gast                         0.5.4
google-auth                  2.30.0
google-auth-oauthlib         1.2.0
google-pasta                 0.2.0
grpcio                       1.64.1
h5py                         3.11.0
idna                         3.7
keras                        2.15.0
libclang                     18.1.1
Markdown                     3.6
MarkupSafe                   2.1.5
ml-dtypes                    0.3.2
numpy                        1.26.4
nvidia-cublas-cu12           12.2.5.6
nvidia-cuda-cupti-cu12       12.2.142
nvidia-cuda-nvcc-cu12        12.2.140
nvidia-cuda-nvrtc-cu12       12.2.140
nvidia-cuda-runtime-cu12     12.2.140
nvidia-cudnn-cu12            8.9.4.25
nvidia-cufft-cu12            11.0.8.103
nvidia-curand-cu12           10.3.3.141
nvidia-cusolver-cu12         11.5.2.141
nvidia-cusparse-cu12         12.1.2.141
nvidia-nccl-cu12             2.16.5
nvidia-nvjitlink-cu12        12.2.140
oauthlib                     3.2.2
opt-einsum                   3.3.0
packaging                    24.1
pip                          24.0
protobuf                     4.25.3
pyasn1                       0.6.0
pyasn1_modules               0.4.0
requests                     2.32.3
requests-oauthlib            2.0.0
rsa                          4.9
setuptools                   69.5.1
six                          1.16.0
tensorboard                  2.15.2
tensorboard-data-server      0.7.2
tensorflow                   2.15.1
tensorflow-estimator         2.15.0
tensorflow-io-gcs-filesystem 0.37.0
termcolor                    2.4.0
typing_extensions            4.12.2
urllib3                      2.2.1
Werkzeug                     3.0.3
wheel                        0.43.0
wrapt                        1.14.1
(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$ nvcc -V
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$ 
(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$ nvidia-smi
Tue Jun 11 18:58:40 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   37C    P3             10W /   45W |      15MiB /   8188MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2495      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+

@ACodingfreak no worries, it is a harmless warning. The output looks perfectly normal (for more info you can read the relevant discussion here).

NUMA is a memory architecture used in multiprocessor systems where the memory access time depends on the memory location relative to the processor.

NUMA support is important for optimizing memory access on systems with multiple CPUs or GPUs. It allows the operating system to allocate memory and schedule processes in a way that reduces memory access latency.

To validate that your GPU is actually being utilized, try training a relatively simple deep learning model on your PC with a ready-to-use TensorFlow dataset for 5 epochs and time it. Then run the exact same experiment in Google Colab with GPU acceleration enabled, time it, and compare the results. A minimal sketch is shown below.
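
For example, something along these lines (a minimal sketch using MNIST as the ready-to-use dataset; the exact model, batch size and epoch count are just placeholders you can swap for your own):

# Minimal GPU sanity benchmark: train a small CNN on MNIST for 5 epochs and time it,
# then run the same script on Colab with GPU acceleration enabled and compare.
import time
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

start = time.perf_counter()
model.fit(x_train, y_train, epochs=5, batch_size=128, verbose=2)
print(f"Total training time: {time.perf_counter() - start:.1f} s")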


@sotiris.gkouzias

Thanks for your previous reply, and sorry for the late follow-up.
Let me add some details about my setup: HP ZBook laptop with an RTX 2000 GPU and Ubuntu 22.04.

As mentioned in the previous comments, I installed NVIDIA driver version 555.42, and somehow after a week it simply vanished; I am not sure how that happened. So I ended up installing the Ubuntu-recommended driver, version 535, which comes with CUDA 12.2 and should work for TensorFlow 2.15.1.

$ nvidia-smi
Mon Jun 24 10:54:56 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P8               3W /  45W |   7825MiB /  8188MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2051      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A     50888      C   ...ak/anaconda3/envs/tf-gpu/bin/python     6414MiB |
|    0   N/A  N/A     55908      C   ...ak/anaconda3/envs/tf-gpu/bin/python     1302MiB |
|    0   N/A  N/A     59240      C   ...ak/anaconda3/envs/tf-gpu/bin/python       92MiB |

I have tried the GAN example shared in the link below to check the performance.

Local-GPU (2.15.1) - 4.5 seconds per epoch
Colab-GPU (2.15) - 10 seconds per epoch
Local-CPU (2.16.1) - 126 seconds per epoch


Impressive results @ACodingfreak! It seems that you have successfully utilized your GPU with TensorFlow 2.15.1. Note that if you want to explore Keras 3 and its capabilities for your deep learning experiments, it is best to install TensorFlow 2.16.1.
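
(After upgrading, a quick optional check to confirm you ended up with Keras 3 and the TensorFlow backend:)

# With TensorFlow 2.16.x, the bundled Keras is Keras 3.
import keras

print("Keras version:", keras.__version__)   # expect 3.x alongside TF 2.16
print("Backend:", keras.backend.backend())   # 'tensorflow' by default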

@sotiris.gkouzias - Thanks for your help

Well, I will try TF 2.16, hopefully by the end of June.

I have a couple of questions I would like to understand:

  1. While trying to understand the environment variables, I came across the ptxas directory (PTXAS_DIR), which seems to contain nvcc, nccl, the CUDA runtime and so on:
:~/anaconda3/envs/tf-gpu/lib/python3.11/site-packages/nvidia$ ls
cublas  cuda_cupti  cuda_nvcc  cuda_nvrtc  cuda_runtime  cudnn  cufft  curand  cusolver  cusparse  __init__.py  nccl  nvjitlink  __pycache__

So technically I don't need any explicit installation of the CUDA toolkit if using tensorflow-gpu?

  2. I am not able to access nvcc from the terminal. Is this something only available inside the anaconda environment?

@ACodingfreak ,

  1. Yes, indeed. You do not need any explicit installation of the CUDA toolkit when using the created virtual environment (tf-gpu). That is the purpose of running pip install tensorflow[and-cuda] in the first place, rather than, for example, pip install tensorflow.
  2. All the NVIDIA libraries required to use TensorFlow with GPU support are installed inside the virtual environment, so naturally they are available only when that environment is activated (see the quick check below). Alternatively, you could install TensorFlow using venv; in that case TensorFlow and all the required NVIDIA libraries would live in the virtual environment created with venv. In general, you should install TensorFlow in a virtual environment to avoid polluting your global Python installation: globally installed packages clutter your main Python installation and can potentially interfere with system processes, whereas virtual environments keep your system-wide setup clean.
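
For example, a quick way to see that the NVIDIA libraries live inside the active environment (just an illustrative check; the paths will differ on your machine):

# The pip-installed NVIDIA libraries live inside the active virtual environment,
# so they are only importable while that environment is activated.
import sys
import nvidia.cudnn

print("Python executable:", sys.executable)
print("cuDNN package path:", nvidia.cudnn.__file__)
# Both paths should point inside the environment directory (e.g. .../envs/tf-gpu/...).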

As always, thanks for the detailed reply @sotiris.gkouzias.


Hi there @ACodingfreak, it seems like you have successfully run TensorFlow 2.15.1 on your local GPU. I thought the latest version of TensorFlow that you can run with a GPU was only 2.10.0? I am seeking advice: it seems like I have configured too much in my anaconda environment, and now it is rather messed up and I have no idea how to solve it.