TensorFlow version 2.16 just released!

sotiris.gkouzias · March 8, 2024, 8:06am

TensorFlow 2.16 has been officially released. However, as I understand it (though I may be wrong), installing it is not recommended on Ubuntu. My PC runs Ubuntu with an NVIDIA RTX 3060 graphics card (CUDA-enabled). I already have TensorFlow 2.15.0 installed, and it works absolutely fine with my GPU. Is there any news on when users with Ubuntu and CUDA-enabled GPUs will be able to upgrade to TensorFlow 2.16?

Juan_Vargas · March 8, 2024, 9:09pm

I have Ubuntu 22.04.4 LTS and also an RTX 3060. TF 2.15 works well with the RTX 3060, but TF 2.16 does not. I reported the issue to the Google team; as far as I can tell, it still persists. What I find surprising is that the GPUs in my machine work really well with Julia and with torch. Hopefully the TF team will fix this issue soon.

sotiris.gkouzias · March 11, 2024, 8:21pm

It turns out that running pip install tensorflow[and-cuda] in a new conda environment with Python 3.10, and using JAX as the Keras backend (edit the local config file at ~/.keras/keras.json accordingly), works like a charm.
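A minimal sketch of the keras.json step, assuming the conda environment from the post already exists and is active (the environment name tf216 in the comments is hypothetical). TensorFlow 2.16 ships with Keras 3, which reads its backend from this file. The values other than "backend" are the Keras 3 defaults; the script backs up any existing file before overwriting it. Note that the JAX backend also needs JAX itself installed in the same environment; that step is not shown here, and the right pip extra for a CUDA build depends on the JAX release.

```bash
# Assumed setup (hypothetical env name), run beforehand:
#   conda create -n tf216 python=3.10 && conda activate tf216
#   pip install "tensorflow[and-cuda]"

# Point Keras 3 at the JAX backend via its per-user config file.
KERAS_DIR="$HOME/.keras"
mkdir -p "$KERAS_DIR"

# Keep a backup of any existing config before overwriting it.
if [ -f "$KERAS_DIR/keras.json" ]; then
    cp "$KERAS_DIR/keras.json" "$KERAS_DIR/keras.json.bak"
fi

cat > "$KERAS_DIR/keras.json" <<'EOF'
{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "jax",
    "image_data_format": "channels_last"
}
EOF
```

Keras 3 also honours the KERAS_BACKEND environment variable, which overrides the file for a single run, e.g. `KERAS_BACKEND=jax python train.py` (script name hypothetical).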

Welcome, TensorFlow 2.16!

sotiris.gkouzias · March 20, 2024, 2:40pm

Addressing cuDNN Loading Failure

The initial workaround focused on the cuDNN library loading issue. It turned out that TensorFlow 2.16 no longer loaded cuDNN from the Python site-packages directory where pip installs it. The fix was to manually add the directory containing cuDNN to the LD_LIBRARY_PATH environment variable:

export LD_LIBRARY_PATH=~/anaconda3/lib/python3.10/site-packages/nvidia/cudnn/lib/:$LD_LIBRARY_PATH

After applying this change, TensorFlow was able to recognize and list the GPU devices, indicating that the cuDNN loading issue was resolved.
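The export above hard-codes the base Anaconda install (~/anaconda3/lib/python3.10/...); inside a named conda environment the packages usually live under ~/anaconda3/envs/<name>/ instead. A sketch of a more portable variant, assuming the pip-installed cuDNN sits under nvidia/cudnn/lib in the active interpreter's site-packages, asks Python for that directory:

```bash
# Ask the active interpreter for its site-packages directory rather than
# hard-coding ~/anaconda3/lib/python3.10/site-packages.
SITE_PACKAGES="$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
CUDNN_LIB="$SITE_PACKAGES/nvidia/cudnn/lib"

# Warn (but continue) if the pip-installed cuDNN is not where we expect it.
if [ ! -d "$CUDNN_LIB" ]; then
    echo "warning: $CUDNN_LIB does not exist in this environment" >&2
fi

# Prepend without leaving a trailing ':' when LD_LIBRARY_PATH was empty;
# an empty entry is typically treated by the loader as the current directory.
export LD_LIBRARY_PATH="$CUDNN_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

To check the result the same way, `python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"` should print a non-empty list when the RTX 3060 is visible.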

Encountering a New Issue with ptxas

When I then ran a model training script, a new error appeared, this time from ptxas, the CUDA toolkit tool that compiles PTX (Parallel Thread Execution) code into SASS, the GPU's native machine code. The error pointed to a bug in the version of ptxas being used, which broke XLA compilation.

Resolving the ptxas Issue

The final step in getting TensorFlow 2.16.1 fully working on the GPU was to use a ptxas version without that bug. A suitable one was already present in the Python environment's site-packages directory, under the pip-installed NVIDIA CUDA compiler package:

...lib/python3.10/site-packages/nvidia/cuda_nvcc/bin
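A small diagnostic sketch, assuming the directory above is resolved against the active interpreter's site-packages: it shows which ptxas is currently first on PATH and prints the version of the pip-installed one, so the two can be compared before changing anything.

```bash
# Locate the ptxas shipped with the pip-installed NVIDIA CUDA compiler package.
SITE_PACKAGES="$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
NVCC_BIN="$SITE_PACKAGES/nvidia/cuda_nvcc/bin"

# Which ptxas would be picked up from PATH right now, if any?
echo "ptxas currently on PATH: $(command -v ptxas || echo none)"

if [ -x "$NVCC_BIN/ptxas" ]; then
    echo "pip-installed ptxas: $NVCC_BIN/ptxas"
    "$NVCC_BIN/ptxas" --version
else
    echo "no ptxas under $NVCC_BIN (is tensorflow[and-cuda] installed here?)" >&2
fi
```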

Manually add this path to your environment variables, and limit the change to the conda environment created specifically for TensorFlow 2.16.1. This ensures TensorFlow uses this ptxas during XLA compilation, which is essential for training models on the GPU.
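One way to scope these changes to a single environment is conda's activation hooks: scripts in $CONDA_PREFIX/etc/conda/activate.d/ are sourced on activation and those in deactivate.d/ on deactivation. This sketch assumes the "environment variables" meant above are PATH (for ptxas) and LD_LIBRARY_PATH (for cuDNN, from the first workaround); the helper name write_tf_cuda_hooks and the file name tf_cuda_paths.sh are hypothetical.

```bash
# Hypothetical helper: write conda hooks that put the pip-installed cuDNN on
# LD_LIBRARY_PATH and the pip-installed ptxas on PATH, only while the given
# conda environment is active.
write_tf_cuda_hooks() {
    local prefix="${1:?usage: write_tf_cuda_hooks <conda-env-prefix>}"
    mkdir -p "$prefix/etc/conda/activate.d" "$prefix/etc/conda/deactivate.d"

    # Sourced on 'conda activate'; the env's own python is on PATH by then.
    cat > "$prefix/etc/conda/activate.d/tf_cuda_paths.sh" <<'EOF'
_tf_site="$(python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
export _TF_SAVED_LD_LIBRARY_PATH="${LD_LIBRARY_PATH:-}"
export _TF_NVCC_BIN="$_tf_site/nvidia/cuda_nvcc/bin"
export LD_LIBRARY_PATH="$_tf_site/nvidia/cudnn/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH="$_TF_NVCC_BIN:$PATH"
unset _tf_site
EOF

    # Sourced on 'conda deactivate': undo the changes made above.
    cat > "$prefix/etc/conda/deactivate.d/tf_cuda_paths.sh" <<'EOF'
# Drop the ptxas directory if it is still the first PATH entry.
PATH="${PATH#"$_TF_NVCC_BIN:"}"
export PATH
# Restore LD_LIBRARY_PATH to its pre-activation value (or unset it).
if [ -n "$_TF_SAVED_LD_LIBRARY_PATH" ]; then
    export LD_LIBRARY_PATH="$_TF_SAVED_LD_LIBRARY_PATH"
else
    unset LD_LIBRARY_PATH
fi
unset _TF_SAVED_LD_LIBRARY_PATH _TF_NVCC_BIN
EOF
}
```

Run it once with the TensorFlow environment active, `write_tf_cuda_hooks "$CONDA_PREFIX"`, then deactivate and reactivate the environment. The deactivate hook only strips the ptxas directory while it is still at the front of PATH, so it assumes nothing else was prepended after activation.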

GitHub issue: TF 2.16.1 Fails to work with GPUs (tensorflow/tensorflow, Issue #63362)