Problems with [and-cuda] installation / cuDNN version mismatch

LeSch · December 19, 2023, 7:34am

Hey everyone. I have just updated to TensorFlow 2.15 to migrate my codebase to a newer version.
I did so inside a conda environment via:

```
pip install --upgrade tensorflow[and-cuda]
```

This apparently also updated the required CUDA libraries, which is a great relief, since I never want to do that manually again in my life.
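The `[and-cuda]` extra installs the CUDA libraries as ordinary pip packages (for example `nvidia-cudnn-cu12` for cuDNN), so they show up in the environment's package metadata. A minimal sketch, using only the standard library's `importlib.metadata`, for listing which `nvidia-*` packages and versions actually ended up in the active environment; the helper name `installed_nvidia_wheels` is hypothetical:

```python
from importlib.metadata import distributions


def installed_nvidia_wheels():
    """Return {package name: version} for installed packages named 'nvidia-*'."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        # Broken or partial installs can lack a Name field; skip those.
        if name and name.lower().startswith("nvidia-"):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in installed_nvidia_wheels().items():
        print(f"{name}=={version}")
```

This only shows what pip installed into the current environment; it says nothing about other copies of the same libraries elsewhere on the system.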

Unfortunately, something did not work correctly, and I am stuck with the following error when using an optimizer in a Keras training loop:

```
2023-12-18 16:31:34.863134: I external/local_xla/xla/service/service.cc:168] XLA service 0x7ff738108ec0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-12-18 16:31:34.863151: I external/local_xla/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce GTX 1650 Ti, Compute Capability 7.5
2023-12-18 16:31:35.268097: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:447] Loaded runtime CuDNN library: 8.1.0 but source was compiled with: 8.9.4. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-12-18 16:31:35.269310: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:574 : FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.
2023-12-18 16:31:35.277396: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:447] Loaded runtime CuDNN library: 8.1.0 but source was compiled with: 8.9.4. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-12-18 16:31:35.280756: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:574 : FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.
```
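The decisive line is the `cuda_dnn.cc` error: TensorFlow 2.15 was compiled against cuDNN 8.9.4, but the process loaded a cuDNN 8.1.0 library at runtime. The compatibility rule stated in that message can be written out as a small check. This is only an illustration of the rule as quoted in the log, with hypothetical helper names, not TensorFlow's actual implementation:

```python
def parse_version(text):
    """Turn a version string such as '8.9.4' into a tuple of ints: (8, 9, 4)."""
    return tuple(int(part) for part in text.split("."))


def cudnn_compatible(runtime, compiled):
    """Rule quoted in the log: same major version, runtime minor >= compiled minor."""
    rt, cc = parse_version(runtime), parse_version(compiled)
    return rt[0] == cc[0] and rt[1] >= cc[1]


print(cudnn_compatible("8.1.0", "8.9.4"))  # False: minor 1 < 9
print(cudnn_compatible("8.9.7", "8.9.4"))  # True: same major, minor 9 >= 9
print(cudnn_compatible("9.0.0", "8.9.4"))  # False: major version differs
```

By this rule, the 8.1.0 library that was loaded can never satisfy a build compiled against 8.9.4, so the question becomes where that 8.1.0 copy comes from.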

Any ideas what exactly is happening here? Is some older CUDA installation conflicting with the new one?

My versions are:

```
pip list | grep tensorflow

tensorflow 2.15.0.post1
tensorflow-addons 0.22.0
tensorflow-datasets 4.9.3
tensorflow-estimator 2.15.0
tensorflow-hub 0.15.0
tensorflow-io-gcs-filesystem 0.27.0
tensorflow-metadata 1.10.0
tensorflow-model-optimization 0.7.5
tensorflow-probability 0.23.0
tensorflow-text 2.14.0
```
Unfortunately, I have no idea where to look up the correct CUDA/cuDNN versions on my system. `nvcc` is not installed, and `nvidia-smi` shows CUDA 12.0 but gives no information about the cuDNN version, and I feel its driver output has been unreliable anyway.
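Without `nvcc`, the versions can still be queried from Python. Note that the CUDA version printed by `nvidia-smi` is the highest CUDA version the driver supports, not an installed toolkit, and `nvidia-smi` does not report cuDNN at all. A minimal sketch, assuming Linux and a cuDNN 8 library named `libcudnn.so.8`: it asks TensorFlow what it was compiled against via `tf.sysconfig.get_build_info()` (which may report only the cuDNN major version), calls cuDNN's own `cudnnGetVersion()` through `ctypes`, and lists which `libcudnn*` files are mapped into the process. All helper names are hypothetical:

```python
import ctypes
import os


def decode_cudnn_version(v):
    """Split the integer returned by cudnnGetVersion() into (major, minor, patch).

    cuDNN 8.x encodes major*1000 + minor*100 + patch (8.1.0 -> 8100);
    cuDNN 9.x uses major*10000 + minor*100 + patch (9.1.0 -> 90100).
    """
    if v >= 10000:
        return v // 10000, (v % 10000) // 100, v % 100
    return v // 1000, (v % 1000) // 100, v % 100


def runtime_cudnn_version(soname="libcudnn.so.8"):
    """Load cuDNN through the dynamic linker's normal search and query its version.

    Returns (major, minor, patch), or None if no library by that name can be loaded.
    """
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        return None
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return decode_cudnn_version(lib.cudnnGetVersion())


def mapped_cudnn_files():
    """List libcudnn* files currently mapped into this process (Linux /proc only)."""
    found = set()
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                fields = line.split(maxsplit=5)
                # The sixth field, when present, is the path of the mapped file.
                if len(fields) == 6:
                    path = fields[5].strip()
                    if os.path.basename(path).startswith("libcudnn"):
                        found.add(path)
    except OSError:
        pass
    return sorted(found)


def tensorflow_build_versions():
    """CUDA/cuDNN versions TensorFlow was compiled against, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = tf.sysconfig.get_build_info()
    return {key: info.get(key) for key in ("cuda_version", "cudnn_version")}


if __name__ == "__main__":
    print("compiled against:", tensorflow_build_versions())
    print("runtime cuDNN found by the linker:", runtime_cudnn_version())
    print("libcudnn files mapped into this process:", mapped_cudnn_files())
```

`runtime_cudnn_version` reports whichever `libcudnn.so.8` the dynamic linker finds on its default search path (including `LD_LIBRARY_PATH`), which is a likely suspect but not necessarily the exact file TensorFlow opened. `mapped_cudnn_files` is most informative when called inside the training process right after the failing op, since it then shows the path that was actually loaded.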

Does anyone have a suggestion on how I should proceed? I am pretty much out of ideas, since I don’t want to wipe my entire system again just because I might have a few old CUDA/cuDNN skeletons in the basement.

Kiran_Sai_Ramineni · December 19, 2023, 8:53am

Hi @LeSch, could you please create a new environment and follow the steps below:

```
!pip install --extra-index-url https://pypi.nvidia.com tensorrt-bindings==8.6.1 tensorrt-libs==8.6.1
!pip install -U tensorflow[and-cuda]==2.15.0
import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))
```

Thank you.
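Listing physical devices alone may not be enough: in the original log, the GPU was detected and the XLA service initialized, and cuDNN only failed once a DNN op ran. A minimal smoke test, assuming TensorFlow 2.x, is to run one small convolution on the GPU, which should force cuDNN to initialize. The helper name `cudnn_smoke_test` is hypothetical:

```python
def cudnn_smoke_test():
    """Run one small convolution on the GPU so cuDNN actually gets initialized.

    Returns None if TensorFlow or a GPU is unavailable, True if the op ran,
    and False if TensorFlow raised an op error (e.g. a cuDNN initialization failure).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    if not tf.config.list_physical_devices("GPU"):
        return None
    try:
        with tf.device("/GPU:0"):
            x = tf.random.normal([1, 8, 8, 3])        # NHWC input
            kernel = tf.random.normal([3, 3, 3, 4])   # [height, width, in, out]
            y = tf.nn.conv2d(x, kernel, strides=1, padding="SAME")
        # Copy the result to the host so any deferred error surfaces here.
        y.numpy()
        return True
    except tf.errors.OpError as err:
        print(f"cuDNN-backed op failed: {err}")
        return False


if __name__ == "__main__":
    print("cuDNN smoke test:", cudnn_smoke_test())
```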

LeSch · December 19, 2023, 9:56am

Hey, thanks for your answer. I solved the problem: it turned out I had a cuDNN installation in my base conda environment, left over from a previous PyTorch/TensorFlow installation. After uninstalling it, everything runs well now.
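Leftover libraries like this can be tracked down without wiping the system. A minimal sketch, assuming Linux and a conda setup that exports `CONDA_PREFIX` and `CONDA_EXE` (the base environment is assumed to sit two directories above `CONDA_EXE`, e.g. `~/miniconda3/bin/conda`), that searches the `LD_LIBRARY_PATH` entries and the `lib` directories of the active and base environments for files named `libcudnn*`. The helper names and the choice of directories are hypothetical, not an exhaustive list:

```python
import os
from pathlib import Path


def find_cudnn_copies(roots):
    """Return sorted paths of files named libcudnn* anywhere under the given directories."""
    hits = set()
    for root in roots:
        root = Path(root)
        if root.is_dir():
            hits.update(str(p) for p in root.rglob("libcudnn*") if p.is_file())
    return sorted(hits)


def default_search_roots():
    """Directories worth checking: LD_LIBRARY_PATH plus active and base conda lib dirs."""
    roots = [p for p in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    prefixes = []
    if os.environ.get("CONDA_PREFIX"):
        prefixes.append(Path(os.environ["CONDA_PREFIX"]))
    if os.environ.get("CONDA_EXE"):
        # Assumed layout: <base>/bin/conda -> <base> is the base environment.
        prefixes.append(Path(os.environ["CONDA_EXE"]).parent.parent)
    roots.extend(str(prefix / "lib") for prefix in prefixes)
    return roots


if __name__ == "__main__":
    for path in find_cudnn_copies(default_search_roots()):
        print(path)
```

Any hit outside the environment that TensorFlow was installed into (such as a cuDNN 8.1 copy in the base environment) is a candidate for the stale library; whether it is actually loaded depends on the linker's search order.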