How to make TensorFlow see GPUs?

I installed TensorFlow with pip install tensorflow-gpu (tensorflow-gpu 2.6.0).

I followed these steps to install CUDA:

but when I try to run my TensorFlow code I get:

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-10 19:48:31.318358: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/openmpi/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib:/opt/amazon/efa/lib:/usr/local/mpi/lib:/opt/amazon/openmpi/lib:/usr/lib64/openmpi/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/lib/:
2021-09-10 19:48:31.319345: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

and when I check GPUs with print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU'))) it prints:
Num GPUs Available: 0

I am using Python 3.7.7

For CUDA 11.4 you can subscribe to and upvote:

I think TensorFlow 2.6.0 is built against CUDA 11.2.
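
You can confirm which CUDA and cuDNN versions your installed wheel expects with something along these lines (tf.sysconfig.get_build_info is available in recent TF 2.x releases):

# Print the CUDA/cuDNN versions the installed TensorFlow wheel was built against
python3 -c "import tensorflow as tf; print(tf.sysconfig.get_build_info())"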

Can you please advise on how to downgrade CUDA from 11.4 to 11.2?
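
I assume it would be something roughly like this (the package names are my guess from the NVIDIA apt repository, so I may be wrong):

# Guess: purge the 11.4 toolkit metapackages, then install the versioned 11.2 metapackage
sudo apt-get remove --purge cuda-11-4 cuda-toolkit-11-4
sudo apt-get install cuda-toolkit-11-2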

I tried reinstalling everything, but wasn't able to resolve this error when installing CUDA 11.2 again:

The following packages have unmet dependencies:
 cuda : Depends: cuda-11-4 (>= 11.4.2) but it is not going to be installed
 cuda-runtime-11-2 : Depends: cuda-drivers (>= 460.32.03) but it is not going to be installed
 linux-image-4.15.0-1102-azure : Conflicts: linux-image-unsigned-4.15.0-1102-azure but 4.15.0-1102.113 is to be installed
 linux-image-4.15.0-1102-oem : Conflicts: linux-image-unsigned-4.15.0-1102-oem but 4.15.0-1102.113 is to be installed
 linux-image-4.15.0-1112-azure : Conflicts: linux-image-unsigned-4.15.0-1112-azure but 4.15.0-1112.125 is to be installed
 linux-image-4.15.0-1122-azure : Conflicts: linux-image-unsigned-4.15.0-1122-azure but 4.15.0-1122.135 is to be installed
 linux-image-unsigned-4.15.0-1102-azure : Conflicts: linux-image-4.15.0-1102-azure but 4.15.0-1102.113 is to be installed
 linux-image-unsigned-4.15.0-1102-oem : Conflicts: linux-image-4.15.0-1102-oem but 4.15.0-1102.113 is to be installed
 linux-image-unsigned-4.15.0-1112-azure : Conflicts: linux-image-4.15.0-1112-azure but 4.15.0-1112.125 is to be installed
 linux-image-unsigned-4.15.0-1122-azure : Conflicts: linux-image-4.15.0-1122-azure but 4.15.0-1122.135 is to be installed
 linux-modules-nvidia-390-4.15.0-1102-aws : Depends: nvidia-kernel-common-390 (<= 390.143-1) but 390.144-0ubuntu0.18.04.1 is to be installed
 linux-modules-nvidia-390-4.15.0-1112-azure : Depends: nvidia-kernel-common-390 (<= 390.141-1) but 390.144-0ubuntu0.18.04.1 is to be installed
 linux-modules-nvidia-450-4.15.0-1102-aws : Depends: nvidia-kernel-common-450 (<= 450.119.03-1) but it is not going to be installed
                                            Depends: nvidia-kernel-common-450 (>= 450.119.03) but it is not going to be installed
 linux-modules-nvidia-450-4.15.0-1112-azure : Depends: nvidia-kernel-common-450 (<= 450.102.04-1) but it is not going to be installed
                                              Depends: nvidia-kernel-common-450 (>= 450.102.04) but it is not going to be installed
 linux-modules-nvidia-460-4.15.0-1102-aws : Depends: nvidia-kernel-common-460 (<= 460.73.01-1) but it is not going to be installed
                                            Depends: nvidia-kernel-common-460 (>= 460.73.01) but it is not going to be installed
 linux-modules-nvidia-460-4.15.0-1112-azure : Depends: nvidia-kernel-common-460 (<= 460.56-1) but it is not going to be installed
                                              Depends: nvidia-kernel-common-460 (>= 460.56) but it is not going to be installed
 linux-modules-nvidia-460-4.15.0-1122-azure : Depends: nvidia-kernel-common-460 (<= 460.91.03-1) but it is not going to be installed
                                              Depends: nvidia-kernel-common-460 (>= 460.91.03) but it is not going to be installed
 linux-modules-nvidia-470-4.15.0-1122-azure : Depends: nvidia-kernel-common-470 (<= 470.57.02-1) but it is not going to be installed
                                              Depends: nvidia-kernel-common-470 (>= 470.57.02) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

If you can, I suggest trying TF in a GPU Docker container so that you don't need to change the CUDA version on your system:
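
Something like this, as a rough sketch assuming the NVIDIA container runtime is already set up on the host, shows whether the container can see the GPU without touching the system CUDA:

# Sketch: list the GPUs TensorFlow can see inside the official GPU image
docker run --gpus all --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"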

I tried running Docker, but it only works without the --gpus flag, because when I run:
docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash
it throws this error:
docker: Error response from daemon: exec: "nvidia-container-runtime-hook": executable file not found in $PATH. ERRO[0000] error waiting for container: context canceled

How can I fix that?

It works when I run it without the --gpus flag:
docker run -it tensorflow/tensorflow:latest-gpu bash

Have you checked the step for NVIDIA support in Docker?

So it says I have to install nvidia-container-toolkit:

On versions including and after 19.03, you will use the nvidia-container-toolkit package and the --gpus all flag
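
The setup in that guide is roughly the following for Ubuntu (I'm paraphrasing the NVIDIA container toolkit instructions, so the repository URLs may have changed since):

# Add NVIDIA's container toolkit repository, install the toolkit, then restart Docker
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker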

My docker -v output:
Docker version 19.03.11

I tried running sudo apt-get install -y nvidia-container-runtime as the guide says, but this occurred:

cuda-drivers is already the newest version (470.57.02-1).
cuda-drivers set to manually installed.
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 cuda-drivers-470 : Depends: libnvidia-gl-470 (>= 470.57.02) but it is not going to be installed
 libnvidia-ifr1-470 : Depends: libnvidia-gl-470 but it is not going to be installed
 nvidia-driver-470 : Depends: libnvidia-gl-470 (= 470.57.02-0ubuntu1) but it is not going to be installed
                     Recommends: nvidia-prime (>= 0.8) but it is not going to be installed
                     Recommends: libnvidia-compute-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
                     Recommends: libnvidia-decode-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
                     Recommends: libnvidia-encode-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
                     Recommends: libnvidia-ifr1-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
                     Recommends: libnvidia-fbc1-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
                     Recommends: libnvidia-gl-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

I tried different ways to install and reinstall the NVIDIA drivers and CUDA, but I guess I made a mistake somewhere; this is what I see when I try installing with another installer:

So I'm not sure how to fix the "unmet dependencies" issue, because uninstalling and reinstalling doesn't solve it…

and I still get:

docker: Error response from daemon: exec: "nvidia-container-runtime-hook": executable file not found in $PATH.
ERRO[0000] error waiting for container: context canceled 

when running docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash

You need to solve the package problem on the host, since that apt install needs to complete.
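
One way to get there, as a rough and fairly destructive sketch (check what apt proposes to remove before confirming), is to let apt repair itself and, if that is not enough, purge the conflicting driver/CUDA packages and reinstall from a clean state:

# Let apt try to repair the broken/held packages first
sudo apt --fix-broken install
# If that is not enough, purge the conflicting packages (regex match) and start over
sudo apt-get purge '^nvidia-.*' '^libnvidia-.*' '^cuda.*'
sudo apt-get autoremove
sudo apt-get update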