WSL2 Installation Failing Miserably

Joshua_Mannheimer · April 17, 2023, 2:03am

I am trying to install TensorFlow using WSL2 on Windows, as outlined in the "Install TensorFlow with pip" guide on the TF website. I get to the point where TensorFlow confirms the GPU is registered, but it prints quite a bit of log output, and when I try to train a simple CNN it errors out. I have tried uninstalling and reinstalling things following NVIDIA's instructions, still with no success. When I try other installation methods, TensorFlow does not recognize the GPU. The first thing it prints is:

2023-04-15 22:21:52.990342: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.

This seems to be only a warning, and the model compiles, but once training starts it errors out:
2023-04-15 22:21:56.409879: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8600
Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
Aborted

Does anybody know how to fix this?

chunduriv · April 17, 2023, 3:47am

@Joshua_Mannheimer,

Welcome to the Tensorflow Forum!

Configure the system paths. You can do this with the following commands every time you start a new terminal, after activating your conda environment:

CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
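To make it clear what those two lines produce, here is a minimal Python sketch that composes the same LD_LIBRARY_PATH value and prints an `export` line to paste into the shell. It assumes the `nvidia-cudnn-cu11` pip package provides the `nvidia.cudnn` module used above; `build_ld_library_path` and `find_cudnn_path` are hypothetical helper names. The variable must be exported before Python starts, because the dynamic loader reads it at process start-up, so changing `os.environ` from inside a running TensorFlow process does not help.

```python
"""Print an `export LD_LIBRARY_PATH=...` line equivalent to the two commands above."""
import os


def build_ld_library_path(existing: str, conda_prefix: str, cudnn_path: str) -> str:
    """Mirror `$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib`.

    Empty components are skipped, because a zero-length entry in
    LD_LIBRARY_PATH makes the loader search the current directory.
    """
    parts = [existing]
    if conda_prefix:
        parts.append(conda_prefix.rstrip("/") + "/lib/")
    if cudnn_path:
        parts.append(cudnn_path.rstrip("/") + "/lib")
    return ":".join(p for p in parts if p)


def find_cudnn_path() -> str:
    """Return the nvidia.cudnn package directory, or "" if it is not installed."""
    try:
        import nvidia.cudnn  # assumed to come from the nvidia-cudnn-cu11 wheel
    except ImportError:
        return ""
    module_file = getattr(nvidia.cudnn, "__file__", None)
    return os.path.dirname(module_file) if module_file else ""


if __name__ == "__main__":
    value = build_ld_library_path(
        os.environ.get("LD_LIBRARY_PATH", ""),
        os.environ.get("CONDA_PREFIX", ""),
        find_cudnn_path(),
    )
    print(f"export LD_LIBRARY_PATH={value}")
```

Run it inside the activated conda environment and paste the printed line into the same terminal before starting Python.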

When I try to use other methods to install it, it does not recognize the GPU.

Could you please describe the steps you followed to install TensorFlow? Please also share your system configuration, GPU model, cuDNN, CUDA and NVIDIA driver versions, and the GPU's CUDA compute capability so we can debug further.
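Most of these details can be gathered with one small Python script. This is a sketch, not an official TensorFlow tool. It assumes `nvidia-smi` is on the PATH inside WSL (the query is skipped otherwise). It uses `tf.sysconfig.get_build_info()` for the CUDA/cuDNN versions TensorFlow was built against, which can differ from what is installed on the system, and `tf.config.experimental.get_device_details()` for the GPU's compute capability.

```python
"""Collect the system details requested above (sketch; every probe is optional)."""
import platform
import shutil
import subprocess
import sys


def collect_environment() -> dict:
    """Return Python/OS info, nvidia-smi output and TensorFlow's CUDA details."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    # GPU name and driver version as seen from inside WSL, if nvidia-smi exists.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
        info["nvidia_smi"] = result.stdout.strip()
    try:
        import tensorflow as tf
    except ImportError:
        info["tensorflow"] = None
        return info
    info["tensorflow"] = tf.__version__
    # Versions TensorFlow was *built* against, not necessarily what is installed.
    build = tf.sysconfig.get_build_info()
    info["built_cuda"] = build.get("cuda_version")
    info["built_cudnn"] = build.get("cudnn_version")
    # On GPU builds the device details include the compute capability.
    info["gpus"] = [
        tf.config.experimental.get_device_details(gpu)
        for gpu in tf.config.list_physical_devices("GPU")
    ]
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```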

Thank you!

Anton_Milev · April 18, 2023, 9:00pm

Hello, @Joshua_Mannheimer

Did you manage to install TensorFlow with GPU support on WSL2? I have similar problems, and everything I tried failed. On WSL2 I have tested PyTorch with the GPU and the CUDA Samples, so the problem is not with the drivers.

My video card is a laptop RTX 3060. I tried all of the following:

pip install tensorflow-gpu
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
pip install --ignore-installed tensorflow<2.12.0
pip install --ignore-installed "tensorflow<2.12.0"
pip install tensorflow==2.10
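After several overlapping `pip install` runs like these, it helps to confirm which packages actually ended up in the environment. Below is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). The package list is taken from the commands above.

```python
"""Report which packages from the commands above are installed, and at what version."""
from importlib import metadata

# Distribution names used in the pip commands above.
PACKAGES = ["tensorflow", "tensorflow-gpu", "nvidia-cudnn-cu11"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```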

Nothing works. Each time, I end up with TensorFlow installed on WSL but without GPU support.

This is the full output:

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2023-04-18 23:43:25.274791: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-18 23:43:25.375614: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-04-18 23:43:25.379647: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-18 23:43:25.379678: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-04-18 23:43:25.396866: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-04-18 23:43:25.830051: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-04-18 23:43:25.830118: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-04-18 23:43:25.830123: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2023-04-18 23:43:26.418880: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:966] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-18 23:43:26.418995: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-18 23:43:26.419042: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2023-04-18 23:43:26.419081: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2023-04-18 23:43:26.419120: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2023-04-18 23:43:26.447389: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2023-04-18 23:43:26.447475: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2023-04-18 23:43:26.447494: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at Install TensorFlow with pip for how to download and setup the required libraries for your platform.
Skipping registering GPU devices…
Num GPUs Available: 0
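The `dso_loader` warnings above name the exact sonames this TensorFlow build tried to load. A quick way to see which of them the dynamic loader can find is to `dlopen` each one through `ctypes`. Run the script from the same shell and environment you start Python from, since `LD_LIBRARY_PATH` matters. This is a diagnostic sketch: the list is copied from the log above, and a library that loads can still be the wrong version for a given TensorFlow release.

```python
"""Probe which of the libraries named in the log above the dynamic loader can find."""
import ctypes

# Sonames copied from the dso_loader warnings in the log above.
LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def probe(libs):
    """Return {soname: True/False} depending on whether dlopen succeeds."""
    found = {}
    for name in libs:
        try:
            ctypes.CDLL(name)  # uses the same search rules as TensorFlow's dlopen
            found[name] = True
        except OSError:
            found[name] = False
    return found


if __name__ == "__main__":
    for name, ok in probe(LIBS).items():
        print(f"{'found  ' if ok else 'MISSING'} {name}")
```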

Joshua_Mannheimer · April 18, 2023, 10:24pm

So I was able to get it to work the following way.
Install everything as laid out in "Install TensorFlow with pip" for WSL2.
Then test whether your GPU is being registered with the command they give:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

If this does not print something along the lines of

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

then the GPU is not being mapped correctly. Additionally, one thing that threw me off initially is that when you download cuDNN 8.6.0, you need to copy the bin, lib, and include folders and paste them into your CUDA 11.8 folder in Program Files.
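Listing physical devices only shows that the GPU was registered. As the first post in this thread shows, cuDNN can still fail to load once a convolution actually runs, and a later reply reports dense layers working while CNN layers fail. The sketch below is a slightly stronger smoke test, assuming TensorFlow 2.x with Keras is installed. It runs a single `Conv2D` forward pass on the GPU. If cuDNN cannot be loaded, the process may abort outright, as in the log at the top of the thread, instead of raising a Python exception.

```python
"""GPU smoke test: device listing plus one Conv2D pass (forces cuDNN to load)."""


def gpu_smoke_test():
    """Return None without TensorFlow, else a dict with GPU count and conv result."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return {"gpus": 0, "conv_ok": False}
    with tf.device("/GPU:0"):
        x = tf.random.normal([1, 28, 28, 1])  # one 28x28 single-channel image
        y = tf.keras.layers.Conv2D(filters=4, kernel_size=3)(x)
    # "valid" padding with a 3x3 kernel shrinks 28x28 to 26x26.
    return {"gpus": len(gpus), "conv_ok": tuple(y.shape) == (1, 26, 26, 4)}


if __name__ == "__main__":
    print(gpu_smoke_test())
```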

Once you have completed the installation, you have to do these extra steps while in WSL:

cd /usr/lib/wsl/lib/
Then back up the files libcuda.so.1 and libcuda.so to something like libcuda.so.backup, and remove them:
sudo rm libcuda.so.1
sudo rm libcuda.so
Finally, recreate them as symlinks and refresh the linker cache:
sudo ln -s libcuda.so.1.1 libcuda.so.1
sudo ln -s libcuda.so.1.1 libcuda.so
sudo ldconfig
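The same relinking can be written as a small Python function. It makes each step explicit: back up a real file, remove it, and point a relative symlink at `libcuda.so.1.1`. This is a sketch with a configurable directory (defaulting to `/usr/lib/wsl/lib`), and `relink_libcuda` is a hypothetical name. It must run as root for that directory, and `sudo ldconfig` is still needed afterwards. It also covers the case reported later in this thread, where `libcuda.so` does not exist at all: the link is then simply created.

```python
"""Sketch of the libcuda relinking steps above (needs root for /usr/lib/wsl/lib)."""
import os
import shutil
from pathlib import Path


def relink_libcuda(lib_dir="/usr/lib/wsl/lib", target="libcuda.so.1.1"):
    """Make libcuda.so.1 and libcuda.so relative symlinks to `target`."""
    lib_dir = Path(lib_dir)
    if not (lib_dir / target).is_file():
        raise FileNotFoundError(lib_dir / target)
    for name in ("libcuda.so.1", "libcuda.so"):
        link = lib_dir / name
        if link.is_symlink():
            link.unlink()  # already a symlink: replace it without a backup
        elif link.exists():
            # A real file: keep a copy first, as the post recommends.
            shutil.copy2(link, lib_dir / f"{name}.backup")
            link.unlink()
        # Same as `ln -s libcuda.so.1.1 <name>` run inside lib_dir.
        os.symlink(target, link)
```

Call `relink_libcuda()` from a root Python session, then run `sudo ldconfig` as in the post.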

Once you have done that, return to the install page and go to the Linux section at the bottom. It mentions a fix for Ubuntu 22.04 with the following code:

conda install -c nvidia cuda-nvcc=11.3.58
# Configure the XLA cuda directory
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
printf 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Copy libdevice file to the required path
mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/
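For reference, here is a Python sketch of the same XLA/libdevice configuration, taking the conda environment prefix as an argument (`configure_xla` is a hypothetical name). There is one deliberate difference from the shell version: `printf ... >>` appends a duplicate line every time it is re-run, so this sketch only appends the `XLA_FLAGS` line if it is not already present. `$CONDA_PREFIX` inside that line is written literally, as the single-quoted `printf` does, so it is expanded when the environment is activated. The sketch does not install `cuda-nvcc`, which still comes from the `conda install` command, and the new variable takes effect after re-activating the environment.

```python
"""Sketch of the XLA_FLAGS / libdevice steps above for a given conda prefix."""
import shutil
from pathlib import Path

# Written literally; $CONDA_PREFIX is expanded later, when the env is activated.
EXPORT_LINE = "export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n"


def configure_xla(conda_prefix):
    prefix = Path(conda_prefix)

    # mkdir -p $CONDA_PREFIX/etc/conda/activate.d
    activate_dir = prefix / "etc" / "conda" / "activate.d"
    activate_dir.mkdir(parents=True, exist_ok=True)

    # Append the export line to env_vars.sh only if it is not there yet.
    env_vars = activate_dir / "env_vars.sh"
    existing = env_vars.read_text() if env_vars.exists() else ""
    if EXPORT_LINE.rstrip("\n") not in existing.splitlines():
        with env_vars.open("a") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(EXPORT_LINE)

    # mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice, then copy libdevice.10.bc into it.
    libdevice_dir = prefix / "lib" / "nvvm" / "libdevice"
    libdevice_dir.mkdir(parents=True, exist_ok=True)
    copied = shutil.copy2(prefix / "lib" / "libdevice.10.bc", libdevice_dir)
    return env_vars, Path(copied)
```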
This is what worked for me. I used Python 3.9, TensorFlow 2.12, CUDA 11.8 and cuDNN 8.6.0.

Anton_Milev · April 19, 2023, 1:11am

I was able to fix my problem. It turns out that TensorFlow still depends on CUDA Toolkit 11.8, but by following the official link I had installed CUDA Toolkit 12.

I don't know if this is the right way, but the following tutorial worked for me:

I stopped before the TensorFlow installation step because I had already done that.

Now TensorFlow recognizes the GPU and only complains about missing TensorRT.

chunduriv · April 19, 2023, 6:39am

@Anton_Milev,

According to the tested build configurations, TensorFlow 2.12 is compatible with CUDA 11.8.
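A quick comparison against the tested build configurations would catch a mismatch like the one described above (CUDA 12 installed for TensorFlow 2.12). The sketch below hard-codes only three rows from that table: TensorFlow 2.10 and 2.11 with CUDA 11.2/cuDNN 8.1, and TensorFlow 2.12 with CUDA 11.8/cuDNN 8.6. It is a partial transcription, so check the official table for any other release. Versions are compared on major.minor only.

```python
"""Compare CUDA/cuDNN versions against a few rows of TF's tested build configs."""

# Partial transcription: TensorFlow (major, minor) -> ((CUDA), (cuDNN)) major.minor.
TESTED = {
    (2, 10): ((11, 2), (8, 1)),
    (2, 11): ((11, 2), (8, 1)),
    (2, 12): ((11, 8), (8, 6)),
}


def major_minor(version: str) -> tuple:
    """'11.8.89' -> (11, 8); '8.6.0.163' -> (8, 6). Expects at least 'X.Y'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def matches_tested_config(tf_version: str, cuda: str, cudnn: str):
    """True/False if the TF release is in TESTED, else None (unknown)."""
    expected = TESTED.get(major_minor(tf_version))
    if expected is None:
        return None
    return (major_minor(cuda), major_minor(cudnn)) == expected
```

For example, `matches_tested_config("2.12.0", "12.1", "8.6.0")` returns `False`, which is the situation described above.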

Thank you!

Boris_Zhu · May 30, 2023, 12:47pm

This solved my problems, thank you very much!

Gokulakrishnan_M · June 1, 2023, 2:23am

Thanks! It just works!

mof · June 19, 2023, 3:14am

Thanks, I had been struggling with this for hours.

Arman_Hovhannisyan · October 31, 2023, 9:12am

Hello, I have the same issue when using CNN layers, while dense layers work without problems. I tried your fix, but I do not have libcuda.so; only libcuda.so.1 and libcuda.so.1.1 exist. What would you suggest to fix the problem?

My cuDNN version is 8.6.0, CUDA is 11.8, and TensorFlow is 2.14.0.

Nikolay_Penev · March 26, 2024, 2:49am

You are a lifesaver sir! Thank you very much!