Can't see the GPU on a laptop, Ubuntu under WSL

twistfire · January 4, 2023, 1:51pm

Hi there!

I am trying to learn TensorFlow and use it for signal preprocessing and object detection, that is, to train and use neural networks in Python.

I am using Windows 11, WSL 2, Ubuntu:

uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

The first step is to install TensorFlow. I installed it with pip install and semi-succeeded, using this guide (CUDA Installation Guide for Linux).

But when I run lspci to list all devices, I can’t see any Nvidia devices…

$ lspci
1e5b:00:00.0 3D controller: Microsoft Corporation Basic Render Driver
34c3:00:00.0 System peripheral: Red Hat, Inc. Virtio file system (rev 01)
4034:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
54e8:00:00.0 3D controller: Microsoft Corporation Basic Render Driver
6691:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
8cad:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
fe48:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio console (rev 01)
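
For context: under WSL 2 the GPU is typically not passed through as a PCI device, so an lspci listing that shows only "Microsoft Corporation Basic Render Driver" entries does not by itself mean the GPU is unavailable. A minimal sketch of an alternative check, assuming the usual WSL 2 layout in which the virtual GPU appears as /dev/dxg and the Windows driver maps its user-mode files into /usr/lib/wsl/lib (these paths are assumptions about a typical install, not taken from this thread; the helper name wsl_gpu_hints is hypothetical):

```python
import os

# Paths assumed from a typical WSL 2 GPU setup; adjust if your install differs.
WSL_GPU_PATHS = (
    "dev/dxg",                       # virtual GPU device exposed by WSL 2
    "usr/lib/wsl/lib/libcuda.so.1",  # CUDA driver library mapped in from Windows
    "usr/lib/wsl/lib/nvidia-smi",    # nvidia-smi binary mapped in from Windows
)


def wsl_gpu_hints(root="/"):
    """Return {absolute_path: exists} for the assumed WSL GPU paths under root."""
    hints = {}
    for rel in WSL_GPU_PATHS:
        path = os.path.join(root, rel)
        hints[path] = os.path.exists(path)
    return hints


if __name__ == "__main__":
    for path, present in wsl_gpu_hints().items():
        print("found  " if present else "missing", path)
```

If all three paths are reported as found, the Windows driver side is probably in place and the remaining problem is more likely on the CUDA/cuDNN library side inside Ubuntu.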

When I ran a simple TensorFlow test, it reported that some libraries required by CUDA are missing…

For this simple Python script:

import tensorflow as tf
print(tf.reduce_sum(tf.random.normal([1000, 1000])))  # check that TF itself is working
print(tf.config.list_physical_devices('GPU'))  # list the GPUs that TF can see

See the output:

2022-12-25 20:28:49.080528: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-25 20:28:49.429094: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-12-25 20:28:50.386852: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-12-25 20:28:50.387255: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-12-25 20:28:50.387342: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2022-12-25 20:28:52.964517: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-12-25 20:28:53.297342: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
tf.Tensor(-43.878326, shape=(), dtype=float32)
[]

How to enable TF to work with the GPU I have under WSL?
I can provide any additional information required, but I don’t know what is needed.
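
The warnings above name shared libraries that the dynamic loader could not open. One rough way to see which libraries are loadable from the current environment is to try opening each one with ctypes. The sketch below assumes that approach; the helper name probe_libraries is hypothetical, the two libnvinfer names come from the log above, and the other two names are assumptions.

```python
import ctypes


def probe_libraries(names):
    """Try to dlopen each library name; return {name: True/False}."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot find or open the library.
            results[name] = False
    return results


if __name__ == "__main__":
    candidates = [
        "libnvinfer.so.7",         # from the log above
        "libnvinfer_plugin.so.7",  # from the log above
        "libcuda.so.1",            # assumed name of the CUDA driver library
        "libcudnn.so.8",           # assumed name of the cuDNN library
    ]
    for name, ok in probe_libraries(candidates).items():
        print(f"{name}: {'loadable' if ok else 'NOT loadable'}")
```

Running it from the same shell and virtual environment as the TensorFlow script matters, because the loader's search path depends on that environment.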

chunduriv · January 4, 2023, 3:57pm

@twistfire,

Welcome to the Tensorflow Forum!

To use the GPU, you need to install CUDA and cuDNN versions that are compatible with your TensorFlow version, as listed in the tested build configurations.

Have you configured the system paths correctly? Please refer to the GPU setup section for more details.
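
To see which CUDA and cuDNN versions an installed TensorFlow binary was built against, and so which row of the tested build configurations applies, tf.sysconfig.get_build_info() can be queried. A minimal sketch, assuming a TensorFlow 2.x install; the helper name summarize_build_info is hypothetical, and the keys are read defensively because CPU-only builds may omit them:

```python
def summarize_build_info(info):
    """Format the CUDA-related entries of a TensorFlow build-info mapping."""
    return (
        f"CUDA build: {info.get('is_cuda_build', 'unknown')}, "
        f"CUDA: {info.get('cuda_version', 'n/a')}, "
        f"cuDNN: {info.get('cudnn_version', 'n/a')}"
    )


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print("TensorFlow", tf.__version__)
        print(summarize_build_info(tf.sysconfig.get_build_info()))
```

The versions printed here are the ones the CUDA toolkit and cuDNN installed inside Ubuntu would need to match.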

Thank you!

twistfire · January 4, 2023, 7:45pm

Thanks for the fast reply.

I don’t really understand how to install the Nvidia GPU driver (under WSL2?)…

I need to install it under Windows, right? Then I already have it.

When I run nvidia-smi I get this output (sorry for the corrupted formatting, I can’t add images):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.65       Driver Version: 527.56       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8     4W /  40W |   1577MiB /  4096MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.11.0

I have also tried to install CUDA and cuDNN, but got some errors during the installation.
I didn’t use Conda for it (is it really necessary?). I tried to follow these instructions: CUDA Installation Guide for Linux

But at step 2.1, Verify You Have a CUDA-Capable GPU, I can’t see any Nvidia GPU, only:
3D controller: Microsoft Corporation Basic Render Driver


**How can I tell whether CUDA and cuDNN are installed correctly, or at least installed at all?**

And yes, I haven't done anything with the paths after the installation. How can I check and add the required path variables?
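
On checking the path variables, here is a small sketch that lists which entries of PATH and LD_LIBRARY_PATH mention CUDA and whether those directories exist. The helper name cuda_entries is hypothetical, and matching on the substring "cuda" is a simplifying assumption.

```python
import os


def cuda_entries(env, variables=("PATH", "LD_LIBRARY_PATH")):
    """Return {variable: [(directory, exists), ...]} for entries mentioning 'cuda'."""
    found = {}
    for var in variables:
        entries = [p for p in env.get(var, "").split(os.pathsep) if p]
        found[var] = [(p, os.path.isdir(p)) for p in entries if "cuda" in p.lower()]
    return found


if __name__ == "__main__":
    for var, entries in cuda_entries(os.environ).items():
        if not entries:
            print(f"{var}: no CUDA entries")
        for directory, exists in entries:
            print(f"{var}: {directory} ({'exists' if exists else 'missing'})")
```

An entry that is listed but marked missing usually points to a toolkit version directory that was never installed or was removed.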

8bitmp3 · January 4, 2023, 10:40pm

Installation is often a challenge @twistfire so no worries.

I discovered a few official guides from Microsoft (for Windows 11+WSL) and NVIDIA:

Microsoft Learn: Enable NVIDIA CUDA on WSL 2 | Microsoft Learn (you may actually need to start here: DirectML Plugin for TensorFlow 2 | Microsoft Learn)
NVIDIA Docs: CUDA on WSL User Guide
On YouTube: GPU Accelerated Machine Learning with WSL 2 - YouTube (by Microsoft Developers)

and unofficial instructions:

On YouTube: Install WSL2 on Windows 11 with NVIDIA GPU and Docker Support - YouTube

I haven’t tested any of these guides, but at least the official docs may be worth a try.

Good luck and let us know how it goes.

cc @markdaoust

twistfire · January 5, 2023, 12:14pm

Hi there and thanks for your help!

I tried this manual: TensorFlow with DirectML on WSL | Microsoft Learn
and installed with Miniconda (I had never done this before).

I created a conda virtual environment named directml, as in the manual, and created a project in VS Code with one simple file:

import tensorflow.compat.v1 as tf 
import os

os.system('clear')
tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True)) 

# version of tf
print("TensorFlow version:", tf.__version__)

# is the tf working
print('Result of tf.add')
print(tf.add([1.0, 2.0], [3.0, 4.0])) 

# GPU's
print('List of avail GPU')
print(tf.config.list_physical_devices('GPU'))

It outputs:

TensorFlow version: 1.15.8
Result of tf.add
2023-01-05 13:09:08.026307: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library libdirectml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.so
2023-01-05 13:09:08.026369: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library libdxcore.so
2023-01-05 13:09:08.037274: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library libd3d12.so
2023-01-05 13:09:09.944293: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:250] DirectML device enumeration: found 2 compatible adapters.
2023-01-05 13:09:09.944638: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2023-01-05 13:09:09.946167: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 0 (NVIDIA GeForce RTX 3050 Ti Laptop GPU)
2023-01-05 13:09:10.068148: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 1 (Intel(R) Iris(R) Xe Graphics)
2023-01-05 13:09:10.130494: I tensorflow/core/common_runtime/eager/execute.cc:571] Executing op Add in device /job:localhost/replica:0/task:0/device:DML:0
tf.Tensor([4. 6.], shape=(2,), dtype=float32)
List of avail GPU
Traceback (most recent call last):
  File "/home/fire/py_projects/directml_try/try.py", line 17, in <module>
    print(tf.config.list_physical_devices('GPU'))
  File "/home/fire/miniconda3/envs/directml/lib/python3.7/site-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow._api.v1.compat.v1.config' has no attribute 'list_physical_devices'
Segmentation fault

As I understand it, this version of TensorFlow is quite old. Is it still OK for a newbie to start with?
How can I check whether the GPU is working and which GPU is used? From the log it looks like it can use both?
When I comment out the line print(tf.config.list_physical_devices('GPU')), I still get a segmentation fault… What is a segmentation fault?

TensorFlow version: 1.15.8
Result of tf.add
2023-01-05 13:12:17.274463: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library libdirectml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.so
2023-01-05 13:12:17.274519: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library libdxcore.so
2023-01-05 13:12:17.282622: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library libd3d12.so
2023-01-05 13:12:19.856726: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:250] DirectML device enumeration: found 2 compatible adapters.
2023-01-05 13:12:19.857182: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2023-01-05 13:12:19.859154: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 0 (NVIDIA GeForce RTX 3050 Ti Laptop GPU)
2023-01-05 13:12:19.981908: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 1 (Intel(R) Iris(R) Xe Graphics)
2023-01-05 13:12:20.053131: I tensorflow/core/common_runtime/eager/execute.cc:571] Executing op Add in device /job:localhost/replica:0/task:0/device:DML:0
tf.Tensor([4. 6.], shape=(2,), dtype=float32)
Segmentation fault
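
On the AttributeError: the script imports tensorflow.compat.v1, whose config module has no list_physical_devices, as the traceback says. A hedged sketch of a fallback lookup follows. It assumes, untested here, that tensorflow-directml 1.15.8 exposes tf.config.experimental.list_physical_devices; the helper name list_devices is hypothetical. Because the log places the op on device:DML:0, the DirectML adapters may be reported under a device type other than 'GPU', so the sketch lists every device type by default.

```python
def list_devices(tf_module, device_type=None):
    """List physical devices via whichever API the given TF module exposes."""
    config = tf_module.config
    # TF 2.x exposes tf.config.list_physical_devices directly.
    lister = getattr(config, "list_physical_devices", None)
    if lister is None:
        # Older releases are assumed to keep it under tf.config.experimental.
        experimental = getattr(config, "experimental", None)
        lister = getattr(experimental, "list_physical_devices", None)
    if lister is None:
        raise RuntimeError("this TensorFlow build has no list_physical_devices API")
    return lister(device_type)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        for device in list_devices(tf):
            print(device)
```

This sketch does not address the segmentation fault, which also appears when the failing line is commented out.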

8bitmp3 · January 6, 2023, 12:55am

The API has changed. Have you tried TF 2.10 or newer with your setup @twistfire ?

We have a GPU guide: Use a GPU | TensorFlow Core that you may find helpful. For example:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

twistfire · January 6, 2023, 3:12pm

Hi there!

When I installed TensorFlow with a simple pip install tensorflow from my project’s virtual environment (Python 3.10), I got the latest stable version, with some warnings.

When I ran this there:

import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print(tf.config.list_physical_devices('GPU'))

I got this output:

TensorFlow version: 2.11.0
[]
2023-01-06 13:59:03.247842: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-01-06 13:59:03.570042: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2023-01-06 13:59:03.570081: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

So I can’t see any GPUs (and some libraries are missing).
Then I tried adding CUDA to PATH:

export PATH=/usr/local/cuda/bin:$PATH

But I still got an empty list [] as the result of:

print(tf.config.list_physical_devices('GPU'))

**Or does this empty list mean that everything works fine? (I suppose not.) I just don’t understand whether I can use the GPU for acceleration or not, and what the problem is :)**

When I run nvcc -V, I get the following output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
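
The toolkit release reported by nvcc -V can be extracted programmatically, for example to compare it against the CUDA version a given TensorFlow release was built for. A minimal sketch; the helper name parse_nvcc_release is hypothetical, and the regular expression assumes the "release X.Y" wording shown in the output above.

```python
import re


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc -V` output, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Last lines of the nvcc output quoted above.
sample = """Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0"""

print(parse_nvcc_release(sample))  # (11, 5)
```

In practice the text could come from subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout. The "CUDA Version" field in the nvidia-smi header generally reflects what the driver supports rather than the installed toolkit, which may be why it shows 12.0 while nvcc reports 11.5.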

Someone advised me to use this manual to install TensorFlow (using conda and Python 3.6…3.7):
GPU accelerated ML training in WSL | Microsoft Learn

When I used conda for it, I installed tensorflow-directml, and inside that virtual environment (directml) I have Python 3.7.15 and:
tensorflow-directml 1.15.8
tensorflow-estimator 1.15.1

And I can see that the GPUs are found when I run this code:

import tensorflow.compat.v1 as tf 
import os

os.system('clear')
tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True)) 

# version of tf
print("TensorFlow version:", tf.__version__)

# is the tf working
print('Result of tf.add')
print(tf.add([1.0, 2.0], [3.0, 4.0])) 

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

TensorFlow version: 1.15.8
Result of tf.add
2023-01-06 15:37:56.070266: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library libdirectml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.so
2023-01-06 15:37:56.070346: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library libdxcore.so
2023-01-06 15:37:56.082585: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library libd3d12.so
2023-01-06 15:37:58.140729: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:250] DirectML device enumeration: found 2 compatible adapters.
2023-01-06 15:37:58.141074: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2023-01-06 15:37:58.142531: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 0 (NVIDIA GeForce RTX 3050 Ti Laptop GPU)
2023-01-06 15:37:58.270672: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 1 (Intel(R) Iris(R) Xe Graphics)
2023-01-06 15:37:58.352630: I tensorflow/core/common_runtime/eager/execute.cc:571] Executing op Add in device /job:localhost/replica:0/task:0/device:DML:0
tf.Tensor([4. 6.], shape=(2,), dtype=float32)
Traceback (most recent call last):
  File "/home/fire/py_projects/directml_try/try.py", line 14, in <module>
    print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
  File "/home/fire/miniconda3/envs/directml/lib/python3.7/site-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow._api.v1.compat.v1.config' has no attribute 'list_physical_devices'
Segmentation fault

So in this case I can see the GPU (see the output), but this version of TF has no attribute 'list_physical_devices' to get a list of all available GPUs…

8bitmp3 · January 6, 2023, 9:14pm

@twistfire Based on the table of contents on the left-hand side and the instructions in the external documentation on Microsoft Learn (TensorFlow with DirectML on WSL | Microsoft Learn), it appears that they have TF with DirectML on WSL2 working for TF 1 only.

There is another section in the table of contents, called Enable TensorFlow with DirectML for TensorFlow 2, which goes over how to install TensorFlow 2 on Windows Native and Windows WSL.

Can you check if that works for you @twistfire?

To recap:

Currently, the official TensorFlow instructions on tensorflow.org/install state that:

For Windows Native:

which takes you to: GitHub - microsoft/tensorflow-directml-plugin: DirectML PluggableDevice plugin for TensorFlow 2

And for Windows WSL it’s:

As we have discovered, the Microsoft DirectML docs at https://learn.microsoft.com/en-us/windows/ai/directml/gpu-accelerated-training say that WSL 2 works with TensorFlow 1. However, there’s another guide, DirectML Plugin for TensorFlow 2 | Microsoft Learn, that explains how to install TensorFlow 2 on Windows Native and Windows WSL.

twistfire · January 7, 2023, 9:36pm

Thanks for such a detailed analysis. But I really don’t understand the principles yet, or how to install it properly…

Then I tried this manual: Enabling GPU acceleration on Ubuntu on WSL2 with the NVIDIA CUDA Platform | Ubuntu

I did everything as described. I built the ./deviceQuery and ./bandwidthTest apps, and they reported a CUDA-capable GPU. Here is the output of running bandwidthTest:

(base) fire@note-4:~/py_projects/cuda-samples/Samples/1_Utilities/bandwidthTest$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: NVIDIA GeForce RTX 3050 Ti Laptop GPU
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 10.1

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 9.9

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 159.7

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

But when I try to use the GPU inside the Python environment where I installed TensorFlow, I get an empty list of available GPUs… and the same errors about libraries that can’t be loaded…
Maybe those libraries are the cause?

Where should I look for libnvinfer.so.7?

2023-01-07 22:42:04.190839: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-12.0/lib64
2023-01-07 22:42:04.190912: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-12.0/lib64
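
On where to search: the log shows the loader consulting LD_LIBRARY_PATH (here /usr/local/cuda-12.0/lib64). Below is a sketch that searches a list of directories for files matching a library name pattern. The helper name find_libraries and the extra directories in the example are assumptions, not locations confirmed in this thread.

```python
import glob
import os


def find_libraries(pattern, directories):
    """Return sorted paths in the given directories that match a glob pattern."""
    matches = []
    for directory in directories:
        matches.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(matches)


if __name__ == "__main__":
    # Directories from LD_LIBRARY_PATH plus a few assumed common locations.
    search_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    search_dirs += ["/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64", "/usr/lib/wsl/lib"]
    for pattern in ("libnvinfer.so*", "libcudnn.so*"):
        hits = find_libraries(pattern, search_dirs)
        print(pattern, "->", hits if hits else "not found")
```

The libnvinfer libraries belong to TensorRT, and an earlier log in this thread labels their absence a TF-TRT warning. The missing libcudnn.so.8 in the January 6 log, by contrast, is immediately followed by "Skipping registering GPU devices", so cuDNN looks like the more likely blocker.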

Where does TensorFlow get the list of available GPUs from?

Still having the same issue…

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))


if tf.config.list_physical_devices('GPU'):
  print("TensorFlow **IS** using the GPU")
else:
  print("TensorFlow **IS NOT** using the GPU")

Outputs:

Num GPUs Available: 0
TensorFlow IS NOT using the GPU