TensorFlow Certification Version 2.13 vs 2.14

Vic_Lim · October 8, 2023, 1:49pm

gpu tensorflow-certification

Hi everyone. I am interested in taking the latest version of the TensorFlow certification.

According to the handbook, I am required to install the following:

Python v3.9
Tensorflow v2.13

I am currently using a 3070 Ti GPU, so I installed CUDA v11.8.

Issue

However, I ran into an issue:

2023-10-08 19:19:28.718121: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-10-08 19:19:29.023876: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-10-08 19:19:29.025239: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-08 19:19:29.820897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Python Version : 3.9.18 (main, Aug 25 2023, 13:20:14) 
[GCC 11.4.0]
TF Version : 2.13.1
2023-10-08 19:19:30.663261: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-08 19:19:30.663442: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
GPUs : []

But if I use TensorFlow v2.14, it is able to detect the GPU:

2023-10-08 21:37:24.423756: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-08 21:37:24.423807: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-08 21:37:24.423821: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-08 21:37:24.428031: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Python Version : 3.9.18 (main, Aug 25 2023, 13:20:14) 
[GCC 11.4.0]
TF Version : 2.14.0
2023-10-08 21:37:25.547013: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-08 21:37:25.549888: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-08 21:37:25.549960: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
GPUs : [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Python Code

Here is the Python code I used to print the output above.

import tensorflow as tf
import sys

def check_gpu_resources() -> None:
    print(f"GPUs : {tf.config.list_physical_devices('GPU')}")


def check_tensorflow_resources() -> None:
    print(f"TF Version : {tf.__version__}")


def check_python_version() -> None:
    print(f"Python Version : {sys.version}")

if __name__ == "__main__":
    check_python_version()
    check_tensorflow_resources()
    check_gpu_resources()

So I would like to ask: if v2.13 does not work, am I allowed to use TensorFlow v2.14 for the examination instead?

Renu_Patel · October 11, 2023, 3:00pm

Hi @Vic_Lim

Welcome to the TensorFlow Forum!

From the first error log, it seems that no NVIDIA CUDA driver is installed for the GPU.

Please check the hardware and software requirements for your OS and follow the step-by-step instructions on this official TF install page to install TensorFlow with GPU support.
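
As a rough way to narrow down the "Cannot dlopen some GPU libraries" warning, one could check which shared libraries the dynamic loader can actually open. The sketch below is only an illustration: the library names in `CANDIDATE_LIBS` are assumptions for a CUDA 11.8 / cuDNN 8 setup on Linux and are not taken from this thread, and `find_missing_libraries` is a hypothetical helper name.

```python
import ctypes

# Assumed shared-library names for a CUDA 11.8 / cuDNN 8 setup on Linux.
# These are illustrative guesses, not names confirmed in this thread;
# adjust them to match your own installation.
CANDIDATE_LIBS = [
    "libcuda.so.1",       # NVIDIA driver library
    "libcudart.so.11.0",  # CUDA runtime
    "libcublas.so.11",
    "libcufft.so.10",
    "libcudnn.so.8",
]


def find_missing_libraries(names, loader=ctypes.CDLL):
    """Return the entries of `names` that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:
            # ctypes raises OSError when a library cannot be found or loaded.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_libraries(CANDIDATE_LIBS)
    if missing:
        print("Could not load:", ", ".join(missing))
    else:
        print("All candidate libraries loaded.")
```

If a library is reported as missing, it may simply not be on the loader's search path (for example `LD_LIBRARY_PATH`), so this check only points to where to look next.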

NOTE: Please send an email to the TF certification exam team (tensorflow-certificate-support@google.com) for further assistance with this. Thank you.

patrick_flanigan · November 9, 2023, 4:09am

Hello,

The instructions on the TF install page install TensorFlow 2.14.0. Is 2.14.0 now OK to use for the TensorFlow Certification?

I am running into the same problem with not being able to see my GPU.

After installing by following the step-by-step guide from the TensorFlow handbook, I ran the following code:

import tensorflow as tf
from tensorflow.keras import datasets, layers
from tensorflow.python.platform import build_info as tf_build_info

# Check TensorFlow version
print("TensorFlow version:", tf.__version__)

# Check the CUDA version this TensorFlow binary was built against
build_info = tf_build_info.build_info
cuda_version = build_info.get("cuda_version")
print("CUDA version:", cuda_version)

# Check the cuDNN version this TensorFlow binary was built against
cudnn_version = build_info.get("cudnn_version")
print("cuDNN version:", cudnn_version)

# Print GPU information using TensorFlow
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Print details of the available GPUs
        for gpu in gpus:
            print("\nGPU Details:", tf.config.experimental.get_device_details(gpu))
    except RuntimeError as e:
        print(e)
else:
    print("No GPUs found.")

I get the following output:

2023-11-08 20:37:10.929679: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-08 20:37:10.948746: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-08 20:37:11.265173: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
TensorFlow version: 2.13.0
CUDA version: 11.8
cuDNN version: 8
2023-11-08 20:37:11.508566: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
No GPUs found.
2023-11-08 20:37:11.519996: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

The output of nvidia-smi is as follows:

Wed Nov  8 20:42:45 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   32C    P8    19W / 170W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Renu_Patel · November 9, 2023, 4:22am

Hi @patrick_flanigan

Welcome to the TensorFlow Forum!

Which OS are you using? If you are on Windows, you will not be able to set up GPU support for TensorFlow versions later than 2.10, as mentioned on this TF install page. TensorFlow 2.10 was the last TensorFlow release that supported GPU on native Windows.
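
The rule above can be written as a small check. This is a minimal sketch, assuming a plain `major.minor.patch` version string; the function names are hypothetical. Note that under WSL2 `platform.system()` reports `Linux`, so the native-Windows restriction does not apply there.

```python
import platform


def native_windows_gpu_supported(tf_version: str) -> bool:
    """True if this TensorFlow version is 2.10 or earlier."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) <= (2, 10)


def gpu_setup_possible(tf_version: str, system: str = "") -> bool:
    """Apply the native-Windows restriction; other systems are not ruled out.

    A True result only means the OS restriction does not apply. It does not
    prove that the driver and CUDA libraries are installed correctly.
    """
    system = system or platform.system()
    if system != "Windows":
        return True
    return native_windows_gpu_supported(tf_version)


if __name__ == "__main__":
    for version in ("2.10.0", "2.13.1", "2.14.0"):
        print(version, gpu_setup_possible(version))
```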

patrick_flanigan · November 9, 2023, 9:28am

I am using Ubuntu 22.04.3.

Can we use TensorFlow 2.14.0 for the cert exam?

Renu_Patel · November 9, 2023, 9:35am

Okay. Please refer to this specific page for TF installation. Also, note that you need cuDNN 8.7 and CUDA 11.8 for GPU support with TensorFlow 2.14, as listed in this tested build configuration.
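
To compare an installed build against those numbers, something like the sketch below could be used. It assumes the build information is a dictionary with `cuda_version` and `cudnn_version` keys, as in the script posted earlier in this thread; `check_build_against_tested` is a hypothetical helper. Because the earlier output printed only `8` for cuDNN, the sketch compares major versions only for cuDNN.

```python
def check_build_against_tested(build_info, tested_cuda="11.8", tested_cudnn="8.7"):
    """Return a list of mismatches between build_info and the tested versions."""
    problems = []

    cuda = build_info.get("cuda_version")
    if cuda != tested_cuda:
        problems.append(f"CUDA: built against {cuda}, tested with {tested_cuda}")

    cudnn = build_info.get("cudnn_version")
    # Only the major cuDNN version may be reported (e.g. "8"),
    # so compare major versions rather than the full string.
    tested_major = tested_cudnn.split(".")[0]
    if cudnn is None or str(cudnn).split(".")[0] != tested_major:
        problems.append(f"cuDNN: built against {cudnn}, tested with {tested_cudnn}")

    return problems


if __name__ == "__main__":
    # Values as printed earlier in this thread; with TensorFlow installed,
    # the dictionary could instead come from tf.sysconfig.get_build_info().
    reported = {"cuda_version": "11.8", "cudnn_version": "8"}
    print(check_build_against_tested(reported) or "No mismatches found.")
```

An empty result only says the binary was built against the tested versions. It says nothing about which libraries are actually installed on the machine.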

NOTE: Please send an email to the TF certification exam team (tensorflow-certificate-support@google.com) for further assistance with this. Thank you.

patrick_flanigan · November 9, 2023, 12:18pm

Well, I installed an NVIDIA 525 driver and CUDA 12 with cuda_12.0.0_525.60.13_linux.run.

Somehow, with a clean environment and a fresh install of TensorFlow 2.13.0, I got my hello world to work, even though it still prints that it is using CUDA 11.8. I will take that as a win, as I am now on RHEL 8.

Thu Nov  9 07:16:57 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   38C    P3    33W / 170W |    507MiB / 12288MiB |     27%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      5269      G   /usr/libexec/Xorg                 198MiB |
|    0   N/A  N/A      5720      G   /usr/bin/gnome-shell              124MiB |
|    0   N/A  N/A     10594      G   /usr/lib64/firefox/firefox        181MiB |
+-----------------------------------------------------------------------------+
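
A note on the CUDA 11.8 printout: the value printed by the script comes from TensorFlow's build information (the CUDA version the binary was built against), while the `CUDA Version` field in the nvidia-smi header reports the highest CUDA version the installed driver supports, so the two need not match. The sketch below pulls both numbers out of the nvidia-smi header for a side-by-side look; `parse_nvidia_smi_header` is a hypothetical helper, and the regular expression assumes the header layout shown in this thread.

```python
import re
import shutil
import subprocess

# Matches the header layout shown in this thread, e.g.
# "Driver Version: 525.60.13    CUDA Version: 12.0"
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_nvidia_smi_header(text):
    """Return driver and CUDA versions from nvidia-smi output, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return {"driver_version": match.group(1), "cuda_version": match.group(2)}


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH.")
    else:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=False
        )
        print(parse_nvidia_smi_header(result.stdout))
```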