Problem with compiling a model with GPU

juzun · June 15, 2021, 3:26pm

I am trying to build a simple model and then compile it. This is what I get as output:

2021-06-15 15:43:48.088419: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-06-15 15:43:53.962346: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-06-15 15:43:53.995966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1779] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-06-15 15:43:54.002691: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-06-15 15:43:54.011579: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-06-15 15:43:54.013741: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-06-15 15:43:54.018713: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-06-15 15:43:54.022424: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-06-15 15:43:54.028365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-06-15 15:43:54.033437: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-06-15 15:43:54.035969: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-06-15 15:43:54.038805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1917] Adding visible gpu devices: 0
Num GPUs Available:  1
2021-06-15 15:50:17.508570: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-15 15:50:17.515800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1779] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-06-15 15:50:17.520069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1917] Adding visible gpu devices: 0
2021-06-15 15:50:17.886477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-15 15:50:17.889673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1310]      0
2021-06-15 15:50:17.891514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1323] 0:   N
2021-06-15 15:50:17.893731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1464] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4007 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)

I understand this means that it found my GPU and that is OK, but why doesn’t the compiler work?
This is the simple code I am running:

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28,28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
    ])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print("Tested accuracy: ", test_accuracy)

Bhack · June 15, 2021, 3:44pm

What is the error and your TF version?
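For anyone answering the version question, here is a minimal sketch of one way to report it without importing TensorFlow at all, which helps when the import itself is slow or the process appears stuck. The list of distribution names is an assumption about how TensorFlow might have been installed (a version with a `-dev` date suffix usually comes from a nightly build); with a working import, `tf.__version__` gives the same information.

```python
from importlib import metadata

# Assumed candidate distribution names; adjust to match how TensorFlow was installed.
CANDIDATE_PACKAGES = ("tensorflow", "tf-nightly", "tensorflow-gpu", "tf-nightly-gpu")


def installed_versions(names=CANDIDATE_PACKAGES):
    """Return {distribution name: version} for each name that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            pass
    return found


if __name__ == "__main__":
    print(installed_versions() or "no TensorFlow distribution found")
```

More than one entry in the result can indicate that a stable and a nightly build are installed side by side in the same environment.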

Remy_Wehrung · June 16, 2021, 4:24am

Are the kernel, the dependencies, and the driver up to date?

juzun · June 17, 2021, 9:20am

There is no error. What I posted is the entire output; the program just won’t run. My TF version is 2.5.0-dev20210312.

juzun · June 17, 2021, 9:20am

Yes, I spent a few hours on that; all of this is up to date.

Bhack · June 17, 2021, 10:20am

I don’t see your imports.
Are you using keras or tf.keras?

juzun · June 17, 2021, 10:54am

Sorry, these are my imports:

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

Bhack · June 17, 2021, 11:23am

Can you try to run this in our Docker container?

juzun · June 17, 2021, 5:04pm

I installed it and imported the TensorFlow image, but I do not know how to run the program from there. I am really not good at this. I guess I will just remove CUDA and cuDNN, reinstall TensorFlow, and hope that I will at least be able to run the neural network on the CPU…

Bhack · June 17, 2021, 5:24pm

You just need -v <yourlocalpath>:<containerpath> in the Docker command to expose your local path inside the container.
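As a rough sketch of that volume mount, the helper below assembles a `docker run` command line in Python. Everything specific is an assumption for illustration: the image tag `tensorflow/tensorflow:latest-gpu`, the container path `/workspace`, the script name `train.py`, and the local directory. The `--gpus all` flag only works when the host has NVIDIA's container support set up.

```python
from pathlib import PurePosixPath


def docker_run_command(local_dir, script_name,
                       container_dir="/workspace",
                       image="tensorflow/tensorflow:latest-gpu"):
    """Build the argument list for running a script from a mounted local folder.

    local_dir is mounted at container_dir via -v <local>:<container>,
    so the script is visible inside the container under container_dir.
    """
    script_in_container = str(PurePosixPath(container_dir) / script_name)
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",                       # expose the host GPU (needs NVIDIA container support)
        "-v", f"{local_dir}:{container_dir}",  # the -v mount from the reply above
        image,
        "python", script_in_container,
    ]


if __name__ == "__main__":
    # Hypothetical local path and script name.
    cmd = docker_run_command("C:/Users/me/project", "train.py")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)`, or the printed line can be pasted into a terminal.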

Jonathan_Roy · December 20, 2021, 5:48pm

Did you find the solution? I have old code that used to run fine, but when I open it in a fresh new environment it gives me the same result: it seems to stay idle…
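The thread does not record a fix. For a script that sits idle without raising anything, one general-purpose way to see where it is stuck is Python's standard `faulthandler` module, which can dump the current traceback after a timeout. This is a minimal sketch under that assumption, not a diagnosis of the problem above; the helper names and the timeout value are hypothetical.

```python
import faulthandler
import sys
import time


def watch_for_hang(timeout_s, log_file=sys.stderr):
    """Every timeout_s seconds, dump all threads' Python tracebacks to log_file.

    The dump shows which line the script is sitting on, even if no
    exception is ever raised.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)


def stop_watching():
    """Cancel the pending dump once the slow section has finished."""
    faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    watch_for_hang(60)
    # The model.compile / model.fit / model.evaluate calls would go here.
    time.sleep(0.1)  # placeholder for the real work
    stop_watching()
```

If the traceback keeps pointing into the same call, such as the first `model.fit`, that narrows down where the stall occurs.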