Help needed: GPU version slow at (likely) device setup

I just got a new GPU and am trying out the GPU version of TensorFlow. I installed it through conda:

conda create -n tf-gpu tensorflow-gpu
conda activate tf-gpu

My little test script worked okay, except that creating the model

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28,28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

takes quite a while. It prints out things like

2021-08-22 19:28:51.617228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-22 19:28:51.617263: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-08-22 19:34:08.926351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-22 19:34:08.926382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-08-22 19:34:08.926392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 

where, as you can see, it takes more than five minutes (19:28:51 to 19:34:08) before the “Device interconnect” line is printed.
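To put a number on the stall, the gap can be computed straight from the log timestamps. This is only a small sketch that assumes every log line starts with a timestamp in the format shown above; the helper names `log_timestamp` and `log_gap_seconds` are made up for illustration and are not part of TensorFlow.

```python
from datetime import datetime

# Two lines copied from the log above; only the leading timestamp is used.
BEFORE = ("2021-08-22 19:28:51.617263: I tensorflow/stream_executor/platform/"
          "default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1")
AFTER = ("2021-08-22 19:34:08.926351: I tensorflow/core/common_runtime/gpu/"
         "gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:")


def log_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' (26 characters) of a log line."""
    return datetime.strptime(line[:26], "%Y-%m-%d %H:%M:%S.%f")


def log_gap_seconds(before, after):
    """Seconds elapsed between two log lines (hypothetical helper)."""
    return (log_timestamp(after) - log_timestamp(before)).total_seconds()


gap = log_gap_seconds(BEFORE, AFTER)
print(f"{gap:.1f} s")  # 317.3 s, i.e. about 5 min 17 s
```

So nearly all of the waiting happens between loading libcudart and the “Device interconnect” line.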

The Sequential call above finishes within a second if I run the CPU version of TensorFlow. Also, the actual training of the network with the GPU version is only twice as fast as with the CPU version.

I see that the conda tensorflow package comes with CUDA 10.1. I have CUDA 11.2 installed elsewhere. Not sure if that’s a problem or not.
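One way to confirm which CUDA version the installed package was actually built against, and whether the GPU is visible at all, is to ask TensorFlow itself. The sketch below assumes a TensorFlow release that provides `tf.sysconfig.get_build_info()` (I believe this is TF 2.3 or later) and falls back to empty values otherwise; `describe_tf_build` is just an illustrative name, and the `cuda_version` / `cudnn_version` keys may be missing on some builds.

```python
def describe_tf_build():
    """Return build/GPU details, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # get_build_info may not exist on older releases, so look it up defensively.
    get_info = getattr(tf.sysconfig, "get_build_info", None)
    info = get_info() if get_info else {}

    return {
        "tf_version": tf.__version__,
        # CUDA/cuDNN versions the binary was compiled against (None if not reported).
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
        # Names of the GPUs TensorFlow can see; an empty list means none were found.
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


print(describe_tf_build())
```

Running this inside the `tf-gpu` environment should show whether the package really uses CUDA 10.1 rather than the separately installed 11.2. It does not by itself explain the delay.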

Am I missing something in the setup of the GPU or elsewhere?

Thanks!

Can you try to run this in our official GPU image:

I tried in the official tensorflow-gpu docker image and it worked okay there. This is probably a problem in the conda package.

Thanks!
