Help needed: GPU version slow at (likely) device setup

I just got a new GPU and am trying out the GPU version of TensorFlow. I installed it through conda:

conda create -n tf-gpu tensorflow-gpu
conda activate tf-gpu

My small test script works, except that the line

model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(128, activation='relu'),

takes quite a while. It prints out things like

2021-08-22 19:28:51.617228: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2021-08-22 19:28:51.617263: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-08-22 19:34:08.926351: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-22 19:34:08.926382: I tensorflow/core/common_runtime/gpu/]      0 
2021-08-22 19:34:08.926392: I tensorflow/core/common_runtime/gpu/] 0:   N 

As you can see from the timestamps, more than five minutes pass before the “Device interconnect” line is printed.
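To put a number on the delay, here is a minimal sketch that parses the timestamps from the log lines above and reports the largest gap between consecutive lines. It assumes every line starts with a date and a time field in the format shown in the excerpt; the helper names `parse_log_time` and `largest_gap` are my own and not part of TensorFlow.

```python
from datetime import datetime

# Timestamp layout used by the log lines in the excerpt (assumed from that excerpt).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_log_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff:' timestamp of a log line."""
    date_part, time_part = line.split()[:2]
    # The time field ends with a colon that is not part of the timestamp.
    return datetime.strptime(f"{date_part} {time_part.rstrip(':')}", LOG_TIME_FORMAT)


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the largest gap between consecutive lines."""
    times = [parse_log_time(line) for line in lines]
    gaps = [
        ((later - earlier).total_seconds(), index)
        for index, (earlier, later) in enumerate(zip(times, times[1:]))
    ]
    seconds, index = max(gaps)
    return seconds, lines[index], lines[index + 1]


# Log lines copied from the excerpt above (paths are truncated there as well).
log_lines = [
    "2021-08-22 19:28:51.617228: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0",
    "2021-08-22 19:28:51.617263: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library",
    "2021-08-22 19:34:08.926351: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:",
]

seconds, before, after = largest_gap(log_lines)
print(f"Largest gap: {seconds:.1f} s")
print("  after :", before)
print("  before:", after)
```

For this excerpt the largest gap is a little over 317 seconds, and it falls between the “Successfully opened dynamic library” line and the “Device interconnect” line.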

With the CPU version of TensorFlow, the Sequential line above finishes within a second. Also, the actual training of the network with the GPU version is only about twice as fast as with the CPU version.

I see that the conda tensorflow package comes with CUDA 10.1. I have CUDA 11.2 installed elsewhere. Not sure if that’s a problem or not.
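One way to check which CUDA version the installed TensorFlow build was compiled against, and whether it sees the GPU, is sketched below. This is a rough check under some assumptions: as far as I know, `tf.sysconfig.get_build_info()` only exists in TensorFlow 2.3 and later, and `tf.config.list_physical_devices` in 2.1 and later, so the sketch falls back to empty values when the former is missing. The helper name `tf_build_summary` is my own.

```python
import importlib.util


def tf_build_summary():
    """Describe the installed TensorFlow build, or return None if TensorFlow is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None

    import tensorflow as tf

    # get_build_info() is assumed to be available only in newer releases (2.3+),
    # so look it up defensively and fall back to an empty mapping.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    build_info = dict(get_build_info()) if get_build_info is not None else {}

    return {
        "tf_version": tf.__version__,
        # These keys are only present for CUDA builds; .get() returns None otherwise.
        "cuda_version": build_info.get("cuda_version"),
        "cudnn_version": build_info.get("cudnn_version"),
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


print(tf_build_summary())
```

Running this inside the `tf-gpu` environment should show whether the reported CUDA version matches the 10.1 that the conda package ships with, independent of the CUDA 11.2 installed elsewhere on the system.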

Am I missing something in the setup of the GPU or elsewhere?


Can you try to run this in our official GPU image:

I tried it in the official tensorflow-gpu Docker image and it worked fine there, so this is probably a problem with the conda package.

