Problem running TensorFlow BERT tutorial on GPU

Hello everybody.

I was trying to run the TensorFlow BERT tutorial for text classification (Classify text with BERT  |  Text  |  TensorFlow), but it does not want to run on my GPU. Has anyone faced the same issue?

Thank you.

Hi @Stefano_Di_Pietro , I am training the model with GPU enabled in Colab (in the cloud) without issues (notebook: Google Colab → Edit > Notebook settings > GPU).

So, this may be specific to your setup. Have you had issues with training other TensorFlow 2.x models on your GPU? Can you share your OS, GPU model, cuDNN/CUDA versions?

Hardware requirements: GPU support  |  TensorFlow
Software requirements: GPU support  |  TensorFlow
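
If it helps, you can collect most of those details from Python. A minimal sketch (it assumes nvidia-smi is on your PATH; the query flags are standard nvidia-smi options):

import platform
import subprocess
import tensorflow as tf

# OS and TensorFlow version
print("OS:", platform.platform())
print("TensorFlow:", tf.version.VERSION)

# GPU model and driver version, as reported by the NVIDIA driver
print(subprocess.check_output(
    ["nvidia-smi", "--query-gpu=name,driver_version",
     "--format=csv,noheader"]).decode().strip())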


Thank you @8bitmp3 for the reply. I have already run plenty of TensorFlow models on my computer.

This model is running on Ubuntu 20.10.
The graphics card is a GeForce GTX 1060, driver version 460.80.
CUDA version: 11.2
cuDNN: 7.6.5

What kind of error do you have?

I don’t have any error, but the model is not using my GPU.

What is the value of this in your setup:

https://www.tensorflow.org/api_docs/python/tf/test/is_gpu_available
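
For example, a minimal check (note that tf.test.is_gpu_available is deprecated in TF 2.x in favor of tf.config.list_physical_devices, so it can be worth printing both):

import tensorflow as tf

# Deprecated but still functional in TF 2.x; returns True/False
print(tf.test.is_gpu_available())

# Preferred API: lists the GPUs TensorFlow can actually see
print(tf.config.list_physical_devices('GPU'))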

Oh my god … it says False.
But I see a Python process under Anaconda (I’m using Anaconda, I forgot to mention it):

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       909      G   /usr/lib/xorg/Xorg                456MiB |
|    0   N/A  N/A      1505      G   /usr/bin/kded5                     24MiB |
|    0   N/A  N/A      1510      G   /usr/bin/kwin_x11                  45MiB |
|    0   N/A  N/A      1558      G   /usr/bin/plasmashell               24MiB |
|    0   N/A  N/A      1622      G   /usr/lib/firefox/firefox           11MiB |
|    0   N/A  N/A      2066      G   …AAAAAAAAA= --shared-files        128MiB |
|    0   N/A  N/A     75968      G   …AAAAAAAAA= --shared-files         30MiB |
|    0   N/A  N/A    242507      C   …conda3/envs/tf/bin/python         61MiB |
+-----------------------------------------------------------------------------+

For TF v2.5 the minimum cuDNN version appears to be 8.1 (GPU support  |  TensorFlow). Do you think this may be causing the issue, or are you running your setup with TF v2.4 or lower?
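
One quick check (a sketch; tf.sysconfig.get_build_info() is available in recent TF 2.x releases, I believe since 2.3) is to see which CUDA/cuDNN versions your installed TensorFlow wheel was built against:

import tensorflow as tf

# These are the versions the wheel was compiled against, not what is
# installed on the system; a mismatch here often explains a False
# result from the GPU availability check above.
info = tf.sysconfig.get_build_info()
print("CUDA:", info.get("cuda_version"))
print("cuDNN:", info.get("cudnn_version"))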

Alternatively, maybe trying TF v2.5 in a Docker container (as @bhack might recommend) can help resolve this issue: Docker  |  TensorFlow.

Actually, my Anaconda environment was somehow broken. I tried with a newly created one and it works fine.
I will try to figure it out.
Thank you for the help.
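
For anyone who hits the same problem: a quick sanity check for a fresh environment is to enable device placement logging and run a small op pinned to the GPU (a minimal sketch; with TF 2.x's default soft placement it silently falls back to CPU if no GPU is visible, but the log will show where the matmul actually ran):

import tensorflow as tf

# Log where each op is placed; the output appears on stderr
tf.debugging.set_log_device_placement(True)

with tf.device('/GPU:0'):
    a = tf.random.normal([2, 2])
    b = tf.matmul(a, a)
print(b)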

Hi, try with this:

Tutorial