Installation instructions on the website are incomplete?

In my opinion, the official installation guide is wrong, or at least hides behind technical jargon without providing a solution.

Issue 1: libdevice

This problem is widespread. It happens even when you follow the guide to the letter.

The solution is to manually create a folder and copy a required file (libdevice.10.bc) into it, because XLA looks for that file at a hardcoded relative path (nvvm/libdevice).
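A minimal sketch of that fix, assuming the cudatoolkit package put libdevice.10.bc somewhere under the conda environment. The helper name fix_libdevice is hypothetical. It builds the nvvm/libdevice layout XLA searches for under lib and prints the matching XLA_FLAGS value. Using lib rather than bin is my own choice; any directory works as long as XLA_FLAGS points at the directory that contains nvvm/.

```shell
#!/usr/bin/env bash
# Sketch only: fix_libdevice is a hypothetical helper, not part of TF or conda.
# XLA looks for <cuda_data_dir>/nvvm/libdevice/libdevice.10.bc, so we build
# that layout under <prefix>/lib and print the flag that points XLA at it.
fix_libdevice() {
  local prefix="$1"  # conda env root, normally $CONDA_PREFIX
  local src
  # Find the first copy of the file that is not already in the target layout.
  src=$(find "$prefix" -name 'libdevice.10.bc' -not -path '*/nvvm/libdevice/*' -print -quit)
  if [ -z "$src" ]; then
    echo "libdevice.10.bc not found under $prefix" >&2
    return 1
  fi
  mkdir -p "$prefix/lib/nvvm/libdevice"
  cp "$src" "$prefix/lib/nvvm/libdevice/"
  # XLA appends nvvm/libdevice to this directory itself.
  echo "--xla_gpu_cuda_data_dir=$prefix/lib"
}
```

With the environment active, usage would be: export XLA_FLAGS=$(fix_libdevice "$CONDA_PREFIX"). If the file is not found, the function prints an error and XLA_FLAGS ends up empty.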

Issue 2: ptxas

After that, everything seems to work until you try to fit a model, and then I get this error:

Couldn’t invoke ptxas.exe

The solution is to install yet another library:

conda install -c nvidia cuda-nvcc

and then it works!
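To confirm that cuda-nvcc actually put a ptxas where it can be found, a quick check like the one below may help. It assumes ptxas lands in $CONDA_PREFIX/bin, which conda puts on PATH while the environment is active; check_ptxas is a hypothetical helper name. It only checks that some ptxas is reachable on PATH; whether XLA picks up that particular copy depends on XLA's own search logic, which I have not verified.

```shell
#!/usr/bin/env bash
# Sketch: check_ptxas is a hypothetical helper name.
# Reports which ptxas (if any) is reachable on PATH and its version banner.
check_ptxas() {
  local path
  if ! path=$(command -v ptxas); then
    echo "ptxas not found on PATH; try: conda install -c nvidia cuda-nvcc" >&2
    return 1
  fi
  echo "ptxas: $path"
  ptxas --version
}
```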

I suppose this is related to this quote from the guide:

  • Packages do not contain PTX code except for the latest supported CUDA® architecture; therefore, TensorFlow fails to load on older GPUs when CUDA_FORCE_PTX_JIT=1 is set. (See Application Compatibility for details.)

I mean, I’m a user, not a low-level hardware expert, so this doesn’t tell me much. The guide could give much more high-level information here and explain how to make it work. I say this because the fix will often be needed: my card and driver are on CUDA 11.4, and I need this step with both TF 2.11 (CUDA 11.2) and TF 2.12 (CUDA 11.8), or else the error occurs. I doubt many people have the exact matching combination of card, driver, and TF, and I doubt many want to build from source.
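To see which combination you actually have, you can compare the CUDA version the driver reports (the "CUDA Version" field in the nvidia-smi header) with the one TensorFlow was built against (tf.sysconfig.get_build_info()). The helper version_ge below is a hypothetical sketch that only compares dotted version strings using GNU sort -V. It doesn't diagnose the ptxas error by itself, but it makes it easy to report the exact combination when asking for help.

```shell
#!/usr/bin/env bash
# Sketch: version_ge is a hypothetical helper; needs GNU sort (-V).
# Succeeds when dotted version $1 >= $2, e.g. version_ge 11.8 11.4.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (needs a GPU driver and TensorFlow, so left commented out):
# driver_cuda=$(nvidia-smi | grep -o 'CUDA Version: [0-9.]*' | grep -o '[0-9.]*$')
# tf_cuda=$(python3 -c "import tensorflow as tf; print(tf.sysconfig.get_build_info()['cuda_version'])")
# version_ge "$driver_cuda" "$tf_cuda" || echo "driver reports CUDA $driver_cuda, TF built for $tf_cuda"
```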

Any thoughts on this? What am I misunderstanding?


Hi @beginner1991, if you are facing these errors on Linux, could you please confirm whether the solution provided in step 6 of the document resolves them? Thank you.

Hi Kiran,

Yes, the commands for Ubuntu 22.04 are more or less what I did to fix the problems. In my case I’m on Ubuntu 20.04, but the same issue happens. I also just used:

conda install -c "nvidia/label/cuda-11.8.0" cuda-nvcc

That was for TF 2.12, but it also works for TF 2.11, so I think the nvcc version doesn’t really need to match the cudatoolkit version here.

Note, however, that

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

works fine, at least after fixing libdevice. The ptxas issue only appears when you try to fit a model; before that, everything seems to run fine.
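Since list_physical_devices only shows that the GPU is visible, a tiny training run is a quicker way to reach the libdevice/ptxas code path than a full tutorial. The function below is a hypothetical sketch assuming TF 2.11 or 2.12 with NumPy installed. My understanding (not verified for every version) is that the default Keras optimizers in these releases XLA-compile their update step on the GPU, which is what needs libdevice and ptxas.

```shell
#!/usr/bin/env bash
# Sketch: gpu_fit_smoke_test is a hypothetical helper name.
# Fits a tiny model on random data for one epoch; if libdevice or ptxas is
# missing, the error should surface here rather than in list_physical_devices.
gpu_fit_smoke_test() {
  python3 - <<'EOF'
import numpy as np
import tensorflow as tf

print("GPUs:", tf.config.list_physical_devices("GPU"))
x = np.random.rand(256, 32).astype("float32")
y = np.random.randint(0, 10, size=(256,))
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(x, y, epochs=1, batch_size=32, verbose=2)
print("fit OK")
EOF
}
```

Run gpu_fit_smoke_test with the environment active; if it prints "fit OK", both fixes are presumably in place.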

Hi @beginner1991

I agree the official installation guide is terrible.

After trying a thousand different things, your solution worked for me. Thanks! I also had to find the libdevice file and copy it to a new location. I have no idea why conda doesn’t install it in the correct place to begin with.

Hi @cbrady154, @beginner1991, could you please provide more details on where you ran into difficulties while installing TensorFlow by following the official document? Thank you.

Sure,

I followed the official installation instructions exactly. Then I was able to confirm that the GPU was detected by running this command:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

After this, when I tried to fit a model, I got this error: "Can’t find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:"

To fix this, I had to find the file libdevice.10.bc. I created a new folder, $CONDA_PREFIX/bin/nvvm/libdevice, inside the bin folder of my TF environment and copied libdevice.10.bc there:

mkdir -p $CONDA_PREFIX/bin/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/bin/nvvm/libdevice/

After this, to get it to work, I also had to set the XLA_FLAGS variable like so:

export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/bin
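The export above only lasts for the current shell. One way to make it stick, assuming conda's standard activate.d hooks (conda sources every *.sh file in $CONDA_PREFIX/etc/conda/activate.d when the environment is activated), is a small hook script. The helper persist_xla_flags and the file name xla_flags.sh are hypothetical choices of mine; a separate file avoids overwriting an existing env_vars.sh.

```shell
#!/usr/bin/env bash
# Sketch: persist_xla_flags is a hypothetical helper; xla_flags.sh is an
# arbitrary file name. Conda sources *.sh files in activate.d on activation.
persist_xla_flags() {
  local prefix="$1"   # env root, normally $CONDA_PREFIX
  local subdir="$2"   # directory under the env that contains nvvm/, e.g. bin
  local hook_dir="$prefix/etc/conda/activate.d"
  mkdir -p "$hook_dir"
  # Single quotes keep $CONDA_PREFIX literal so it is expanded at activation time.
  printf 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/%s\n' "$subdir" \
    > "$hook_dir/xla_flags.sh"
}
```

For the layout used here, that would be persist_xla_flags "$CONDA_PREFIX" bin, followed by deactivating and re-activating the environment.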

This took care of the first error. Trying to fit a model again, I got another error about ptxas (Issue 2 mentioned by @beginner1991). I followed his suggestion and installed the additional library:

conda install -c nvidia cuda-nvcc

After this it seemed happy, and I was able to train a model using the MNIST dataset and the training tutorial.

I have no idea why any of these steps are needed; I am just blindly copying what others have suggested.

Good luck!