Libdevice not found. Why is it not found in the searched path?

(Reposted from Stack Overflow due to no response there; similar unresolved issues from other users are also referenced)

Win 10 64-bit 21H1; TF2.5, CUDA 11 installed in environment (Python 3.9.5 Xeus)

I am not the only one seeing this error; see also (unanswered) here and here.
The issue is obscure and the proposed resolutions are unclear/don’t seem to work (see e.g. here)

Issue Using the TF Linear_Mixed_Effects_Models.ipynb example (downloaded from the TensorFlow GitHub here), execution reaches the point of performing the “warm up stage”, then throws the error:

InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_one_e_step_2806]

The console contains this output, showing that TF finds the GPU but XLA initialisation fails to find libdevice in the searched paths, even though it exists there:

2021-08-01 22:04:36.691300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9623 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-08-01 22:04:37.080007: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2021-08-01 22:04:54.122528: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1d724940130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-01 22:04:54.127766: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-01 22:04:54.215072: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:241] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
2021-08-01 22:04:55.506464: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-01 22:04:55.512876: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-01 22:04:55.517387: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-01 22:04:55.520773: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-01 22:04:55.524125: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2021-08-01 22:04:55.526349: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.

Now the interesting thing is that the paths searched include “C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin”

The content of that folder includes all the DLLs (successfully loaded at TF startup), including cudart64_110.dll, cudnn64_8.dll… and of course libdevice.10.bc

Question Since TF says it is searching this location for this file and the file exists there, what is wrong and how do I fix it?

(NB C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 does not exist… CUDA is installed in the environment; this path must be a best guess for an OS installation)

Info: I am setting the path with

import os
aPath = '--xla_gpu_cuda_data_dir=C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin'
print(aPath)
os.environ['XLA_FLAGS'] = aPath

but I have also set an OS environment variable XLA_FLAGS to the same string value… I don’t know which one is actually working yet, but the fact that the console output says it searched the intended path is good enough for now

Hi @Julian_Moore ,
Since Anaconda is not officially supported by the TensorFlow team, I would suggest looking through the GPU doc and possibly the Docker GPU doc and verifying that NVIDIA’s software is installed correctly (can be tricky, I know). You might also check the Windows pip install doc and make sure you have the Microsoft Visual C++ Redistributable installed and all that. Then you can start to narrow down whether it’s a GPU driver issue, an Anaconda installation issue, or a TensorFlow issue.

Hi @billy

Many thx for the input!

The GPU software is otherwise just fine: many models have been run, and all CUDA DLLs load. Anaconda is used only to build the base environment and install Python & CUDA. Several working TF2 GPU environments have been built this way without issue (though obviously this case has never been exercised before).

Whilst I will take a look at the C++ aspect, the key point for me is that AFAICT a correct location is being searched for a file that exists there, yet is not “found”.

If you have an explanation for that it might suggest specific mitigations (whatever the C++ redistributable situation is, I don’t see how it would account for the observed behaviour)

Just confirming @billy that I downloaded and installed the latest VC_redist v14.29.30040.0 (I have VS 2015 and 2017), installation went fine, rebooted, reran the notebook, and the “libdevice not found” error persists.

FYI this previously noted error also still occurs (relevance?):

2021-08-04 18:48:06.587240: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.

Ah, sorry. I found a similar issue filed with JAX (which also depends on XLA). Try taking a look through some of the comments and see if those symlinks help.


@Billy I’ve fixed it, but the debug info is deeply unhelpful and something is way off generally.

I had seen similar threads to the one you linked; they were where I got XLA_FLAGS suggestions & info.

Of course it’s all complicated by my OS being windows and the other discussions being linux oriented and the whole thing being at a technical level generally way over my head - I just want to code in Python.

However, the issue was resolved by providing the file (as a copy) at this path

C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin\nvvm\libdevice\

Note that C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin was the path given to XLA_FLAGS, but it seems XLA is not looking for the libdevice file there directly: it appends \nvvm\libdevice\ to that path and looks there. This means I can’t just set a different value in XLA_FLAGS to point at the actual location of the libdevice file because, to coin a phrase, it’s not (just) the file it’s looking for.

This is really annoying, as I will have to hand-patch every environment I create… and I have yet to see whether this would also avoid libdevice issues with JAX.
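For what it’s worth, the hand-patch can be scripted per environment. This is a minimal sketch, assuming the layout described above (libdevice.10.bc sitting in the env’s Library\bin, with XLA expecting it under nvvm\libdevice below the directory given to --xla_gpu_cuda_data_dir); the function name and paths are my own for illustration, not anything from TF:

```python
import os
import shutil

def patch_libdevice(env_bin_dir):
    """Copy libdevice.10.bc into the nvvm/libdevice subfolder where XLA looks.

    env_bin_dir: the directory passed via --xla_gpu_cuda_data_dir that
    already contains libdevice.10.bc (e.g. <env>\\Library\\bin on Windows).
    Returns the path of the copied file.
    """
    src = os.path.join(env_bin_dir, 'libdevice.10.bc')
    dst_dir = os.path.join(env_bin_dir, 'nvvm', 'libdevice')
    os.makedirs(dst_dir, exist_ok=True)   # create nvvm/libdevice if missing
    dst = os.path.join(dst_dir, 'libdevice.10.bc')
    if not os.path.exists(dst):
        shutil.copy2(src, dst)            # keep the original in place
    return dst

# e.g. patch_libdevice(r'C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin')
```

Running it a second time is a no-op, so it is safe to call as part of environment setup.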

The debug info earlier:

2021-08-05 08:38:52.889213: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-05 08:38:52.896033: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-05 08:38:52.899128: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-05 08:38:52.902510: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-05 08:38:52.905815: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .

is incorrect insofar as there is no “CUDA” in the search path; and FWIW I think a different error should have been given for searching C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2, since there is no such folder (there’s an old v10.0 folder there, but no OS install of CUDA 11)

There is a Windows environment variable CUDA_PATH that points to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0, but that is not referenced by any error messages either. The searched path C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 seems to be a guess, and as such is a) unsuccessful, b) unhelpful, and c) misleading

I note that the info also included the message “You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule’s DebugOptions”, but I could not work out how to do that; again, a level of technical complexity that seems inappropriate.

Rhetorical question: since TensorFlow generally is quite happy with the CUDA installation in the (Anaconda) environment, why is other stuff looking elsewhere in this idiosyncratic way?

So… you have some new input and one new question for some time: are such libdevice issues something that will get sorted out (for the benefit of JAX too)?

Hope this has been helpful.

Cheers, Julian