How to list missing GPU libraries

Hello all, I’m trying to get TF to run on my 6800 XT with ROCm. When I run tf.config.list_physical_devices(), it prints “Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.” and then skips registering GPU devices. However, no libraries are “mentioned above”, and there is no other output. How can I get TF to list the missing libraries? I set the logging level to the lowest, DEBUG, to make sure I would see all output, but nothing more appears. Please advise. Thank you all.

When TensorFlow (TF) fails to recognize a GPU, especially with AMD GPUs like the 6800 XT using ROCm, it can be due to various reasons including missing dependencies, incorrect ROCm installation, or TensorFlow not being properly configured to recognize ROCm. Here’s a structured approach to diagnose and potentially solve the issue:
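A likely reason no libraries are “mentioned above”: setting Python's logging level to DEBUG only affects TensorFlow's Python logger. The “Cannot dlopen” warning comes from TensorFlow's C++ runtime, whose output is controlled by the environment variables TF_CPP_MIN_LOG_LEVEL (0 shows all INFO, WARNING and ERROR messages) and TF_CPP_MAX_VLOG_LEVEL (enables verbose messages), and these must be set before TensorFlow is imported. Since you already see that warning, any per-library messages are probably logged at a verbose level that is off by default. The sketch below runs a fresh Python process with those variables set and keeps only lines that look library-related. The keyword filter is a heuristic assumption, not a documented log format, and whether your TensorFlow version logs each failed library at these levels is also an assumption.

```python
import os
import subprocess
import sys

# Child script: import TensorFlow and list devices; C++ log lines go to stderr.
PROBE = "import tensorflow as tf; print(tf.config.list_physical_devices())"


def verbose_tf_env(vlog_level: int = 1) -> dict:
    """Return a copy of the current environment with TF's C++ logging turned up."""
    env = dict(os.environ)
    env["TF_CPP_MIN_LOG_LEVEL"] = "0"               # 0 = do not filter INFO/WARNING/ERROR
    env["TF_CPP_MAX_VLOG_LEVEL"] = str(vlog_level)  # allow verbose logs up to this level
    return env


def library_lines(output: str) -> list:
    """Keep lines that look like they mention a shared library or a dlopen failure.

    The keywords are a heuristic assumption, not a documented TensorFlow log format.
    """
    keywords = ("dlopen", "Could not load", ".so")
    return [line for line in output.splitlines() if any(k in line for k in keywords)]


def run_probe(vlog_level: int = 1) -> list:
    """Run the probe in a new interpreter so the variables apply before TF is imported."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=verbose_tf_env(vlog_level),
        capture_output=True,
        text=True,
    )
    return library_lines(result.stdout + result.stderr)


if __name__ == "__main__":
    for line in run_probe():
        print(line)
```

If nothing library-related appears, try a higher vlog_level; the output grows quickly. As a lower-level alternative, glibc's LD_DEBUG=libs environment variable makes the dynamic linker print every library search, at the cost of very verbose output.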

Ensure ROCm Installation: Verify that ROCm is correctly installed by running the rocminfo and clinfo commands in a terminal. Both should list your GPU; rocminfo reports the 6800 XT as an agent named gfx1030. If your GPU does not appear, revisit the ROCm installation before troubleshooting TensorFlow.
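To script this check, the following sketch runs rocminfo (assumed to be on PATH; it is usually installed under /opt/rocm/bin) and collects the gfx target names it reports for GPU agents. The parsing assumes each agent has a “Name:” line such as gfx1030; that is not a stable interface, so it may need adjusting for other ROCm versions.

```python
import re
import shutil
import subprocess


def gfx_targets(rocminfo_output: str) -> list:
    """Return the gfx names (e.g. 'gfx1030') that appear as agent names in rocminfo output."""
    # GPU agents are assumed to be listed as "Name:   gfxNNNN"; CPU agents have other names.
    names = re.findall(r"^\s*Name:\s+(gfx[0-9a-f]+)\s*$", rocminfo_output, flags=re.MULTILINE)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(names))


def detected_gpus() -> list:
    """Run rocminfo if it is on PATH and return the GPU targets it reports."""
    exe = shutil.which("rocminfo")
    if exe is None:
        return []  # rocminfo not installed or not on PATH
    out = subprocess.run([exe], capture_output=True, text=True).stdout
    return gfx_targets(out)


if __name__ == "__main__":
    print(detected_gpus() or "rocminfo reported no GPU agents")
```

An empty list means rocminfo is missing or reported no GPU agents, which points back to the ROCm installation rather than to TensorFlow.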

Install ROCm-supported TensorFlow: The standard tensorflow package on PyPI is built for CUDA, not ROCm, so it cannot use an AMD GPU. You need the ROCm-enabled build instead, which can be installed via pip:

pip install tensorflow-rocm

Make sure the tensorflow-rocm release you install matches the ROCm version on your system; a mismatch can prevent TensorFlow from loading the ROCm libraries.
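Because tensorflow, tensorflow-cpu and tensorflow-rocm all install the same importable tensorflow package, having more than one of them in one environment can leave you running a non-ROCm build without noticing. This sketch uses the standard library's importlib.metadata to report which of these distributions are installed; the list of names is an assumption and may not cover every variant.

```python
from importlib import metadata

# Distribution names assumed to provide the same top-level `tensorflow` package.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-rocm", "tensorflow-cpu")


def installed_tf_builds() -> dict:
    """Map each TensorFlow distribution name to its installed version, or None if absent."""
    found = {}
    for name in TF_DISTRIBUTIONS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    builds = {k: v for k, v in installed_tf_builds().items() if v is not None}
    print(builds or "no TensorFlow distribution found")
    if len(builds) > 1:
        print("warning: more than one TensorFlow build is installed; they can overwrite each other")
```

Compare the tensorflow-rocm version reported here with your ROCm version. If tensorflow also shows up, uninstalling both and reinstalling only tensorflow-rocm is a reasonable way to get a clean state.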

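Dependencies: Ensure all the ROCm libraries that TensorFlow loads at runtime are installed, such as the HIP runtime, rocBLAS and MIOpen. If any of them is missing or cannot be found by the dynamic linker, TensorFlow reports “Cannot dlopen some GPU libraries” and skips the GPU. The ROCm GitHub page lists the required dependencies.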
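A direct way to get the list the question asks for is to try loading each library yourself and print the linker error. The sketch uses ctypes.CDLL, which on Linux calls dlopen with the normal search rules (LD_LIBRARY_PATH, the ldconfig cache, default directories). The library names are assumptions based on common ROCm sonames; the exact set your TensorFlow build loads depends on its version, so adjust the list.

```python
import ctypes

# Assumed list of ROCm libraries a TensorFlow ROCm build may dlopen.
# The exact set and sonames depend on your TensorFlow and ROCm versions.
ROCM_LIBS = (
    "libamdhip64.so",
    "librocblas.so",
    "libMIOpen.so",
    "librocfft.so",
    "librocrand.so",
    "libhipsparse.so",
)


def missing_libraries(names=ROCM_LIBS) -> dict:
    """Try to load each library; return {name: error message} for those that fail."""
    missing = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # same dynamic-linker search rules as dlopen(3)
        except OSError as exc:
            missing[name] = str(exc)
    return missing


if __name__ == "__main__":
    problems = missing_libraries()
    for name, err in problems.items():
        print(f"{name}: {err}")
    if not problems:
        print("all listed libraries loaded")
```

If a library exists under /opt/rocm/lib but still fails here, that directory is probably missing from the linker search path; adding it to LD_LIBRARY_PATH, or to an ldconfig configuration file followed by sudo ldconfig, usually fixes that.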

Environment Variables: Some ROCm setups also rely on the following environment variables. They are not required on every system, but an incorrect value (for example, a HIP_VISIBLE_DEVICES index that does not exist) can hide the GPU from TensorFlow:

HSA_FORCE_FINE_GRAIN_PCIE=1
HIP_VISIBLE_DEVICES=0 (0 is typically the device ID for the first GPU; adjust if necessary)

You can export these variables in your shell configuration file (e.g., .bashrc or .zshrc), or export them in the terminal before running your Python script.
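If you prefer to set these from Python, they need to be in the environment before TensorFlow initializes the GPU, so the safest place is before import tensorflow. This hypothetical helper applies the two values listed above without overwriting anything already exported in your shell; HIP_VISIBLE_DEVICES=0 assumes the 6800 XT is the first device.

```python
import os

# Values from the list above; "0" assumes the 6800 XT is the first HIP device.
ROCM_ENV = {
    "HSA_FORCE_FINE_GRAIN_PCIE": "1",
    "HIP_VISIBLE_DEVICES": "0",
}


def apply_rocm_env(env=None, overrides=ROCM_ENV):
    """Set ROCm variables in `env` (default: os.environ) unless they are already set."""
    target = os.environ if env is None else env
    for key, value in overrides.items():
        target.setdefault(key, value)  # keep any value already exported in the shell
    return target


# Call this before `import tensorflow` so the ROCm runtime sees the values.
apply_rocm_env()
```

After calling it, import tensorflow and check tf.config.list_physical_devices('GPU').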

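Permissions: Ensure your user has permission to access the GPU device nodes. ROCm needs read/write access to /dev/kfd and to the render nodes under /dev/dri, which on most distributions belong to the video and render groups. Add your user to both groups, then log out and back in so the change takes effect: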

sudo usermod -a -G video $LOGNAME
sudo usermod -a -G render $LOGNAME
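To confirm the change took effect, this sketch checks read/write access to /dev/kfd and the /dev/dri/renderD* nodes, and whether the current process is in the video and render groups. It is Linux-only (it uses the grp module), and it assumes these two groups are the relevant ones, as in the commands above; some distributions may differ.

```python
import glob
import grp
import os


def gpu_access_report() -> dict:
    """Report whether the current process can open the ROCm device nodes."""
    nodes = ["/dev/kfd"] + sorted(glob.glob("/dev/dri/renderD*"))
    # os.access returns False for nodes that do not exist.
    access = {node: os.access(node, os.R_OK | os.W_OK) for node in nodes}
    # Group names of this process; new memberships only appear after a fresh login.
    groups = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            groups.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # gid with no entry in the group database
    return {
        "node_access": access,
        "in_video": "video" in groups,
        "in_render": "render" in groups,
    }


if __name__ == "__main__":
    print(gpu_access_report())
```

If the usermod commands succeeded but in_render is still False, the current session predates the change; log out and back in (or reboot).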