TensorFlow C API + cppflow with GPU on Ubuntu 22.04



I trained DL models on the Windows side of my dual-boot laptop (because at first TensorFlow couldn't detect the GPU on Ubuntu), saved the models, and then imported them into a C++ program using the cppflow wrapper (the program runs on Ubuntu 22.04). It took a bit of time but works like a charm. However, the TensorFlow C API doesn't seem to be able to connect to the GPU, even though I have the proper drivers. So I went through the trouble of getting my Ubuntu TensorFlow installation to see the GPU, which does work now:

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Num GPUs Available: 1

(I also have a problem where models don't actually learn when trained on Ubuntu, even with the same code I use on Windows, but that is not the main topic for today.)

I have NVIDIA driver 520.61.05 (as reported by nvidia-smi), CUDA 11.8, TensorFlow 2.12.0, Ubuntu 22.04, and Python 3.10.6. As stated earlier, I can run TensorFlow on the GPU in Python notebooks. nvidia-smi detects my GPU and shows its memory filling up when "training" (I use quotes because my models don't actually train, but that's a problem for later; the win for today is that I can see the GPU from TensorFlow).

This is the output I get when running my C++ program, which loads a pre-trained model (trained on Windows, where training does work):

2023-04-12 14:28:25.053124: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-12 14:28:25.067584: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2023-04-12 14:28:25.067622: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: Little-Box-Reborn
2023-04-12 14:28:25.067630: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: Little-Box-Reborn
2023-04-12 14:28:25.067669: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 520.61.5
2023-04-12 14:28:25.067702: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 520.61.5
2023-04-12 14:28:25.067711: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 520.61.5

The model then runs on the CPU (correctly, of course) but quite slowly, and I need it for real-time computations!

If any of you have an idea of how to fix this, I'd be so happy. I can provide any further information needed about the model, the way it was trained, my installation versions, etc.




Make sure that the TensorFlow API is seeing the GPU:

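    // Needs <tensorflow/c/c_api.h>, <cstdio>, and <iostream>.
    // List every device the session can see; a GPU entry should appear if CUDA initialized.
    auto devices = TF_SessionListDevices(this->session.get(), this->status.get());
    if(TF_GetCode(this->status.get()) != TF_OK){
        fprintf(stderr, "ERROR: Unable to list devices %s\n", TF_Message(this->status.get()));
    } else {
        for(int i = 0; i < TF_DeviceListCount(devices); i++){
            auto device = TF_DeviceListName(devices, i, this->status.get());
            if(TF_GetCode(this->status.get()) != TF_OK){
                fprintf(stderr, "ERROR: Unable to get device name %s\n", TF_Message(this->status.get()));
                continue; // skip this entry: the name is not valid
            }
            std::cout << device << std::endl;
        }
        TF_DeleteDeviceList(devices); // release the list once done
    }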
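
If you would rather check the result in code than read the printed names, something like the sketch below might help. It assumes the names returned by TF_DeviceListName follow the usual TensorFlow form, e.g. "/job:localhost/replica:0/task:0/device:GPU:0". The helper `has_gpu_device` is a hypothetical name for illustration and is not part of TensorFlow or cppflow; you would collect the names from the loop above into a vector first.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a TensorFlow or cppflow function).
// Returns true if any device name contains a GPU device component.
// Assumes names look like "/job:localhost/replica:0/task:0/device:GPU:0".
bool has_gpu_device(const std::vector<std::string>& device_names) {
    for (const auto& name : device_names) {
        if (name.find("device:GPU:") != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

If this returns false for the names collected from your session, the C API build is only seeing the CPU, which would match the failed cuInit call in your log.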