I know there are a thousand posts about this sort of thing, and I've spent days trying to get this to work.
Somehow I managed to run the benchmark on the CIFAR-10 model about a week ago, but I have had no luck since: the GPU was detected, but the model could not be loaded into memory.
I originally had CUDA 12.0 installed, and that's what worked. I have since tried to downgrade to 11.2, unsuccessfully; now the GPU is not detected at all in TensorFlow 2.11.
I have tried everything I can think of, and everything I could find online.
I am at my wits' end after the hours poured into this.
Any help would be greatly appreciated.
Welcome to the TensorFlow Forum!
Could you please share details of your operating system and the steps you took to install TensorFlow?
The system is an i7-7700K with 24 GB RAM and an RTX 3090, running tensorflow-gpu 2.10 in Anaconda.
I have reverted to CUDA 12.0, and my GPU is now detected.
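(For anyone hitting the same detection problem: a quick, standard way to confirm whether TensorFlow sees the GPU is to list the physical devices. This is a minimal check, assuming TensorFlow is installed.)

```python
import tensorflow as tf

# An empty list here means TensorFlow did not detect any GPU.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
```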
However, when attempting to run a training example on the CIFAR-10 model, I run out of memory (hard drive, not RAM; my RAM appears untouched).
The GPU is loaded with approximately 18 GB of the 24 GB available.
I have approximately 7 GB free on my disk; it gets fully consumed, and training cannot proceed.
Is this simply a case of not having enough free disk space, or is there something else I should check?
You can try limiting GPU memory usage. Currently it can be handled in two ways:

1. Turn on memory growth by calling [tf.config.experimental.set_memory_growth](https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth). Instead of reserving nearly all GPU memory up front, this allocates more memory only as the process demands it.
2. Set a hard limit on the total memory TensorFlow is allowed to allocate on the GPU.
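A minimal sketch of both options (assumes TensorFlow 2.x; the 8 GB cap is just an illustrative value, and configuration must happen before the GPU is first used):

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Option 1: allocate GPU memory on demand rather than all at once.
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option 2 (alternative to option 1): cap TensorFlow at a fixed amount
    # of GPU memory, e.g. 8192 MB on the first GPU.
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=8192)])
```

Note that these calls raise a RuntimeError if the GPU has already been initialized, so place them at the very top of your script.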