CNN causes the kernel to die

I have Windows 11 installed and am using TensorFlow 2.12 on WSL.
CUDA version 11.8 is installed.
I can train deep neural networks successfully on my GPU. However, when I try to train a CNN, the system crashes. For example, I can train on the MNIST dataset with the code below:

import tensorflow as tf
from tensorflow import keras

# Load MNIST and scale pixel values to the range [0, 1].
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0

# Fully connected (dense) classifier.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

----------------------------CNN IMPLEMENTATION------------------
However, the kernel dies when I run the code below:

import tensorflow as tf
from tensorflow import keras

# Load MNIST, add a channel dimension, and scale pixel values to [0, 1].
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape(-1, 28, 28, 1)
test_images = test_images.reshape(-1, 28, 28, 1)
train_images = train_images / 255.0
test_images = test_images / 255.0

# Small CNN: one convolution and pooling stage, then a dense classifier.
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

@Umut_Koksoy,

Welcome to the Tensorflow Forum!

Thank you for taking the time to report the issue. We will check and update you.

I am now getting a new warning before the kernel dies. It may help to find the problem.

---------------------------------------.----------------------------------------------------------------

2023-06-21 20:31:38.325419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3893 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
2023-06-21 20:31:38.529059: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 188160000 exceeds 10% of free system memory.

Epoch 1/5

2023-06-21 20:31:38.672844: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 188160000 exceeds 10% of free system memory.

---------------------------------------.----------------------------------------------------------------
Training on the MNIST dataset should not consume that much memory.
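
For what it is worth, the figure in the warning can be reproduced by hand. This is a rough check, assuming the 60,000 reshaped training images are converted to float32 (4 bytes per value) in one piece when `model.fit` receives the whole NumPy array. That conversion step is an assumption, not something the log states:

```python
# Shape of the MNIST training images after reshape(-1, 28, 28, 1).
num_images, height, width, channels = 60_000, 28, 28, 1

# Assumption: the data is held as float32, i.e. 4 bytes per pixel value.
bytes_per_float32 = 4

total_bytes = num_images * height * width * channels * bytes_per_float32
print(total_bytes)                       # 188160000, the number in the warning
print(round(total_bytes / 1024**2, 1))   # size in MiB
```

If that reading is right, the warning concerns roughly 179 MiB of host (CPU) memory for the full training set rather than GPU memory, and on its own it may be unrelated to the kernel dying.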
I am using TF 2.12. I ran the code below before training:

devices = tf.config.list_physical_devices('GPU')
device = devices[0]
tf.config.experimental.set_memory_growth(device, True)
is_memory_growth_enabled = tf.config.experimental.get_memory_growth(device)
print(f'Memory growth for {device} is: {is_memory_growth_enabled}')

Output -> Memory growth for PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU') is: True

@Umut_Koksoy,

Could you please add the following line at the beginning of the program and let us know?

import os
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

Thank you!
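
A minimal sketch of how that suggestion could be applied, assuming the variable has to be in place before TensorFlow initialises the GPU (so it is set before `import tensorflow`). Environment variable names are case-sensitive on Linux/WSL, and TensorFlow refers to this one in upper case as `TF_GPU_ALLOCATOR`. The helper name `request_async_gpu_allocator` is made up for illustration:

```python
import os
import sys


def request_async_gpu_allocator() -> bool:
    """Set TF_GPU_ALLOCATOR and report whether it was set early enough.

    Returns True if TensorFlow had not been imported yet, False otherwise
    (in which case the setting may be too late to take effect).
    """
    tensorflow_already_imported = "tensorflow" in sys.modules
    os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"
    return not tensorflow_already_imported


if request_async_gpu_allocator():
    print("TF_GPU_ALLOCATOR set; import tensorflow after this point.")
else:
    print("TensorFlow is already imported; restart the kernel and set the variable first.")
```

In a notebook this would go in the very first cell, before any cell that imports TensorFlow.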

@Umut_Koksoy,

While reproducing the issue, I observed the same behaviour with TensorFlow 2.12.

Thank you!

More information about cuda and cudnn versions installed…

Hi @chunduriv ,

I am not familiar with the TensorFlow Forum, and I would like to know whether the TensorFlow team receives feedback about issues raised on this platform, or whether we need to report them elsewhere.

In this case, the code works properly on Google Colab. The TensorFlow and CUDA versions installed on my computer are the same as those on Google Colab.

I think something is missing from the TensorFlow installation guide for WSL. I don't think the problem is with TF 2.12 itself.

@Umut_Koksoy,

Yes, the issue occurs only on WSL2. There is no action required on your end. We have reported it to the concerned team and are awaiting the fix.

Thanks!

Hi @chunduriv

After the new release of tensorflow 2.13.0, I tried it on WSL 2. The problem is still not fixed.

I have exactly the same issue, and it has not been resolved to date. I am concerned that jupyter lab --debug does not show any error reports when the kernel dies.
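
One way to get more information than the notebook gives might be to run the same training code as a plain script in a child process and look at how it ended. This is a rough sketch using only the standard library. `run_and_report` is a hypothetical helper, and the one-line snippet passed to it is a stand-in for the real training script. `-X faulthandler` asks Python to print a traceback if the process receives a fatal signal such as SIGSEGV:

```python
import signal
import subprocess
import sys


def run_and_report(code: str) -> str:
    """Run Python code in a child process and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    # Anything the child wrote to stderr before exiting is forwarded here.
    if proc.stderr:
        print(proc.stderr, file=sys.stderr)
    if proc.returncode < 0:
        # On POSIX (including WSL) a negative code means "killed by this signal".
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = str(-proc.returncode)
        return f"killed by signal {name}"
    return f"exited with code {proc.returncode}"


# Stand-in for the real training code.
print(run_and_report("print('training would run here')"))
```

A result such as `killed by signal SIGKILL` would suggest the process was killed from outside (for example by the out-of-memory killer), while `SIGSEGV` would suggest a crash in native code.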

Hi @Umut_Koksoy

Did you manage to find any workaround to resolve this issue?

Hi @Nikita_Krotenko
I couldn’t find any solution on WSL2, so I installed Linux.
So, currently, you cannot run any CNNs with TensorFlow versions above 2.10 on any Windows machine.

Very disconcerting that this issue is not getting enough traction on this forum. It is very limiting when you withdraw support for native Windows and leave users with a buggy implementation. I guess we need to wait this one out.

Same here, very disappointed. I have the same issue. I thought my laptop’s GPU was damaged, so I got a desktop, and the kernel keeps dying there too. I installed it as noted on your page:

conda install -c conda-forge cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.13.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Verify install:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

While the installation works, the kernel keeps dying when I load a model and run a simple predict. Is there any solution for this?

Hi @Umut_Koksoy, @Abhijit_Mustafi, I tried training a model with CNN layers in a Jupyter notebook using TensorFlow 2.15 with GPU on WSL2, and did not encounter any errors.

Could you please try the latest version of TensorFlow and let us know whether the issue is resolved?
Thank You.

Hi @Kiran_Sai_Ramineni, thank you, the problem is resolved with TensorFlow 2.15.

Unfortunately, with TensorFlow 2.15 the notebook becomes completely unresponsive when I run the cell that fits a model containing a CNN. Dense models work just fine, and my GPU is detected as well. Still very frustrating.

Hi @Kiran_Sai_Ramineni @Umut_Koksoy, I really need your help on this, and sorry for the long post. I am really stuck and have some deadlines approaching.
This is what is detected

and as you can see I have a straightforward model and the summary also runs

But as soon as I run the fit command, the notebook just does nothing and the console shows no error messages at all

As I have said previously Dense models run just fine.

Please help me with this.

Hi @Abhijit_Mustafi, I tried running the CNN model with TensorFlow 2.15.0 with GPU in a Jupyter notebook and did not encounter any errors.

Could you please create a new environment, install TensorFlow and CUDA using pip install tensorflow[and-cuda], and try running the CNN model again? Thank You.
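
After creating the fresh environment, it may help to record exactly which packages pip resolved, so the versions can be compared with a setup that works. This is a small sketch using only the standard library. It assumes the CUDA libraries arrive as pip packages whose names start with `nvidia-` (as with `nvidia-cudnn-cu11` earlier in the thread), and `installed_gpu_packages` is a hypothetical helper name:

```python
from importlib import metadata


def installed_gpu_packages() -> dict:
    """Map installed TensorFlow / NVIDIA pip package names to their versions."""
    found = {}
    for dist in metadata.distributions():
        name = (dist.metadata.get("Name") or "").lower()
        # Assumption: the relevant packages are named tensorflow* or nvidia-*.
        if name.startswith("tensorflow") or name.startswith("nvidia-"):
            found[name] = dist.version
    return found


for name, version in sorted(installed_gpu_packages().items()):
    print(f"{name}=={version}")
```

Run inside the activated environment, this prints one `name==version` line per matching package, or nothing if none are installed.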

Interesting observation. I created a fresh environment as per your advice. Simple CNN models now run (two layers with 32 filters each, followed by Flatten). But increasing the number of layers kills the kernel. There are no error messages on the console either.