Python crashes when I run tf.random.normal([1000, 1000]) in TensorFlow 2

zzzhhh · October 18, 2021, 4:29am

When I run the following code in python:

import tensorflow as tf
import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
tf.random.normal([1000, 1000])

The last line, `tf.random.normal([1000, 1000])`, crashes Python. This is the screenshot:

please see the screenshot image.

I’m sorry, I don’t know how to handle the triple `>` prompts, which mess up the formatting, so I had to use a screenshot. As can be seen from the output, the error message is

F tensorflow/core/platform/default/env.cc:73] Check failed: ret == 0 (11 vs. 0)Thread tf_numa_-1_Eigen creation via pthread_create() failed.
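For reference, the `11` in `(11 vs. 0)` appears to be the error number returned by pthread_create(). Below is a minimal sketch for decoding it with the Python standard library. The helper name `describe_errno` is made up for illustration, and the assumption is a Linux machine, where 11 corresponds to EAGAIN.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an OS error number."""
    # errno.errorcode maps numbers to names; fall back if the number is unknown.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 11 is the value reported in "(11 vs. 0)"; the mapping is platform-specific.
print(describe_errno(11))
```

POSIX documents EAGAIN from pthread_create() as a lack of resources to create another thread, or a system-imposed limit on the number of threads being hit, so the crash may be about thread limits rather than RAM.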

Since the sys admin is very disagreeable, I have to figure out what the problem is myself. But I really don’t know what the cause of the crash could be. I had thought it was because the RAM is too small, but this thread says TF2 can start with a limited RAM size. I’m at my wits’ end, so I am asking here for help. The configuration of the machine is as follows:

Remote Linux with kernel version 5.8.0. I am not a superuser.
Python 3.8.6
CUDA Version: 11.1
GPU is RTX 3090 with driver version 455.23.05
CPU: Intel Core i9-10900K
TensorFlow version: 2.6.0
System imposed RAM quota: 4GB
System imposed number of threads: 512198
System imposed RLIMIT_NPROC value: 300
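The limits listed above can be read from inside Python without superuser rights. This is only a sketch using the standard `resource` module; the helper names `nproc_limits` and `format_limit` are made up for illustration. On Linux, RLIMIT_NPROC counts threads as well as processes for the user, so a value of 300 may be what pthread_create() is running into, though that is a guess rather than a confirmed diagnosis.

```python
import os
import resource


def nproc_limits():
    """Return the soft/hard RLIMIT_NPROC values and the logical CPU count."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {"soft": soft, "hard": hard, "logical_cpus": os.cpu_count()}


def format_limit(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


info = nproc_limits()
print("RLIMIT_NPROC soft:", format_limit(info["soft"]))
print("RLIMIT_NPROC hard:", format_limit(info["hard"]))
print("logical CPUs:", info["logical_cpus"])
```

If the soft limit is small compared with the number of threads already running under the same user, thread creation in any library could fail in the same way.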

If you need other information related to this error, please let me know. Thank you for helping me troubleshoot this problem.

Bhack · October 18, 2021, 12:26pm

Do you have the same problem with:
https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth?hl=it#for_example

zzzhhh · October 19, 2021, 2:41am

tf.config.list_physical_devices('GPU') returns an empty list. But a GPU is installed, and nvidia-smi reports information about it.
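A small sketch for gathering the related facts in one place: the TensorFlow version, the CUDA and cuDNN versions the installed build was compiled against, and the GPUs TensorFlow can see. The function name `gpu_report` is made up for illustration, and the build-info keys are read with `.get()` because they may be absent on CPU-only builds.

```python
def gpu_report():
    """Summarize TensorFlow's build-time CUDA info and the GPUs it can see."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return None
    build = tf.sysconfig.get_build_info()
    return {
        "tf_version": tf.__version__,
        "built_for_cuda": build.get("cuda_version"),
        "built_for_cudnn": build.get("cudnn_version"),
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


print(gpu_report())
```

Comparing `built_for_cuda` with the CUDA version installed on the machine may explain an empty GPU list even when nvidia-smi shows the card.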

Bhack · October 19, 2021, 3:10am

I suppose CUDA 11.2 is required.

zzzhhh · October 19, 2021, 3:29am

I’m a bit surprised that the difference between 11.1 and 11.2 matters. Anyway, thank you for pointing out the requirement of CUDA 11.2.