Allocated CPUs not working with Keras Functional API

Hello,
I’m using a server with 72 CPU cores (no GPUs). I installed the package with pip install "intel-tensorflow-avx512==2.8.0", since the server supports AVX512, and also set export TF_ENABLE_ONEDNN_OPTS=1 (I guess this isn’t required with the Intel package anyway). The setup works, i.e., all 72 CPUs run at 100% when computing a toy example in Python:

import tensorflow as tf
A = tf.random.normal([int(1e5), int(1e5)])
B = tf.multiply(A, A)

However, when I execute the actual training script (it’s too long to post here), nothing happens. It’s a Keras Functional API model trained as a Siamese network; the Dataset generator connects to Cassandra, etc.
htop shows that 72 processes are allocated but only 1 is running (i.e., the other 71 are idle at 0%). At the very least, the big tf.keras.layers.Dense layers should perform parallel matrix multiplication.
Is there some sort of TF command, function or method I need to trigger within the Python script?

The Stack Overflow question "TensorFlow 2 keras model training very slow on CPU and most cpu cores (>95% cores) are idle" seems to be about the same problem, but it has no answer.

Problem solved. It was a Dataset generator error that didn’t show up in the logs, so the whole optimization loop just got stuck.
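
For anyone hitting the same symptom, one way to catch this kind of silent failure is to pull a few items from the generator on its own, before handing it to the training loop. Below is a minimal sketch in plain Python. It is not my actual code: smoke_test_generator, healthy_generator and broken_generator are hypothetical names made up for illustration, and the Cassandra connection is only simulated by a raised ConnectionError.

```python
import itertools
import queue
import threading


def smoke_test_generator(make_generator, num_items=3, timeout_s=10.0):
    """Fetch the first few items of a generator in a worker thread.

    Returns the fetched items. Re-raises the generator's own exception if it
    fails, and raises TimeoutError if any item takes longer than timeout_s.
    """
    results = queue.Queue()

    def worker():
        try:
            # islice stops after num_items without requesting an extra item.
            for item in itertools.islice(make_generator(), num_items):
                results.put(("item", item))
            results.put(("done", None))
        except Exception as exc:  # forward any failure to the caller
            results.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()

    items = []
    while True:
        try:
            kind, payload = results.get(timeout=timeout_s)
        except queue.Empty:
            raise TimeoutError(
                f"generator stalled after {len(items)} item(s)"
            ) from None
        if kind == "error":
            raise payload
        if kind == "done":
            return items
        items.append(payload)


# Hypothetical stand-ins for a real data source.
def healthy_generator():
    for i in range(5):
        yield (i, i * i)


def broken_generator():
    yield (0, 0)
    raise ConnectionError("simulated: database unreachable")


if __name__ == "__main__":
    print(smoke_test_generator(healthy_generator))
    try:
        smoke_test_generator(broken_generator)
    except ConnectionError as exc:
        print("generator failed:", exc)
```

If a check like this raises or times out, the problem is in the input pipeline rather than in the CPU or thread configuration, which would match what I saw: one busy process and 71 idle ones.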
