Grappler Error in PredictCost() for the op: op: "Conv2D"

Hi everyone,
I am currently having a problem I cannot understand. When trying to train certain models on a GPU, I get the following output, ending in a fatal check failure:

I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30995 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1a:00.0, compute capability: 7.0

I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)

W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Conv2D" attr { key: "T" value { type: DT_FLOAT } } attr { key: "data_format" value { s: "NCHW" } } attr { key: "dilations" value { list { i: 1 i: 1 i: 1 i: 1 } } } attr { key: "explicit_paddings" value { list { } } } attr { key: "padding" value { s: "VALID" } } attr { key: "strides" value { list { i: 1 i: 1 i: 1 i: 1 } } } attr { key: "use_cudnn_on_gpu" value { b: true } } inputs { dtype: DT_FLOAT shape { dim { size: -99 } dim { size: 8 } dim { size: 28 } dim { size: 14 } } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1 } dim { size: 8 } dim { } } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla V100-SXM2-32GB" frequency: 1530 num_cores: 80 environment { key: "architecture" value: "7.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 6291456 shared_memory_size_per_multiprocessor: 98304 memory_size: 32501399552 bandwidth: 898048000 } outputs { dtype: DT_FLOAT shape { dim { size: -99 } dim { } dim { size: 28 } dim { size: 14 } } }

W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Conv2DBackpropFilter" attr { key: "T" value { type: DT_FLOAT } } attr { key: "data_format" value { s: "NCHW" } } attr { key: "dilations" value { list { i: 1 i: 1 i: 1 i: 1 } } } attr { key: "explicit_paddings" value { list { } } } attr { key: "padding" value { s: "VALID" } } attr { key: "strides" value { list { i: 1 i: 1 i: 1 i: 1 } } } attr { key: "use_cudnn_on_gpu" value { b: true } } inputs { dtype: DT_FLOAT shape { dim { size: -99 } dim { size: 8 } dim { size: 28 } dim { size: 14 } } } inputs { dtype: DT_INT32 shape { dim { size: 4 } } value { dtype: DT_INT32 tensor_shape { dim { size: 4 } } tensor_content: "\001\000\000\000\001\000\000\000\010\000\000\000\000\000\000\000" } } inputs { dtype: DT_FLOAT shape { dim { size: -23 } dim { } dim { size: -212 } dim { size: -213 } } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla V100-SXM2-32GB" frequency: 1530 num_cores: 80 environment { key: "architecture" value: "7.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 6291456 shared_memory_size_per_multiprocessor: 98304 memory_size: 32501399552 bandwidth: 898048000 } outputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1 } dim { size: 8 } dim { } } }
W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Conv2DBackpropInput" attr { key: "T" value { type: DT_FLOAT } } attr { key: "data_format" value { s: "NCHW" } } attr { key: "dilations" value { list { i: 1 i: 1 i: 1 i: 1 } } } attr { key: "explicit_paddings" value { list { } } } attr { key: "padding" value { s: "VALID" } } attr { key: "strides" value { list { i: 1 i: 1 i: 1 i: 1 } } } attr { key: "use_cudnn_on_gpu" value { b: true } } inputs { dtype: DT_INT32 shape { dim { size: 4 } } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1 } dim { size: 8 } dim { } } } inputs { dtype: DT_FLOAT shape { dim { size: -23 } dim { } dim { size: -212 } dim { size: -213 } } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla V100-SXM2-32GB" frequency: 1530 num_cores: 80 environment { key: "architecture" value: "7.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 6291456 shared_memory_size_per_multiprocessor: 98304 memory_size: 32501399552 bandwidth: 898048000 } outputs { dtype: DT_FLOAT shape { dim { size: -23 } dim { size: -214 } dim { size: -215 } dim { size: -216 } } }

I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201

F ./tensorflow/core/util/gpu_launch_config.h:129] Check failed: work_element_count > 0 (0 vs. 0)

Does anyone have an idea how to solve this problem?
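For what it's worth, the fatal check `work_element_count > 0` fires when a GPU kernel is launched over a tensor with zero elements. One way this can happen (an assumption on my part, not something the log confirms) is an empty final batch, since the batch dimension in the log is unknown (size: -99). A minimal pure-Python sketch of how the element count of a VALID-padded Conv2D output can reach zero:

```python
# Sketch: how a Conv2D launch can end up with zero work elements.
# VALID padding output size per spatial dim: out = (in - kernel) // stride + 1.

def valid_conv_out(in_size, kernel, stride=1):
    """Output size of one spatial dimension under VALID padding."""
    return (in_size - kernel) // stride + 1

def conv2d_element_count(batch, channels_out, h, w, kh, kw, stride=1):
    """Total number of output elements the GPU kernel would cover."""
    out_h = valid_conv_out(h, kh, stride)
    out_w = valid_conv_out(w, kw, stride)
    return batch * channels_out * out_h * out_w

# Normal case: 28x14 input, 1x1 kernel (the shapes from the log) -> positive count.
print(conv2d_element_count(32, 8, 28, 14, 1, 1))   # 100352

# Empty batch (batch == 0) -> zero elements, which would trip
# "Check failed: work_element_count > 0".
print(conv2d_element_count(0, 8, 28, 14, 1, 1))    # 0
```

If that is the cause, dropping the incomplete final batch with `dataset.batch(batch_size, drop_remainder=True)` in the tf.data pipeline might avoid the zero-sized launch, though I can't verify that this is what happens in your model.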

Thank you in advance

Hello Dennis_Chroder,

I am hitting the same PredictCost() problem for the Conv2D layer during multi-worker training with tf.distribute.MultiWorkerMirroredStrategy.
My environment: Ubuntu 18.04, TensorFlow 2.7, 16 A100 GPUs.

(screenshot of the error output, 2022-02-17)

It seems one suggested solution is to install the NVIDIA NCCL library as the communication method. Is that correct?
I hope someone can answer this question or suggest other ideas.
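If NCCL really is the fix, the communication backend can also be selected explicitly when constructing the strategy. A configuration sketch using the TF 2.7 distribute API (choosing NCCL here is my assumption, not a confirmed fix for the crash):

```python
import tensorflow as tf

# Request NCCL as the collective communication implementation
# for multi-worker training (requires the NCCL library on each worker).
options = tf.distribute.experimental.CommunicationOptions(
    implementation=tf.distribute.experimental.CommunicationImplementation.NCCL
)
strategy = tf.distribute.MultiWorkerMirroredStrategy(
    communication_options=options
)

# Model building and compilation then go under the strategy scope:
# with strategy.scope():
#     model = build_model()
```

Without an explicit choice, TensorFlow picks a backend automatically (CommunicationImplementation.AUTO), so forcing NCCL at least rules out the ring-based fallback as a variable.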