Help request: Training terminated after a few minutes

Hi everyone,

I have a deep learning script that runs fine in PyCharm without any problems.
But when I run it from the terminal and register it as a systemd service (so it can run in the background without a visible window), it terminates after a few minutes.
I can’t figure out what the problem is; any help is appreciated.

I’m using Python 3.9.4
TensorFlow 2.9.1

Link to a screenshot of the problem (I can’t embed a photo in this post).

Now I get another problem when running inside PyCharm!
The error message is not clear to me.

23592/23592 [==============================] - 19454s 824ms/step - loss: 0.0160 - accuracy: 0.9979 - precision: 0.0000e+00 - recall: 0.0000e+00 - false_negatives: 1554.0000 - false_positives: 27.0000 - true_negatives: 753363.0000 - true_positives: 0.0000e+00
Epoch 2/5
23592/23592 [==============================] - 19421s 823ms/step - loss: 0.0150 - accuracy: 0.9979 - precision: 0.0000e+00 - recall: 0.0000e+00 - false_negatives: 1558.0000 - false_positives: 0.0000e+00 - true_negatives: 753386.0000 - true_positives: 0.0000e+00
Epoch 3/5
 2542/23592 [==>...........................] - ETA: 4:50:03 - loss: 0.0142 - accuracy: 0.9980 - precision: 0.0000e+00 - recall: 0.0000e+00 - false_negatives: 164.0000 - false_positives: 0.0000e+00 - true_negatives: 81180.0000 - true_positives: 0.0000e+00
Traceback (most recent call last):
  File "/usr/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/home/mustafa/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/mustafa/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/mustafa/project/gemerator.py", line 143, in <module>
    history = model.fit(train_gen, epochs=5)
  File "/home/mustafa/.local/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/mustafa/.local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'assert_greater_equal/Assert/AssertGuard/Assert' defined at (most recent call last):
    File "/home/mustafa/.pycharm_helpers/pydev/pydevconsole.py", line 509, in <module>
      pydevconsole.start_client(host, port)
    File "/home/mustafa/.pycharm_helpers/pydev/pydevconsole.py", line 437, in start_client
      process_exec_queue(interpreter)
    File "/home/mustafa/.pycharm_helpers/pydev/pydevconsole.py", line 284, in process_exec_queue
      interpreter.add_exec(code_fragment)
    File "/home/mustafa/.pycharm_helpers/pydev/_pydev_bundle/pydev_code_executor.py", line 109, in add_exec
      more, exception_occurred = self.do_add_exec(code_fragment)
    File "/home/mustafa/.pycharm_helpers/pydev/pydevconsole.py", line 90, in do_add_exec
      command.run()
    File "/home/mustafa/.pycharm_helpers/pydev/_pydev_bundle/pydev_console_types.py", line 35, in run
      self.more = self.interpreter.runsource(text, '<input>', symbol)
    File "/usr/lib/python3.9/code.py", line 74, in runsource
      self.runcode(code)
    File "/usr/lib/python3.9/code.py", line 90, in runcode
      exec(code, self.locals)
    File "<input>", line 1, in <module>
    File "/home/mustafa/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
      pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
    File "/home/mustafa/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
      exec(compile(contents+"\n", file, 'exec'), glob, loc)
    File "/home/mustafa/project/gemerator.py", line 143, in <module>
      history = model.fit(train_gen, epochs=5)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 1409, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 1051, in train_function
      return step_function(self, iterator)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 1040, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 1030, in run_step
      outputs = model.train_step(data)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 894, in train_step
      return self.compute_metrics(x, y, y_pred, sample_weight)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 987, in compute_metrics
      self.compiled_metrics.update_state(y, y_pred, sample_weight)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/compile_utils.py", line 501, in update_state
      metric_obj.update_state(y_t, y_p, sample_weight=mask)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/utils/metrics_utils.py", line 70, in decorated
      update_op = update_state_fn(*args, **kwargs)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/metrics/base_metric.py", line 140, in update_state_fn
      return ag_update_state(*args, **kwargs)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/metrics/metrics.py", line 818, in update_state
      return metrics_utils.update_confusion_matrix_variables(
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/utils/metrics_utils.py", line 602, in update_confusion_matrix_variables
      tf.debugging.assert_greater_equal(
Node: 'assert_greater_equal/Assert/AssertGuard/Assert'
Detected at node 'assert_greater_equal/Assert/AssertGuard/Assert' defined at (most recent call last):
    File "/home/mustafa/.pycharm_helpers/pydev/pydevconsole.py", line 509, in <module>
      pydevconsole.start_client(host, port)
    File "/home/mustafa/.pycharm_helpers/pydev/pydevconsole.py", line 437, in start_client
      process_exec_queue(interpreter)
    File "/home/mustafa/.pycharm_helpers/pydev/pydevconsole.py", line 284, in process_exec_queue
      interpreter.add_exec(code_fragment)
    File "/home/mustafa/.pycharm_helpers/pydev/_pydev_bundle/pydev_code_executor.py", line 109, in add_exec
      more, exception_occurred = self.do_add_exec(code_fragment)
    File "/home/mustafa/.pycharm_helpers/pydev/pydevconsole.py", line 90, in do_add_exec
      command.run()
    File "/home/mustafa/.pycharm_helpers/pydev/_pydev_bundle/pydev_console_types.py", line 35, in run
      self.more = self.interpreter.runsource(text, '<input>', symbol)
    File "/usr/lib/python3.9/code.py", line 74, in runsource
      self.runcode(code)
    File "/usr/lib/python3.9/code.py", line 90, in runcode
      exec(code, self.locals)
    File "<input>", line 1, in <module>
    File "/home/mustafa/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
      pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
    File "/home/mustafa/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
      exec(compile(contents+"\n", file, 'exec'), glob, loc)
    File "/home/mustafa/project/gemerator.py", line 143, in <module>
      history = model.fit(train_gen, epochs=5)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 1409, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 1051, in train_function
      return step_function(self, iterator)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 1040, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 1030, in run_step
      outputs = model.train_step(data)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 894, in train_step
      return self.compute_metrics(x, y, y_pred, sample_weight)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/training.py", line 987, in compute_metrics
      self.compiled_metrics.update_state(y, y_pred, sample_weight)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/engine/compile_utils.py", line 501, in update_state
      metric_obj.update_state(y_t, y_p, sample_weight=mask)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/utils/metrics_utils.py", line 70, in decorated
      update_op = update_state_fn(*args, **kwargs)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/metrics/base_metric.py", line 140, in update_state_fn
      return ag_update_state(*args, **kwargs)
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/metrics/metrics.py", line 818, in update_state
      return metrics_utils.update_confusion_matrix_variables(
    File "/home/mustafa/.local/lib/python3.9/site-packages/keras/utils/metrics_utils.py", line 602, in update_confusion_matrix_variables
      tf.debugging.assert_greater_equal(
Node: 'assert_greater_equal/Assert/AssertGuard/Assert'
2 root error(s) found.
  (0) INVALID_ARGUMENT:  assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (sequential/dense_1/Sigmoid:0) = ] [[nan][nan][nan]...] [y (Cast_6/x:0) = ] [0]
	 [[{{node assert_greater_equal/Assert/AssertGuard/Assert}}]]
	 [[assert_less_equal_5/Assert/AssertGuard/pivot_f/_113/_219]]
  (1) INVALID_ARGUMENT:  assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (sequential/dense_1/Sigmoid:0) = ] [[nan][nan][nan]...] [y (Cast_6/x:0) = ] [0]
	 [[{{node assert_greater_equal/Assert/AssertGuard/Assert}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_3092]

I guess there is some issue with the dense layer you are using; maybe the inputs are producing invalid outputs.

Maybe try running the layers on some of your data to see whether anything looks weird.

Sorry for the non-answer.
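
One way to act on that suggestion is to scan a few batches for NaN or Inf values before training, since the assertion in the traceback reports that the sigmoid output became `nan`. The sketch below is only an illustration: `find_non_finite_batches` is a hypothetical helper, it assumes each batch is an `(x, y)` pair of array-like values, and the `demo` list is synthetic data standing in for the real `train_gen`.

```python
import numpy as np


def find_non_finite_batches(batches, max_batches=None):
    """Return the indices of batches whose inputs or labels contain NaN or Inf.

    Assumes every batch is an (x, y) pair; adapt the unpacking if the
    generator also yields sample weights.
    """
    bad = []
    for i, (x, y) in enumerate(batches):
        # Stop early so an endless Python generator cannot loop forever.
        if max_batches is not None and i >= max_batches:
            break
        x = np.asarray(x, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        if not (np.isfinite(x).all() and np.isfinite(y).all()):
            bad.append(i)
    return bad


# Synthetic stand-in for the real generator: batch 1 has a NaN, batch 2 an Inf.
demo = [
    (np.ones((2, 3)), np.array([0, 1])),
    (np.array([[1.0, np.nan, 2.0], [0.0, 0.0, 0.0]]), np.array([0, 0])),
    (np.array([[np.inf, 1.0, 2.0], [0.0, 0.0, 0.0]]), np.array([1, 0])),
]
print(find_non_finite_batches(demo))  # [1, 2]
```

With the real data this would be called as something like `find_non_finite_batches(train_gen, max_batches=100)`. If the inputs turn out to be clean, the NaN is probably produced inside the model during training; in that case adding `tf.keras.callbacks.TerminateOnNaN()` to the `callbacks` argument of `model.fit` stops the run as soon as the loss becomes NaN, which at least narrows down when it happens.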

The problem was solved by updating all TensorFlow-related libraries to 2.11.0.
I found that Keras and TensorFlow were on different versions; maybe that was the problem!
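
For anyone who wants to check for the same kind of mismatch, here is a minimal sketch that reads the installed package versions with the standard library and compares their major.minor parts. The helper names are made up for this example, and the comparison rests on the assumption that, for the TensorFlow 2.x releases discussed here, the separately installed `keras` package is meant to carry the same major.minor number as `tensorflow`; that may not hold for other release lines.

```python
from importlib import metadata


def major_minor(version):
    """Reduce a version string such as '2.11.0' to ('2', '11')."""
    return tuple(version.split(".")[:2])


def installed_versions(packages=("tensorflow", "keras")):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def versions_match(versions):
    """True if all installed packages share the same major.minor version."""
    present = [v for v in versions.values() if v is not None]
    return len({major_minor(v) for v in present}) <= 1


versions = installed_versions()
print(versions)
print("major.minor versions match:", versions_match(versions))
```

Run it with the same interpreter that runs the training script (for example the one the systemd service uses), since PyCharm and the terminal can point at different Python environments with different packages installed.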