Total loss increases to a 10-digit value after about 1,400 steps

Hi Team,

I’m trying to develop a license plate detection model using object_detection. My dataset is split into train and eval sets, with around 200 images in total. I have only one class to detect, “licensePlate”. This is the config file that I’m using:

# SSD with Mobilenet v2
# Trained on COCO17, initialized from Imagenet classification checkpoint
# Train on TPU-8
#
# Achieves 22.2 mAP on COCO17 Val

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2_keras'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 1
        max_total_detections: 1
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "../../../data/plate_detection/models/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  batch_size: 16
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 50000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .8
          total_steps: 50000
          warmup_learning_rate: 0.13333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 1
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "../../../data/plate_detection/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "../../../data/plate_detection/train.tfrecord"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}

eval_input_reader: {
  label_map_path: "../../../data/plate_detection/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "../../../data/plate_detection/eval.tfrecord"
  }
}

When I run this command:

!python model_main_tf2.py --pipeline_config_path={PIPELINE_CONFING_FILEPATH} --model_dir={CHECKPOINTS_DIR} --sample_1_of_n_eval_examples=40 --checkpoint_every_n=100 --alsologtostderr

this is the output that I’m getting:

2023-01-12 14:36:17.174207: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-12 14:36:17.264609: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-01-12 14:36:17.796095: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-01-12 14:36:17.796134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-01-12 14:36:17.796139: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-12 14:36:25.132688: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
2023-01-12 14:36:25.132759: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LeeAarthi
2023-01-12 14:36:25.132780: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LeeAarthi
2023-01-12 14:36:25.133038: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 510.108.3
2023-01-12 14:36:25.133080: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 515.86.1
2023-01-12 14:36:25.133097: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 515.86.1 does not match DSO version 510.108.3 -- cannot find working devices in this configuration
2023-01-12 14:36:25.149223: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
W0112 14:36:25.200132 139656852170560 cross_device_ops.py:1387] There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
I0112 14:36:25.866486 139656852170560 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0112 14:36:25.873561 139656852170560 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0112 14:36:25.873781 139656852170560 config_util.py:552] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0112 14:36:25.923964 139656852170560 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['../../../data/plate_detection/train.tfrecord']
I0112 14:36:26.005203 139656852170560 dataset_builder.py:162] Reading unweighted datasets: ['../../../data/plate_detection/train.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['../../../data/plate_detection/train.tfrecord']
I0112 14:36:26.005551 139656852170560 dataset_builder.py:79] Reading record datasets for input file: ['../../../data/plate_detection/train.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I0112 14:36:26.005683 139656852170560 dataset_builder.py:80] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0112 14:36:26.005786 139656852170560 dataset_builder.py:86] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
W0112 14:36:26.055186 139656852170560 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0112 14:36:26.207533 139656852170560 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
W0112 14:36:26.561252 139656852170560 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0112 14:36:30.074060 139656852170560 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0112 14:36:31.960334 139656852170560 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0112 14:36:33.624126 139656852170560 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2023-01-12 14:36:37.406115: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
/home/lee/.local/lib/python3.10/site-packages/keras/backend.py:451: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn(
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034106 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034290 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034357 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034413 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034463 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034514 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0112 14:36:55.869040 139652774213184 deprecation.py:554] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
INFO:tensorflow:Step 100 per-step time 1.134s
I0112 14:38:49.012942 139656852170560 model_lib_v2.py:705] Step 100 per-step time 1.134s
INFO:tensorflow:{'Loss/classification_loss': 1.5236183,
 'Loss/localization_loss': 0.6281918,
 'Loss/regularization_loss': 1.5228351,
 'Loss/total_loss': 3.6746454,
 'learning_rate': 0.1666635}
I0112 14:38:49.013182 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 1.5236183,
 'Loss/localization_loss': 0.6281918,
 'Loss/regularization_loss': 1.5228351,
 'Loss/total_loss': 3.6746454,
 'learning_rate': 0.1666635}
INFO:tensorflow:Step 200 per-step time 0.917s
I0112 14:40:20.684302 139656852170560 model_lib_v2.py:705] Step 200 per-step time 0.917s
INFO:tensorflow:{'Loss/classification_loss': 1.008276,
 'Loss/localization_loss': 0.6590053,
 'Loss/regularization_loss': 1.5134761,
 'Loss/total_loss': 3.1807575,
 'learning_rate': 0.19999701}
I0112 14:40:20.684490 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 1.008276,
 'Loss/localization_loss': 0.6590053,
 'Loss/regularization_loss': 1.5134761,
 'Loss/total_loss': 3.1807575,
 'learning_rate': 0.19999701}
INFO:tensorflow:Step 300 per-step time 0.907s
I0112 14:41:51.405723 139656852170560 model_lib_v2.py:705] Step 300 per-step time 0.907s
INFO:tensorflow:{'Loss/classification_loss': 0.9540243,
 'Loss/localization_loss': 0.8763988,
 'Loss/regularization_loss': 1.4900168,
 'Loss/total_loss': 3.3204398,
 'learning_rate': 0.23333052}
I0112 14:41:51.405892 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.9540243,
 'Loss/localization_loss': 0.8763988,
 'Loss/regularization_loss': 1.4900168,
 'Loss/total_loss': 3.3204398,
 'learning_rate': 0.23333052}
INFO:tensorflow:Step 400 per-step time 0.911s
I0112 14:43:22.549281 139656852170560 model_lib_v2.py:705] Step 400 per-step time 0.911s
INFO:tensorflow:{'Loss/classification_loss': 0.5288713,
 'Loss/localization_loss': 0.4404268,
 'Loss/regularization_loss': 1.4639237,
 'Loss/total_loss': 2.4332218,
 'learning_rate': 0.26666403}
I0112 14:43:22.549448 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.5288713,
 'Loss/localization_loss': 0.4404268,
 'Loss/regularization_loss': 1.4639237,
 'Loss/total_loss': 2.4332218,
 'learning_rate': 0.26666403}
INFO:tensorflow:Step 500 per-step time 0.905s
I0112 14:44:53.045037 139656852170560 model_lib_v2.py:705] Step 500 per-step time 0.905s
INFO:tensorflow:{'Loss/classification_loss': 0.5595218,
 'Loss/localization_loss': 0.4022845,
 'Loss/regularization_loss': 1.4353888,
 'Loss/total_loss': 2.397195,
 'learning_rate': 0.2999975}
I0112 14:44:53.045216 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.5595218,
 'Loss/localization_loss': 0.4022845,
 'Loss/regularization_loss': 1.4353888,
 'Loss/total_loss': 2.397195,
 'learning_rate': 0.2999975}
INFO:tensorflow:Step 600 per-step time 0.909s
I0112 14:46:23.993309 139656852170560 model_lib_v2.py:705] Step 600 per-step time 0.909s
INFO:tensorflow:{'Loss/classification_loss': 0.2747093,
 'Loss/localization_loss': 0.41346416,
 'Loss/regularization_loss': 1.4042052,
 'Loss/total_loss': 2.0923786,
 'learning_rate': 0.33333102}
I0112 14:46:23.993470 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.2747093,
 'Loss/localization_loss': 0.41346416,
 'Loss/regularization_loss': 1.4042052,
 'Loss/total_loss': 2.0923786,
 'learning_rate': 0.33333102}
INFO:tensorflow:Step 700 per-step time 0.909s
I0112 14:47:54.858499 139656852170560 model_lib_v2.py:705] Step 700 per-step time 0.909s
INFO:tensorflow:{'Loss/classification_loss': 0.33462483,
 'Loss/localization_loss': 0.35814464,
 'Loss/regularization_loss': 1.371152,
 'Loss/total_loss': 2.0639215,
 'learning_rate': 0.36666453}
I0112 14:47:54.858674 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.33462483,
 'Loss/localization_loss': 0.35814464,
 'Loss/regularization_loss': 1.371152,
 'Loss/total_loss': 2.0639215,
 'learning_rate': 0.36666453}
INFO:tensorflow:Step 800 per-step time 0.905s
I0112 14:49:25.337317 139656852170560 model_lib_v2.py:705] Step 800 per-step time 0.905s
INFO:tensorflow:{'Loss/classification_loss': 0.35420462,
 'Loss/localization_loss': 0.55635065,
 'Loss/regularization_loss': 1.3369604,
 'Loss/total_loss': 2.2475157,
 'learning_rate': 0.399998}
I0112 14:49:25.337480 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.35420462,
 'Loss/localization_loss': 0.55635065,
 'Loss/regularization_loss': 1.3369604,
 'Loss/total_loss': 2.2475157,
 'learning_rate': 0.399998}
INFO:tensorflow:Step 900 per-step time 0.911s
I0112 14:50:56.446939 139656852170560 model_lib_v2.py:705] Step 900 per-step time 0.911s
INFO:tensorflow:{'Loss/classification_loss': 0.33083853,
 'Loss/localization_loss': 0.3767396,
 'Loss/regularization_loss': 1.2980688,
 'Loss/total_loss': 2.005647,
 'learning_rate': 0.4333315}
I0112 14:50:56.447100 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.33083853,
 'Loss/localization_loss': 0.3767396,
 'Loss/regularization_loss': 1.2980688,
 'Loss/total_loss': 2.005647,
 'learning_rate': 0.4333315}
INFO:tensorflow:Step 1000 per-step time 0.909s
I0112 14:52:27.391081 139656852170560 model_lib_v2.py:705] Step 1000 per-step time 0.909s
INFO:tensorflow:{'Loss/classification_loss': 0.23829985,
 'Loss/localization_loss': 0.388591,
 'Loss/regularization_loss': 1.2621,
 'Loss/total_loss': 1.8889909,
 'learning_rate': 0.46666503}
I0112 14:52:27.391244 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.23829985,
 'Loss/localization_loss': 0.388591,
 'Loss/regularization_loss': 1.2621,
 'Loss/total_loss': 1.8889909,
 'learning_rate': 0.46666503}
INFO:tensorflow:Step 1100 per-step time 0.907s
I0112 14:53:58.068579 139656852170560 model_lib_v2.py:705] Step 1100 per-step time 0.907s
INFO:tensorflow:{'Loss/classification_loss': 0.35621598,
 'Loss/localization_loss': 0.24143055,
 'Loss/regularization_loss': 1.5909076,
 'Loss/total_loss': 2.188554,
 'learning_rate': 0.4999985}
I0112 14:53:58.068754 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.35621598,
 'Loss/localization_loss': 0.24143055,
 'Loss/regularization_loss': 1.5909076,
 'Loss/total_loss': 2.188554,
 'learning_rate': 0.4999985}
INFO:tensorflow:Step 1200 per-step time 0.906s
I0112 14:55:28.655671 139656852170560 model_lib_v2.py:705] Step 1200 per-step time 0.906s
INFO:tensorflow:{'Loss/classification_loss': 0.25968555,
 'Loss/localization_loss': 0.33683094,
 'Loss/regularization_loss': 1.5340892,
 'Loss/total_loss': 2.1306057,
 'learning_rate': 0.53333205}
I0112 14:55:28.655854 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.25968555,
 'Loss/localization_loss': 0.33683094,
 'Loss/regularization_loss': 1.5340892,
 'Loss/total_loss': 2.1306057,
 'learning_rate': 0.53333205}
INFO:tensorflow:Step 1300 per-step time 0.900s
I0112 14:56:58.697028 139656852170560 model_lib_v2.py:705] Step 1300 per-step time 0.900s
INFO:tensorflow:{'Loss/classification_loss': 1.4101561,
 'Loss/localization_loss': 0.62371314,
 'Loss/regularization_loss': 1.7546113,
 'Loss/total_loss': 3.7884803,
 'learning_rate': 0.56666553}
I0112 14:56:58.697187 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 1.4101561,
 'Loss/localization_loss': 0.62371314,
 'Loss/regularization_loss': 1.7546113,
 'Loss/total_loss': 3.7884803,
 'learning_rate': 0.56666553}
INFO:tensorflow:Step 1400 per-step time 0.899s
I0112 14:58:28.599185 139656852170560 model_lib_v2.py:705] Step 1400 per-step time 0.899s
INFO:tensorflow:{'Loss/classification_loss': 8677.731,
 'Loss/localization_loss': 49.70236,
 'Loss/regularization_loss': 2980859000.0,
 'Loss/total_loss': 2980867600.0,
 'learning_rate': 0.599999}
I0112 14:58:28.599374 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 8677.731,
 'Loss/localization_loss': 49.70236,
 'Loss/regularization_loss': 2980859000.0,
 'Loss/total_loss': 2980867600.0,
 'learning_rate': 0.599999}
^C
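
To see which term is responsible for the jump, the logged loss components can be compared directly. This is a minimal sketch that uses only the numbers copied from the log above; the helper name `regularization_share` is made up for illustration and is not part of object_detection.

```python
# Loss values copied from the training log above (steps 1300 and 1400).
logged = {
    1300: {"classification": 1.4101561, "localization": 0.62371314,
           "regularization": 1.7546113, "total": 3.7884803},
    1400: {"classification": 8677.731, "localization": 49.70236,
           "regularization": 2980859000.0, "total": 2980867600.0},
}


def component_sum(losses):
    """Sum of the three logged components, to compare against the logged total."""
    return (losses["classification"] + losses["localization"]
            + losses["regularization"])


def regularization_share(losses):
    """Fraction of the logged total loss that comes from the regularization term."""
    return losses["regularization"] / losses["total"]


for step, losses in logged.items():
    print(step,
          f"sum of components = {component_sum(losses):.7g}",
          f"regularization share = {regularization_share(losses):.6f}")
```

At step 1300 the regularization term is a bit under half of the total, while at step 1400 it accounts for almost all of it. Since the config uses an `l2_regularizer`, this term is a penalty on the weights themselves, so the jump means the weight values became very large between those two steps.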

I stopped the run after the total loss blew up at step 1400. Can someone please tell me where I am making a mistake? This is for my final-semester project and I’m stuck at this step, so any help would be much appreciated.
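
For reference, the `learning_rate` values in the log follow the warmup part of the `cosine_decay_learning_rate` block in my config. The sketch below is my own re-implementation of how I understand that schedule (linear warmup, then cosine decay to zero), not the library code; the function name and the default arguments are simply taken from the config values above.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base=0.8, total_steps=50000,
                             warmup_learning_rate=0.13333, warmup_steps=2000):
    """Approximate learning rate at `step` for the schedule in the config.

    Assumption: linear warmup followed by cosine decay to zero, with no
    hold period at the base rate.
    """
    if step < warmup_steps:
        # Linear ramp from warmup_learning_rate up to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    # Cosine decay from learning_rate_base down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


# Compare against the values printed in the log.
for step in (100, 500, 1000, 1400, 2000):
    print(step, cosine_decay_with_warmup(step))
```

Under these assumptions the function agrees with the logged values at steps 100 (0.1666635) and 1400 (0.599999) to within about 1e-6. So the learning rate was still ramping up toward 0.8 when the loss diverged, and the run never got past warmup.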

I also tried a different dataset of around 1000 images, with a 7:3 train/eval split. Below is the output that I’m getting:

2023-01-12 15:40:39.958148: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-12 15:40:40.048710: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-01-12 15:40:40.596023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-01-12 15:40:40.596071: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-01-12 15:40:40.596077: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-12 15:40:41.725431: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
2023-01-12 15:40:41.725463: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LeeAarthi
2023-01-12 15:40:41.725470: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LeeAarthi
2023-01-12 15:40:41.725614: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 510.108.3
2023-01-12 15:40:41.725629: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 515.86.1
2023-01-12 15:40:41.725635: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 515.86.1 does not match DSO version 510.108.3 -- cannot find working devices in this configuration
2023-01-12 15:40:41.726004: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
W0112 15:40:41.726858 140015243441984 cross_device_ops.py:1387] There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
I0112 15:40:41.745393 140015243441984 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0112 15:40:41.747863 140015243441984 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0112 15:40:41.747941 140015243441984 config_util.py:552] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0112 15:40:41.777721 140015243441984 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['../../../data/plate_detection/train.tfrecord']
I0112 15:40:41.784984 140015243441984 dataset_builder.py:162] Reading unweighted datasets: ['../../../data/plate_detection/train.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['../../../data/plate_detection/train.tfrecord']
I0112 15:40:41.785113 140015243441984 dataset_builder.py:79] Reading record datasets for input file: ['../../../data/plate_detection/train.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I0112 15:40:41.785149 140015243441984 dataset_builder.py:80] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0112 15:40:41.785178 140015243441984 dataset_builder.py:86] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
W0112 15:40:41.794954 140015243441984 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0112 15:40:41.806131 140015243441984 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
W0112 15:40:42.132234 140015243441984 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0112 15:40:45.451416 140015243441984 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0112 15:40:47.251003 140015243441984 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0112 15:40:48.657864 140015243441984 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2023-01-12 15:40:50.360564: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
/home/lee/.local/lib/python3.10/site-packages/keras/backend.py:451: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn(
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.365942 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366105 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366169 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366216 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366261 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366318 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0112 15:41:05.965058 140010347025984 deprecation.py:554] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
INFO:tensorflow:Step 100 per-step time 1.516s
I0112 15:43:37.426495 140015243441984 model_lib_v2.py:705] Step 100 per-step time 1.516s
INFO:tensorflow:{'Loss/classification_loss': 1.0654833,
 'Loss/localization_loss': 0.5987989,
 'Loss/regularization_loss': 38.64246,
 'Loss/total_loss': 40.306744,
 'learning_rate': 0.1666635}
I0112 15:43:37.426786 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 1.0654833,
 'Loss/localization_loss': 0.5987989,
 'Loss/regularization_loss': 38.64246,
 'Loss/total_loss': 40.306744,
 'learning_rate': 0.1666635}
INFO:tensorflow:Step 200 per-step time 1.265s
I0112 15:45:43.857209 140015243441984 model_lib_v2.py:705] Step 200 per-step time 1.265s
INFO:tensorflow:{'Loss/classification_loss': 1.4991432,
 'Loss/localization_loss': 0.72100943,
 'Loss/regularization_loss': 38.119114,
 'Loss/total_loss': 40.339264,
 'learning_rate': 0.19999701}
I0112 15:45:43.857413 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 1.4991432,
 'Loss/localization_loss': 0.72100943,
 'Loss/regularization_loss': 38.119114,
 'Loss/total_loss': 40.339264,
 'learning_rate': 0.19999701}
INFO:tensorflow:Step 300 per-step time 1.267s
I0112 15:47:50.604677 140015243441984 model_lib_v2.py:705] Step 300 per-step time 1.267s
INFO:tensorflow:{'Loss/classification_loss': 1.0842727,
 'Loss/localization_loss': 0.6753453,
 'Loss/regularization_loss': 37.475296,
 'Loss/total_loss': 39.234917,
 'learning_rate': 0.23333052}
I0112 15:47:50.604861 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 1.0842727,
 'Loss/localization_loss': 0.6753453,
 'Loss/regularization_loss': 37.475296,
 'Loss/total_loss': 39.234917,
 'learning_rate': 0.23333052}
INFO:tensorflow:Step 400 per-step time 1.265s
I0112 15:49:57.122525 140015243441984 model_lib_v2.py:705] Step 400 per-step time 1.265s
INFO:tensorflow:{'Loss/classification_loss': 0.8546873,
 'Loss/localization_loss': 0.55390483,
 'Loss/regularization_loss': 36.74987,
 'Loss/total_loss': 38.158463,
 'learning_rate': 0.26666403}
I0112 15:49:57.122707 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.8546873,
 'Loss/localization_loss': 0.55390483,
 'Loss/regularization_loss': 36.74987,
 'Loss/total_loss': 38.158463,
 'learning_rate': 0.26666403}
INFO:tensorflow:Step 500 per-step time 1.266s
I0112 15:52:03.695945 140015243441984 model_lib_v2.py:705] Step 500 per-step time 1.266s
INFO:tensorflow:{'Loss/classification_loss': 0.82560086,
 'Loss/localization_loss': 0.56257683,
 'Loss/regularization_loss': 35.94078,
 'Loss/total_loss': 37.328957,
 'learning_rate': 0.2999975}
I0112 15:52:03.696129 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.82560086,
 'Loss/localization_loss': 0.56257683,
 'Loss/regularization_loss': 35.94078,
 'Loss/total_loss': 37.328957,
 'learning_rate': 0.2999975}
INFO:tensorflow:Step 600 per-step time 1.265s
I0112 15:54:10.212975 140015243441984 model_lib_v2.py:705] Step 600 per-step time 1.265s
INFO:tensorflow:{'Loss/classification_loss': 0.90044135,
 'Loss/localization_loss': 0.6134614,
 'Loss/regularization_loss': 35.108566,
 'Loss/total_loss': 36.622467,
 'learning_rate': 0.33333102}
I0112 15:54:10.213164 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.90044135,
 'Loss/localization_loss': 0.6134614,
 'Loss/regularization_loss': 35.108566,
 'Loss/total_loss': 36.622467,
 'learning_rate': 0.33333102}
INFO:tensorflow:Step 700 per-step time 1.264s
I0112 15:56:16.611257 140015243441984 model_lib_v2.py:705] Step 700 per-step time 1.264s
INFO:tensorflow:{'Loss/classification_loss': 0.70723003,
 'Loss/localization_loss': 0.60727644,
 'Loss/regularization_loss': 34.154346,
 'Loss/total_loss': 35.468853,
 'learning_rate': 0.36666453}
I0112 15:56:16.611439 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.70723003,
 'Loss/localization_loss': 0.60727644,
 'Loss/regularization_loss': 34.154346,
 'Loss/total_loss': 35.468853,
 'learning_rate': 0.36666453}
INFO:tensorflow:Step 800 per-step time 1.265s
I0112 15:58:23.133450 140015243441984 model_lib_v2.py:705] Step 800 per-step time 1.265s
INFO:tensorflow:{'Loss/classification_loss': 1.0843798,
 'Loss/localization_loss': 0.7819879,
 'Loss/regularization_loss': 33.14651,
 'Loss/total_loss': 35.01288,
 'learning_rate': 0.399998}
I0112 15:58:23.133637 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 1.0843798,
 'Loss/localization_loss': 0.7819879,
 'Loss/regularization_loss': 33.14651,
 'Loss/total_loss': 35.01288,
 'learning_rate': 0.399998}
INFO:tensorflow:Step 900 per-step time 1.268s
I0112 16:00:29.977104 140015243441984 model_lib_v2.py:705] Step 900 per-step time 1.268s
INFO:tensorflow:{'Loss/classification_loss': 0.80963266,
 'Loss/localization_loss': 0.6053904,
 'Loss/regularization_loss': 32.171913,
 'Loss/total_loss': 33.586933,
 'learning_rate': 0.4333315}
I0112 16:00:29.977288 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.80963266,
 'Loss/localization_loss': 0.6053904,
 'Loss/regularization_loss': 32.171913,
 'Loss/total_loss': 33.586933,
 'learning_rate': 0.4333315}
INFO:tensorflow:Step 1000 per-step time 1.261s
I0112 16:02:36.109382 140015243441984 model_lib_v2.py:705] Step 1000 per-step time 1.261s
INFO:tensorflow:{'Loss/classification_loss': 137.5638,
 'Loss/localization_loss': 3.1618676,
 'Loss/regularization_loss': 1370.6179,
 'Loss/total_loss': 1511.3436,
 'learning_rate': 0.46666503}
I0112 16:02:36.109562 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 137.5638,
 'Loss/localization_loss': 3.1618676,
 'Loss/regularization_loss': 1370.6179,
 'Loss/total_loss': 1511.3436,
 'learning_rate': 0.46666503}

(Skipped some steps due to space constraints.)

INFO:tensorflow:Step 3600 per-step time 1.251s
I0112 16:56:52.705066 140015243441984 model_lib_v2.py:705] Step 3600 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 8528.943,
 'Loss/localization_loss': 47.016922,
 'Loss/regularization_loss': 14735.664,
 'Loss/total_loss': 23311.625,
 'learning_rate': 0.79780877}
I0112 16:56:52.705242 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 8528.943,
 'Loss/localization_loss': 47.016922,
 'Loss/regularization_loss': 14735.664,
 'Loss/total_loss': 23311.625,
 'learning_rate': 0.79780877}
INFO:tensorflow:Step 3700 per-step time 1.252s
I0112 16:58:57.892563 140015243441984 model_lib_v2.py:705] Step 3700 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 13402.302,
 'Loss/localization_loss': 30.94923,
 'Loss/regularization_loss': 13804.406,
 'Loss/total_loss': 27237.656,
 'learning_rate': 0.79752654}
I0112 16:58:57.892744 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 13402.302,
 'Loss/localization_loss': 30.94923,
 'Loss/regularization_loss': 13804.406,
 'Loss/total_loss': 27237.656,
 'learning_rate': 0.79752654}
INFO:tensorflow:Step 3800 per-step time 1.248s
I0112 17:01:02.741252 140015243441984 model_lib_v2.py:705] Step 3800 per-step time 1.248s
INFO:tensorflow:{'Loss/classification_loss': 12479.815,
 'Loss/localization_loss': 40.673664,
 'Loss/regularization_loss': 12958.073,
 'Loss/total_loss': 25478.562,
 'learning_rate': 0.7972274}
I0112 17:01:02.741428 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 12479.815,
 'Loss/localization_loss': 40.673664,
 'Loss/regularization_loss': 12958.073,
 'Loss/total_loss': 25478.562,
 'learning_rate': 0.7972274}
INFO:tensorflow:Step 3900 per-step time 1.251s
I0112 17:03:07.888379 140015243441984 model_lib_v2.py:705] Step 3900 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 4355.9683,
 'Loss/localization_loss': 16.196148,
 'Loss/regularization_loss': 12090.652,
 'Loss/total_loss': 16462.816,
 'learning_rate': 0.7969112}
I0112 17:03:07.888562 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 4355.9683,
 'Loss/localization_loss': 16.196148,
 'Loss/regularization_loss': 12090.652,
 'Loss/total_loss': 16462.816,
 'learning_rate': 0.7969112}
INFO:tensorflow:Step 4000 per-step time 1.252s
I0112 17:05:13.094829 140015243441984 model_lib_v2.py:705] Step 4000 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 12362.029,
 'Loss/localization_loss': 8.380647,
 'Loss/regularization_loss': 11670.155,
 'Loss/total_loss': 24040.564,
 'learning_rate': 0.79657793}
I0112 17:05:13.095010 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 12362.029,
 'Loss/localization_loss': 8.380647,
 'Loss/regularization_loss': 11670.155,
 'Loss/total_loss': 24040.564,
 'learning_rate': 0.79657793}
INFO:tensorflow:Step 4100 per-step time 1.253s
I0112 17:07:18.401231 140015243441984 model_lib_v2.py:705] Step 4100 per-step time 1.253s
INFO:tensorflow:{'Loss/classification_loss': 7558.982,
 'Loss/localization_loss': 12.8216,
 'Loss/regularization_loss': 10928.576,
 'Loss/total_loss': 18500.38,
 'learning_rate': 0.79622775}
I0112 17:07:18.401425 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 7558.982,
 'Loss/localization_loss': 12.8216,
 'Loss/regularization_loss': 10928.576,
 'Loss/total_loss': 18500.38,
 'learning_rate': 0.79622775}
INFO:tensorflow:Step 4200 per-step time 1.252s
I0112 17:09:23.593532 140015243441984 model_lib_v2.py:705] Step 4200 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 12317.455,
 'Loss/localization_loss': 36.416595,
 'Loss/regularization_loss': 10190.096,
 'Loss/total_loss': 22543.969,
 'learning_rate': 0.7958606}
I0112 17:09:23.593710 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 12317.455,
 'Loss/localization_loss': 36.416595,
 'Loss/regularization_loss': 10190.096,
 'Loss/total_loss': 22543.969,
 'learning_rate': 0.7958606}
INFO:tensorflow:Step 4300 per-step time 1.253s
I0112 17:11:28.884701 140015243441984 model_lib_v2.py:705] Step 4300 per-step time 1.253s
INFO:tensorflow:{'Loss/classification_loss': 2621.9905,
 'Loss/localization_loss': 11.7453165,
 'Loss/regularization_loss': 9510.851,
 'Loss/total_loss': 12144.586,
 'learning_rate': 0.79547644}
I0112 17:11:28.884880 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 2621.9905,
 'Loss/localization_loss': 11.7453165,
 'Loss/regularization_loss': 9510.851,
 'Loss/total_loss': 12144.586,
 'learning_rate': 0.79547644}
INFO:tensorflow:Step 4400 per-step time 1.252s
I0112 17:13:34.106339 140015243441984 model_lib_v2.py:705] Step 4400 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 9142.615,
 'Loss/localization_loss': 24.524694,
 'Loss/regularization_loss': 8922.91,
 'Loss/total_loss': 18090.05,
 'learning_rate': 0.79507536}
I0112 17:13:34.106516 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 9142.615,
 'Loss/localization_loss': 24.524694,
 'Loss/regularization_loss': 8922.91,
 'Loss/total_loss': 18090.05,
 'learning_rate': 0.79507536}
INFO:tensorflow:Step 4500 per-step time 1.250s
I0112 17:15:39.126658 140015243441984 model_lib_v2.py:705] Step 4500 per-step time 1.250s
INFO:tensorflow:{'Loss/classification_loss': 13348.365,
 'Loss/localization_loss': 22.179163,
 'Loss/regularization_loss': 8383.519,
 'Loss/total_loss': 21754.062,
 'learning_rate': 0.79465735}
I0112 17:15:39.126846 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 13348.365,
 'Loss/localization_loss': 22.179163,
 'Loss/regularization_loss': 8383.519,
 'Loss/total_loss': 21754.062,
 'learning_rate': 0.79465735}
INFO:tensorflow:Step 4600 per-step time 1.255s
I0112 17:17:44.612639 140015243441984 model_lib_v2.py:705] Step 4600 per-step time 1.255s
INFO:tensorflow:{'Loss/classification_loss': 8698.898,
 'Loss/localization_loss': 9.890597,
 'Loss/regularization_loss': 7879.993,
 'Loss/total_loss': 16588.781,
 'learning_rate': 0.7942225}
I0112 17:17:44.612818 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 8698.898,
 'Loss/localization_loss': 9.890597,
 'Loss/regularization_loss': 7879.993,
 'Loss/total_loss': 16588.781,
 'learning_rate': 0.7942225}
INFO:tensorflow:Step 4700 per-step time 1.250s
I0112 17:19:49.605818 140015243441984 model_lib_v2.py:705] Step 4700 per-step time 1.250s
INFO:tensorflow:{'Loss/classification_loss': 20935.793,
 'Loss/localization_loss': 19.896955,
 'Loss/regularization_loss': 7369.5054,
 'Loss/total_loss': 28325.195,
 'learning_rate': 0.7937706}
I0112 17:19:49.605993 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 20935.793,
 'Loss/localization_loss': 19.896955,
 'Loss/regularization_loss': 7369.5054,
 'Loss/total_loss': 28325.195,
 'learning_rate': 0.7937706}
INFO:tensorflow:Step 4800 per-step time 1.255s
I0112 17:21:55.071570 140015243441984 model_lib_v2.py:705] Step 4800 per-step time 1.255s
INFO:tensorflow:{'Loss/classification_loss': 2717.7703,
 'Loss/localization_loss': 12.026006,
 'Loss/regularization_loss': 6828.674,
 'Loss/total_loss': 9558.47,
 'learning_rate': 0.793302}
I0112 17:21:55.071758 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 2717.7703,
 'Loss/localization_loss': 12.026006,
 'Loss/regularization_loss': 6828.674,
 'Loss/total_loss': 9558.47,
 'learning_rate': 0.793302}
INFO:tensorflow:Step 4900 per-step time 1.251s
I0112 17:24:00.135459 140015243441984 model_lib_v2.py:705] Step 4900 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 16445.629,
 'Loss/localization_loss': 10.181732,
 'Loss/regularization_loss': 6469.9517,
 'Loss/total_loss': 22925.762,
 'learning_rate': 0.79281646}
I0112 17:24:00.135655 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 16445.629,
 'Loss/localization_loss': 10.181732,
 'Loss/regularization_loss': 6469.9517,
 'Loss/total_loss': 22925.762,
 'learning_rate': 0.79281646}
INFO:tensorflow:Step 5000 per-step time 1.253s
I0112 17:26:05.422236 140015243441984 model_lib_v2.py:705] Step 5000 per-step time 1.253s
INFO:tensorflow:{'Loss/classification_loss': 11325.532,
 'Loss/localization_loss': 39.508034,
 'Loss/regularization_loss': 6047.702,
 'Loss/total_loss': 17412.742,
 'learning_rate': 0.7923141}
I0112 17:26:05.422418 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 11325.532,
 'Loss/localization_loss': 39.508034,
 'Loss/regularization_loss': 6047.702,
 'Loss/total_loss': 17412.742,
 'learning_rate': 0.7923141}
INFO:tensorflow:Step 5100 per-step time 1.251s
I0112 17:28:10.537145 140015243441984 model_lib_v2.py:705] Step 5100 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 6021.8306,
 'Loss/localization_loss': 9.935094,
 'Loss/regularization_loss': 5676.9565,
 'Loss/total_loss': 11708.723,
 'learning_rate': 0.79179496}
I0112 17:28:10.537329 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 6021.8306,
 'Loss/localization_loss': 9.935094,
 'Loss/regularization_loss': 5676.9565,
 'Loss/total_loss': 11708.723,
 'learning_rate': 0.79179496}
INFO:tensorflow:Step 5200 per-step time 1.254s
I0112 17:30:15.932446 140015243441984 model_lib_v2.py:705] Step 5200 per-step time 1.254s
INFO:tensorflow:{'Loss/classification_loss': 3318.1829,
 'Loss/localization_loss': 29.817122,
 'Loss/regularization_loss': 5382.4917,
 'Loss/total_loss': 8730.491,
 'learning_rate': 0.79125905}
I0112 17:30:15.932627 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 3318.1829,
 'Loss/localization_loss': 29.817122,
 'Loss/regularization_loss': 5382.4917,
 'Loss/total_loss': 8730.491,
 'learning_rate': 0.79125905}
INFO:tensorflow:Step 5300 per-step time 1.251s
I0112 17:32:21.036635 140015243441984 model_lib_v2.py:705] Step 5300 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 8942.402,
 'Loss/localization_loss': 19.730413,
 'Loss/regularization_loss': 5201.262,
 'Loss/total_loss': 14163.395,
 'learning_rate': 0.79070634}
I0112 17:32:21.036813 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 8942.402,
 'Loss/localization_loss': 19.730413,
 'Loss/regularization_loss': 5201.262,
 'Loss/total_loss': 14163.395,
 'learning_rate': 0.79070634}
INFO:tensorflow:Step 5400 per-step time 1.252s
I0112 17:34:26.207802 140015243441984 model_lib_v2.py:705] Step 5400 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 900.0243,
 'Loss/localization_loss': 22.880278,
 'Loss/regularization_loss': 4777.2974,
 'Loss/total_loss': 5700.202,
 'learning_rate': 0.79013693}
I0112 17:34:26.208016 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 900.0243,
 'Loss/localization_loss': 22.880278,
 'Loss/regularization_loss': 4777.2974,
 'Loss/total_loss': 5700.202,
 'learning_rate': 0.79013693}
INFO:tensorflow:Step 5500 per-step time 1.272s
I0112 17:36:33.387695 140015243441984 model_lib_v2.py:705] Step 5500 per-step time 1.272s
INFO:tensorflow:{'Loss/classification_loss': 6431.478,
 'Loss/localization_loss': 40.654152,
 'Loss/regularization_loss': 4468.076,
 'Loss/total_loss': 10940.209,
 'learning_rate': 0.7895508}
I0112 17:36:33.387874 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 6431.478,
 'Loss/localization_loss': 40.654152,
 'Loss/regularization_loss': 4468.076,
 'Loss/total_loss': 10940.209,
 'learning_rate': 0.7895508}
INFO:tensorflow:Step 5600 per-step time 1.255s
I0112 17:38:38.869205 140015243441984 model_lib_v2.py:705] Step 5600 per-step time 1.255s
INFO:tensorflow:{'Loss/classification_loss': 11988.964,
 'Loss/localization_loss': 13.829172,
 'Loss/regularization_loss': 4274.1826,
 'Loss/total_loss': 16276.976,
 'learning_rate': 0.788948}
I0112 17:38:38.869393 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 11988.964,
 'Loss/localization_loss': 13.829172,
 'Loss/regularization_loss': 4274.1826,
 'Loss/total_loss': 16276.976,
 'learning_rate': 0.788948}
INFO:tensorflow:Step 5700 per-step time 1.250s
I0112 17:40:43.837655 140015243441984 model_lib_v2.py:705] Step 5700 per-step time 1.250s
INFO:tensorflow:{'Loss/classification_loss': 6806.923,
 'Loss/localization_loss': 14.4461155,
 'Loss/regularization_loss': 17334.375,
 'Loss/total_loss': 24155.742,
 'learning_rate': 0.78832847}
I0112 17:40:43.837828 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 6806.923,
 'Loss/localization_loss': 14.4461155,
 'Loss/regularization_loss': 17334.375,
 'Loss/total_loss': 24155.742,
 'learning_rate': 0.78832847}
INFO:tensorflow:Step 5800 per-step time 1.252s
I0112 17:42:49.074056 140015243441984 model_lib_v2.py:705] Step 5800 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 17357.617,
 'Loss/localization_loss': 24.213049,
 'Loss/regularization_loss': 207105.69,
 'Loss/total_loss': 224487.53,
 'learning_rate': 0.78769237}
I0112 17:42:49.074233 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 17357.617,
 'Loss/localization_loss': 24.213049,
 'Loss/regularization_loss': 207105.69,
 'Loss/total_loss': 224487.53,
 'learning_rate': 0.78769237}
INFO:tensorflow:Step 5900 per-step time 1.253s
I0112 17:44:54.405689 140015243441984 model_lib_v2.py:705] Step 5900 per-step time 1.253s
INFO:tensorflow:{'Loss/classification_loss': 514.1073,
 'Loss/localization_loss': 17.313797,
 'Loss/regularization_loss': 194318.05,
 'Loss/total_loss': 194849.47,
 'learning_rate': 0.7870397}
I0112 17:44:54.405875 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 514.1073,
 'Loss/localization_loss': 17.313797,
 'Loss/regularization_loss': 194318.05,
 'Loss/total_loss': 194849.47,
 'learning_rate': 0.7870397}
^C

Any help would be much appreciated.

Hi @Leelaram_Jayaram, this is due to exploding gradients. You can try increasing the amount of regularization, decreasing the learning rate, or increasing the batch size. Thank you.
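
To make the learning-rate point concrete: the `learning_rate` values in the log above look like a linear warmup followed by a cosine decay that peaks at 0.8. The sketch below reproduces them in plain Python. The schedule parameters (base rate 0.8, warmup from 0.13333 over 2000 steps, 50000 total steps) are an assumption inferred from the logged numbers, not read from the actual config, and the function name is made up for illustration.

```python
import math


def cosine_decay_with_warmup(step, base_lr=0.8, total_steps=50000,
                             warmup_lr=0.13333, warmup_steps=2000):
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    Default values are assumptions chosen to match the posted log.
    """
    if step < warmup_steps:
        # Linear ramp from warmup_lr up to base_lr.
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# (step, learning_rate) pairs copied from the training log.
logged = {100: 0.1666635, 1000: 0.46666503, 3600: 0.79780877, 5900: 0.7870397}

for step, logged_lr in logged.items():
    print(step, round(cosine_decay_with_warmup(step), 6), logged_lr)

# Hypothetical, much smaller schedule for comparison (placeholder values,
# not a tuned recommendation).
small_peak = cosine_decay_with_warmup(500, base_lr=0.01, total_steps=10000,
                                      warmup_lr=0.001, warmup_steps=500)
```

If that assumption holds, the loss first jumps at step 1000, where the rate is already about 0.47, and the rate keeps climbing to 0.8 afterwards. The config header says "Train on TPU-8", so this schedule was probably tuned for a much larger setup than a 200-image dataset. If the config uses `cosine_decay_learning_rate`, lowering `learning_rate_base` and `warmup_learning_rate` there would be the first thing to try.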

IMHO, you should fix the TF warnings in the log before tackling the increasing-loss problem.
