Total loss is increasing to 10 digits after some steps

Leelaram_Jayaram · January 12, 2023, 9:37am

Hi Team,

I’m trying to develop a license plate detection model using object_detection. My dataset is split into train and eval sets, with around 200 images in total. I have only one class to detect, “licensePlate”. This is the config file that I’m using:

# SSD with Mobilenet v2
# Trained on COCO17, initialized from Imagenet classification checkpoint
# Train on TPU-8
#
# Achieves 22.2 mAP on COCO17 Val

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2_keras'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 1
        max_total_detections: 1
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "../../../data/plate_detection/models/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  batch_size: 16
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 50000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .8
          total_steps: 50000
          warmup_learning_rate: 0.13333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 1
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "../../../data/plate_detection/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "../../../data/plate_detection/train.tfrecord"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}

eval_input_reader: {
  label_map_path: "../../../data/plate_detection/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "../../../data/plate_detection/eval.tfrecord"
  }
}
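
For reference, the learning-rate schedule set in the optimizer block above can be sketched in plain Python. This is only a sketch under an assumption: that cosine_decay_learning_rate ramps linearly from warmup_learning_rate to learning_rate_base over warmup_steps and then follows a half-cosine down to zero at total_steps. The function name below is hypothetical and is not the library's API; the constants are copied from the config.

```python
import math

# Values copied from the optimizer block of the config above.
LEARNING_RATE_BASE = 0.8
WARMUP_LEARNING_RATE = 0.13333
WARMUP_STEPS = 2000
TOTAL_STEPS = 50000


def sketch_learning_rate(step):
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from the warmup rate up to the base rate.
        slope = (LEARNING_RATE_BASE - WARMUP_LEARNING_RATE) / WARMUP_STEPS
        return WARMUP_LEARNING_RATE + slope * step
    # Half-cosine from the base rate down to zero over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * LEARNING_RATE_BASE * (1.0 + math.cos(math.pi * progress))


for step in (100, 700, 1400, 2000):
    print(step, round(sketch_learning_rate(step), 7))
```

Under that assumption, the warmup values agree with the learning_rate entries in the training log further down (about 0.1666635 at step 100 and 0.599999 at step 1400), and the rate keeps climbing until it reaches 0.8 at step 2000.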

When I run this command:

!python model_main_tf2.py --pipeline_config_path={PIPELINE_CONFING_FILEPATH} --model_dir={CHECKPOINTS_DIR} --sample_1_of_n_eval_examples=40 --checkpoint_every_n=100 --alsologtostderr

this is the output that I’m getting:

2023-01-12 14:36:17.174207: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-12 14:36:17.264609: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-01-12 14:36:17.796095: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-01-12 14:36:17.796134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-01-12 14:36:17.796139: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-12 14:36:25.132688: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
2023-01-12 14:36:25.132759: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LeeAarthi
2023-01-12 14:36:25.132780: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LeeAarthi
2023-01-12 14:36:25.133038: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 510.108.3
2023-01-12 14:36:25.133080: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 515.86.1
2023-01-12 14:36:25.133097: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 515.86.1 does not match DSO version 510.108.3 -- cannot find working devices in this configuration
2023-01-12 14:36:25.149223: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
W0112 14:36:25.200132 139656852170560 cross_device_ops.py:1387] There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
I0112 14:36:25.866486 139656852170560 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0112 14:36:25.873561 139656852170560 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0112 14:36:25.873781 139656852170560 config_util.py:552] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0112 14:36:25.923964 139656852170560 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['../../../data/plate_detection/train.tfrecord']
I0112 14:36:26.005203 139656852170560 dataset_builder.py:162] Reading unweighted datasets: ['../../../data/plate_detection/train.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['../../../data/plate_detection/train.tfrecord']
I0112 14:36:26.005551 139656852170560 dataset_builder.py:79] Reading record datasets for input file: ['../../../data/plate_detection/train.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I0112 14:36:26.005683 139656852170560 dataset_builder.py:80] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0112 14:36:26.005786 139656852170560 dataset_builder.py:86] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
W0112 14:36:26.055186 139656852170560 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0112 14:36:26.207533 139656852170560 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
W0112 14:36:26.561252 139656852170560 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0112 14:36:30.074060 139656852170560 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0112 14:36:31.960334 139656852170560 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0112 14:36:33.624126 139656852170560 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2023-01-12 14:36:37.406115: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
/home/lee/.local/lib/python3.10/site-packages/keras/backend.py:451: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn(
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034106 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034290 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034357 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034413 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034463 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 14:36:42.034514 139653025863232 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0112 14:36:55.869040 139652774213184 deprecation.py:554] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
INFO:tensorflow:Step 100 per-step time 1.134s
I0112 14:38:49.012942 139656852170560 model_lib_v2.py:705] Step 100 per-step time 1.134s
INFO:tensorflow:{'Loss/classification_loss': 1.5236183,
 'Loss/localization_loss': 0.6281918,
 'Loss/regularization_loss': 1.5228351,
 'Loss/total_loss': 3.6746454,
 'learning_rate': 0.1666635}
I0112 14:38:49.013182 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 1.5236183,
 'Loss/localization_loss': 0.6281918,
 'Loss/regularization_loss': 1.5228351,
 'Loss/total_loss': 3.6746454,
 'learning_rate': 0.1666635}
INFO:tensorflow:Step 200 per-step time 0.917s
I0112 14:40:20.684302 139656852170560 model_lib_v2.py:705] Step 200 per-step time 0.917s
INFO:tensorflow:{'Loss/classification_loss': 1.008276,
 'Loss/localization_loss': 0.6590053,
 'Loss/regularization_loss': 1.5134761,
 'Loss/total_loss': 3.1807575,
 'learning_rate': 0.19999701}
I0112 14:40:20.684490 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 1.008276,
 'Loss/localization_loss': 0.6590053,
 'Loss/regularization_loss': 1.5134761,
 'Loss/total_loss': 3.1807575,
 'learning_rate': 0.19999701}
INFO:tensorflow:Step 300 per-step time 0.907s
I0112 14:41:51.405723 139656852170560 model_lib_v2.py:705] Step 300 per-step time 0.907s
INFO:tensorflow:{'Loss/classification_loss': 0.9540243,
 'Loss/localization_loss': 0.8763988,
 'Loss/regularization_loss': 1.4900168,
 'Loss/total_loss': 3.3204398,
 'learning_rate': 0.23333052}
I0112 14:41:51.405892 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.9540243,
 'Loss/localization_loss': 0.8763988,
 'Loss/regularization_loss': 1.4900168,
 'Loss/total_loss': 3.3204398,
 'learning_rate': 0.23333052}
INFO:tensorflow:Step 400 per-step time 0.911s
I0112 14:43:22.549281 139656852170560 model_lib_v2.py:705] Step 400 per-step time 0.911s
INFO:tensorflow:{'Loss/classification_loss': 0.5288713,
 'Loss/localization_loss': 0.4404268,
 'Loss/regularization_loss': 1.4639237,
 'Loss/total_loss': 2.4332218,
 'learning_rate': 0.26666403}
I0112 14:43:22.549448 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.5288713,
 'Loss/localization_loss': 0.4404268,
 'Loss/regularization_loss': 1.4639237,
 'Loss/total_loss': 2.4332218,
 'learning_rate': 0.26666403}
INFO:tensorflow:Step 500 per-step time 0.905s
I0112 14:44:53.045037 139656852170560 model_lib_v2.py:705] Step 500 per-step time 0.905s
INFO:tensorflow:{'Loss/classification_loss': 0.5595218,
 'Loss/localization_loss': 0.4022845,
 'Loss/regularization_loss': 1.4353888,
 'Loss/total_loss': 2.397195,
 'learning_rate': 0.2999975}
I0112 14:44:53.045216 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.5595218,
 'Loss/localization_loss': 0.4022845,
 'Loss/regularization_loss': 1.4353888,
 'Loss/total_loss': 2.397195,
 'learning_rate': 0.2999975}
INFO:tensorflow:Step 600 per-step time 0.909s
I0112 14:46:23.993309 139656852170560 model_lib_v2.py:705] Step 600 per-step time 0.909s
INFO:tensorflow:{'Loss/classification_loss': 0.2747093,
 'Loss/localization_loss': 0.41346416,
 'Loss/regularization_loss': 1.4042052,
 'Loss/total_loss': 2.0923786,
 'learning_rate': 0.33333102}
I0112 14:46:23.993470 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.2747093,
 'Loss/localization_loss': 0.41346416,
 'Loss/regularization_loss': 1.4042052,
 'Loss/total_loss': 2.0923786,
 'learning_rate': 0.33333102}
INFO:tensorflow:Step 700 per-step time 0.909s
I0112 14:47:54.858499 139656852170560 model_lib_v2.py:705] Step 700 per-step time 0.909s
INFO:tensorflow:{'Loss/classification_loss': 0.33462483,
 'Loss/localization_loss': 0.35814464,
 'Loss/regularization_loss': 1.371152,
 'Loss/total_loss': 2.0639215,
 'learning_rate': 0.36666453}
I0112 14:47:54.858674 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.33462483,
 'Loss/localization_loss': 0.35814464,
 'Loss/regularization_loss': 1.371152,
 'Loss/total_loss': 2.0639215,
 'learning_rate': 0.36666453}
INFO:tensorflow:Step 800 per-step time 0.905s
I0112 14:49:25.337317 139656852170560 model_lib_v2.py:705] Step 800 per-step time 0.905s
INFO:tensorflow:{'Loss/classification_loss': 0.35420462,
 'Loss/localization_loss': 0.55635065,
 'Loss/regularization_loss': 1.3369604,
 'Loss/total_loss': 2.2475157,
 'learning_rate': 0.399998}
I0112 14:49:25.337480 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.35420462,
 'Loss/localization_loss': 0.55635065,
 'Loss/regularization_loss': 1.3369604,
 'Loss/total_loss': 2.2475157,
 'learning_rate': 0.399998}
INFO:tensorflow:Step 900 per-step time 0.911s
I0112 14:50:56.446939 139656852170560 model_lib_v2.py:705] Step 900 per-step time 0.911s
INFO:tensorflow:{'Loss/classification_loss': 0.33083853,
 'Loss/localization_loss': 0.3767396,
 'Loss/regularization_loss': 1.2980688,
 'Loss/total_loss': 2.005647,
 'learning_rate': 0.4333315}
I0112 14:50:56.447100 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.33083853,
 'Loss/localization_loss': 0.3767396,
 'Loss/regularization_loss': 1.2980688,
 'Loss/total_loss': 2.005647,
 'learning_rate': 0.4333315}
INFO:tensorflow:Step 1000 per-step time 0.909s
I0112 14:52:27.391081 139656852170560 model_lib_v2.py:705] Step 1000 per-step time 0.909s
INFO:tensorflow:{'Loss/classification_loss': 0.23829985,
 'Loss/localization_loss': 0.388591,
 'Loss/regularization_loss': 1.2621,
 'Loss/total_loss': 1.8889909,
 'learning_rate': 0.46666503}
I0112 14:52:27.391244 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.23829985,
 'Loss/localization_loss': 0.388591,
 'Loss/regularization_loss': 1.2621,
 'Loss/total_loss': 1.8889909,
 'learning_rate': 0.46666503}
INFO:tensorflow:Step 1100 per-step time 0.907s
I0112 14:53:58.068579 139656852170560 model_lib_v2.py:705] Step 1100 per-step time 0.907s
INFO:tensorflow:{'Loss/classification_loss': 0.35621598,
 'Loss/localization_loss': 0.24143055,
 'Loss/regularization_loss': 1.5909076,
 'Loss/total_loss': 2.188554,
 'learning_rate': 0.4999985}
I0112 14:53:58.068754 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.35621598,
 'Loss/localization_loss': 0.24143055,
 'Loss/regularization_loss': 1.5909076,
 'Loss/total_loss': 2.188554,
 'learning_rate': 0.4999985}
INFO:tensorflow:Step 1200 per-step time 0.906s
I0112 14:55:28.655671 139656852170560 model_lib_v2.py:705] Step 1200 per-step time 0.906s
INFO:tensorflow:{'Loss/classification_loss': 0.25968555,
 'Loss/localization_loss': 0.33683094,
 'Loss/regularization_loss': 1.5340892,
 'Loss/total_loss': 2.1306057,
 'learning_rate': 0.53333205}
I0112 14:55:28.655854 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 0.25968555,
 'Loss/localization_loss': 0.33683094,
 'Loss/regularization_loss': 1.5340892,
 'Loss/total_loss': 2.1306057,
 'learning_rate': 0.53333205}
INFO:tensorflow:Step 1300 per-step time 0.900s
I0112 14:56:58.697028 139656852170560 model_lib_v2.py:705] Step 1300 per-step time 0.900s
INFO:tensorflow:{'Loss/classification_loss': 1.4101561,
 'Loss/localization_loss': 0.62371314,
 'Loss/regularization_loss': 1.7546113,
 'Loss/total_loss': 3.7884803,
 'learning_rate': 0.56666553}
I0112 14:56:58.697187 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 1.4101561,
 'Loss/localization_loss': 0.62371314,
 'Loss/regularization_loss': 1.7546113,
 'Loss/total_loss': 3.7884803,
 'learning_rate': 0.56666553}
INFO:tensorflow:Step 1400 per-step time 0.899s
I0112 14:58:28.599185 139656852170560 model_lib_v2.py:705] Step 1400 per-step time 0.899s
INFO:tensorflow:{'Loss/classification_loss': 8677.731,
 'Loss/localization_loss': 49.70236,
 'Loss/regularization_loss': 2980859000.0,
 'Loss/total_loss': 2980867600.0,
 'learning_rate': 0.599999}
I0112 14:58:28.599374 139656852170560 model_lib_v2.py:708] {'Loss/classification_loss': 8677.731,
 'Loss/localization_loss': 49.70236,
 'Loss/regularization_loss': 2980859000.0,
 'Loss/total_loss': 2980867600.0,
 'learning_rate': 0.599999}
^C

I stopped the run after the total loss blew up at step 1400. Can someone please tell me where I am making a mistake? This is for my final-semester project and I’m stuck at this step, so any help would be much appreciated.
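
To see which term drives the jump, the last two log entries can be broken down by component. The following is a small sketch that uses only the numbers printed in the log above; the dictionary layout and the helper name are illustrative, not part of object_detection.

```python
import math

# Loss values copied from the training log above (steps 1300 and 1400).
logged = {
    1300: {"classification": 1.4101561, "localization": 0.62371314,
           "regularization": 1.7546113, "total": 3.7884803},
    1400: {"classification": 8677.731, "localization": 49.70236,
           "regularization": 2980859000.0, "total": 2980867600.0},
}


def component_shares(entry):
    """Return the sum of the components and each component's share of it."""
    parts = {name: value for name, value in entry.items() if name != "total"}
    summed = sum(parts.values())
    shares = {name: value / summed for name, value in parts.items()}
    return summed, shares


for step, entry in logged.items():
    summed, shares = component_shares(entry)
    # The logged total should equal the sum of the three components
    # up to float32 rounding in the printed values.
    matches = math.isclose(summed, entry["total"], rel_tol=1e-6)
    print(step, "sum matches total:", matches)
    for name, share in shares.items():
        print(f"  {name}: {share:.6f}")
```

At step 1300 the regularization term is a bit under half of the total, while at step 1400 it accounts for essentially all of it, so the 10-digit total comes almost entirely from the regularization loss.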

Leelaram_Jayaram · January 12, 2023, 1:58pm

I also tried a different dataset of around 1000 images, split 7:3 between the training and eval sets. Below is the output that I’m getting:

2023-01-12 15:40:39.958148: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-12 15:40:40.048710: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-01-12 15:40:40.596023: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-01-12 15:40:40.596071: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-01-12 15:40:40.596077: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-12 15:40:41.725431: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
2023-01-12 15:40:41.725463: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LeeAarthi
2023-01-12 15:40:41.725470: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LeeAarthi
2023-01-12 15:40:41.725614: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 510.108.3
2023-01-12 15:40:41.725629: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 515.86.1
2023-01-12 15:40:41.725635: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 515.86.1 does not match DSO version 510.108.3 -- cannot find working devices in this configuration
2023-01-12 15:40:41.726004: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
W0112 15:40:41.726858 140015243441984 cross_device_ops.py:1387] There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
I0112 15:40:41.745393 140015243441984 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0112 15:40:41.747863 140015243441984 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0112 15:40:41.747941 140015243441984 config_util.py:552] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0112 15:40:41.777721 140015243441984 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['../../../data/plate_detection/train.tfrecord']
I0112 15:40:41.784984 140015243441984 dataset_builder.py:162] Reading unweighted datasets: ['../../../data/plate_detection/train.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['../../../data/plate_detection/train.tfrecord']
I0112 15:40:41.785113 140015243441984 dataset_builder.py:79] Reading record datasets for input file: ['../../../data/plate_detection/train.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I0112 15:40:41.785149 140015243441984 dataset_builder.py:80] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0112 15:40:41.785178 140015243441984 dataset_builder.py:86] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
W0112 15:40:41.794954 140015243441984 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
WARNING:tensorflow:From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0112 15:40:41.806131 140015243441984 deprecation.py:350] From /home/lee/anaconda3/envs/tfsetup/lib/python3.10/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
W0112 15:40:42.132234 140015243441984 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0112 15:40:45.451416 140015243441984 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0112 15:40:47.251003 140015243441984 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0112 15:40:48.657864 140015243441984 deprecation.py:350] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2023-01-12 15:40:50.360564: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
/home/lee/.local/lib/python3.10/site-packages/keras/backend.py:451: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn(
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.365942 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366105 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366169 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366216 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366261 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0112 15:40:54.366318 140011143939648 convolutional_keras_box_predictor.py:152] depth of additional conv before box predictor: 0
WARNING:tensorflow:From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0112 15:41:05.965058 140010347025984 deprecation.py:554] From /home/lee/.local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
INFO:tensorflow:Step 100 per-step time 1.516s
I0112 15:43:37.426495 140015243441984 model_lib_v2.py:705] Step 100 per-step time 1.516s
INFO:tensorflow:{'Loss/classification_loss': 1.0654833,
 'Loss/localization_loss': 0.5987989,
 'Loss/regularization_loss': 38.64246,
 'Loss/total_loss': 40.306744,
 'learning_rate': 0.1666635}
I0112 15:43:37.426786 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 1.0654833,
 'Loss/localization_loss': 0.5987989,
 'Loss/regularization_loss': 38.64246,
 'Loss/total_loss': 40.306744,
 'learning_rate': 0.1666635}
INFO:tensorflow:Step 200 per-step time 1.265s
I0112 15:45:43.857209 140015243441984 model_lib_v2.py:705] Step 200 per-step time 1.265s
INFO:tensorflow:{'Loss/classification_loss': 1.4991432,
 'Loss/localization_loss': 0.72100943,
 'Loss/regularization_loss': 38.119114,
 'Loss/total_loss': 40.339264,
 'learning_rate': 0.19999701}
I0112 15:45:43.857413 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 1.4991432,
 'Loss/localization_loss': 0.72100943,
 'Loss/regularization_loss': 38.119114,
 'Loss/total_loss': 40.339264,
 'learning_rate': 0.19999701}
INFO:tensorflow:Step 300 per-step time 1.267s
I0112 15:47:50.604677 140015243441984 model_lib_v2.py:705] Step 300 per-step time 1.267s
INFO:tensorflow:{'Loss/classification_loss': 1.0842727,
 'Loss/localization_loss': 0.6753453,
 'Loss/regularization_loss': 37.475296,
 'Loss/total_loss': 39.234917,
 'learning_rate': 0.23333052}
I0112 15:47:50.604861 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 1.0842727,
 'Loss/localization_loss': 0.6753453,
 'Loss/regularization_loss': 37.475296,
 'Loss/total_loss': 39.234917,
 'learning_rate': 0.23333052}
INFO:tensorflow:Step 400 per-step time 1.265s
I0112 15:49:57.122525 140015243441984 model_lib_v2.py:705] Step 400 per-step time 1.265s
INFO:tensorflow:{'Loss/classification_loss': 0.8546873,
 'Loss/localization_loss': 0.55390483,
 'Loss/regularization_loss': 36.74987,
 'Loss/total_loss': 38.158463,
 'learning_rate': 0.26666403}
I0112 15:49:57.122707 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.8546873,
 'Loss/localization_loss': 0.55390483,
 'Loss/regularization_loss': 36.74987,
 'Loss/total_loss': 38.158463,
 'learning_rate': 0.26666403}
INFO:tensorflow:Step 500 per-step time 1.266s
I0112 15:52:03.695945 140015243441984 model_lib_v2.py:705] Step 500 per-step time 1.266s
INFO:tensorflow:{'Loss/classification_loss': 0.82560086,
 'Loss/localization_loss': 0.56257683,
 'Loss/regularization_loss': 35.94078,
 'Loss/total_loss': 37.328957,
 'learning_rate': 0.2999975}
I0112 15:52:03.696129 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.82560086,
 'Loss/localization_loss': 0.56257683,
 'Loss/regularization_loss': 35.94078,
 'Loss/total_loss': 37.328957,
 'learning_rate': 0.2999975}
INFO:tensorflow:Step 600 per-step time 1.265s
I0112 15:54:10.212975 140015243441984 model_lib_v2.py:705] Step 600 per-step time 1.265s
INFO:tensorflow:{'Loss/classification_loss': 0.90044135,
 'Loss/localization_loss': 0.6134614,
 'Loss/regularization_loss': 35.108566,
 'Loss/total_loss': 36.622467,
 'learning_rate': 0.33333102}
I0112 15:54:10.213164 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.90044135,
 'Loss/localization_loss': 0.6134614,
 'Loss/regularization_loss': 35.108566,
 'Loss/total_loss': 36.622467,
 'learning_rate': 0.33333102}
INFO:tensorflow:Step 700 per-step time 1.264s
I0112 15:56:16.611257 140015243441984 model_lib_v2.py:705] Step 700 per-step time 1.264s
INFO:tensorflow:{'Loss/classification_loss': 0.70723003,
 'Loss/localization_loss': 0.60727644,
 'Loss/regularization_loss': 34.154346,
 'Loss/total_loss': 35.468853,
 'learning_rate': 0.36666453}
I0112 15:56:16.611439 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.70723003,
 'Loss/localization_loss': 0.60727644,
 'Loss/regularization_loss': 34.154346,
 'Loss/total_loss': 35.468853,
 'learning_rate': 0.36666453}
INFO:tensorflow:Step 800 per-step time 1.265s
I0112 15:58:23.133450 140015243441984 model_lib_v2.py:705] Step 800 per-step time 1.265s
INFO:tensorflow:{'Loss/classification_loss': 1.0843798,
 'Loss/localization_loss': 0.7819879,
 'Loss/regularization_loss': 33.14651,
 'Loss/total_loss': 35.01288,
 'learning_rate': 0.399998}
I0112 15:58:23.133637 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 1.0843798,
 'Loss/localization_loss': 0.7819879,
 'Loss/regularization_loss': 33.14651,
 'Loss/total_loss': 35.01288,
 'learning_rate': 0.399998}
INFO:tensorflow:Step 900 per-step time 1.268s
I0112 16:00:29.977104 140015243441984 model_lib_v2.py:705] Step 900 per-step time 1.268s
INFO:tensorflow:{'Loss/classification_loss': 0.80963266,
 'Loss/localization_loss': 0.6053904,
 'Loss/regularization_loss': 32.171913,
 'Loss/total_loss': 33.586933,
 'learning_rate': 0.4333315}
I0112 16:00:29.977288 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 0.80963266,
 'Loss/localization_loss': 0.6053904,
 'Loss/regularization_loss': 32.171913,
 'Loss/total_loss': 33.586933,
 'learning_rate': 0.4333315}
INFO:tensorflow:Step 1000 per-step time 1.261s
I0112 16:02:36.109382 140015243441984 model_lib_v2.py:705] Step 1000 per-step time 1.261s
INFO:tensorflow:{'Loss/classification_loss': 137.5638,
 'Loss/localization_loss': 3.1618676,
 'Loss/regularization_loss': 1370.6179,
 'Loss/total_loss': 1511.3436,
 'learning_rate': 0.46666503}
I0112 16:02:36.109562 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 137.5638,
 'Loss/localization_loss': 3.1618676,
 'Loss/regularization_loss': 1370.6179,
 'Loss/total_loss': 1511.3436,
 'learning_rate': 0.46666503}

(Some steps skipped due to space constraints.)

INFO:tensorflow:Step 3600 per-step time 1.251s
I0112 16:56:52.705066 140015243441984 model_lib_v2.py:705] Step 3600 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 8528.943,
 'Loss/localization_loss': 47.016922,
 'Loss/regularization_loss': 14735.664,
 'Loss/total_loss': 23311.625,
 'learning_rate': 0.79780877}
I0112 16:56:52.705242 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 8528.943,
 'Loss/localization_loss': 47.016922,
 'Loss/regularization_loss': 14735.664,
 'Loss/total_loss': 23311.625,
 'learning_rate': 0.79780877}
INFO:tensorflow:Step 3700 per-step time 1.252s
I0112 16:58:57.892563 140015243441984 model_lib_v2.py:705] Step 3700 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 13402.302,
 'Loss/localization_loss': 30.94923,
 'Loss/regularization_loss': 13804.406,
 'Loss/total_loss': 27237.656,
 'learning_rate': 0.79752654}
I0112 16:58:57.892744 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 13402.302,
 'Loss/localization_loss': 30.94923,
 'Loss/regularization_loss': 13804.406,
 'Loss/total_loss': 27237.656,
 'learning_rate': 0.79752654}
INFO:tensorflow:Step 3800 per-step time 1.248s
I0112 17:01:02.741252 140015243441984 model_lib_v2.py:705] Step 3800 per-step time 1.248s
INFO:tensorflow:{'Loss/classification_loss': 12479.815,
 'Loss/localization_loss': 40.673664,
 'Loss/regularization_loss': 12958.073,
 'Loss/total_loss': 25478.562,
 'learning_rate': 0.7972274}
I0112 17:01:02.741428 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 12479.815,
 'Loss/localization_loss': 40.673664,
 'Loss/regularization_loss': 12958.073,
 'Loss/total_loss': 25478.562,
 'learning_rate': 0.7972274}
INFO:tensorflow:Step 3900 per-step time 1.251s
I0112 17:03:07.888379 140015243441984 model_lib_v2.py:705] Step 3900 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 4355.9683,
 'Loss/localization_loss': 16.196148,
 'Loss/regularization_loss': 12090.652,
 'Loss/total_loss': 16462.816,
 'learning_rate': 0.7969112}
I0112 17:03:07.888562 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 4355.9683,
 'Loss/localization_loss': 16.196148,
 'Loss/regularization_loss': 12090.652,
 'Loss/total_loss': 16462.816,
 'learning_rate': 0.7969112}
INFO:tensorflow:Step 4000 per-step time 1.252s
I0112 17:05:13.094829 140015243441984 model_lib_v2.py:705] Step 4000 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 12362.029,
 'Loss/localization_loss': 8.380647,
 'Loss/regularization_loss': 11670.155,
 'Loss/total_loss': 24040.564,
 'learning_rate': 0.79657793}
I0112 17:05:13.095010 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 12362.029,
 'Loss/localization_loss': 8.380647,
 'Loss/regularization_loss': 11670.155,
 'Loss/total_loss': 24040.564,
 'learning_rate': 0.79657793}
INFO:tensorflow:Step 4100 per-step time 1.253s
I0112 17:07:18.401231 140015243441984 model_lib_v2.py:705] Step 4100 per-step time 1.253s
INFO:tensorflow:{'Loss/classification_loss': 7558.982,
 'Loss/localization_loss': 12.8216,
 'Loss/regularization_loss': 10928.576,
 'Loss/total_loss': 18500.38,
 'learning_rate': 0.79622775}
I0112 17:07:18.401425 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 7558.982,
 'Loss/localization_loss': 12.8216,
 'Loss/regularization_loss': 10928.576,
 'Loss/total_loss': 18500.38,
 'learning_rate': 0.79622775}
INFO:tensorflow:Step 4200 per-step time 1.252s
I0112 17:09:23.593532 140015243441984 model_lib_v2.py:705] Step 4200 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 12317.455,
 'Loss/localization_loss': 36.416595,
 'Loss/regularization_loss': 10190.096,
 'Loss/total_loss': 22543.969,
 'learning_rate': 0.7958606}
I0112 17:09:23.593710 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 12317.455,
 'Loss/localization_loss': 36.416595,
 'Loss/regularization_loss': 10190.096,
 'Loss/total_loss': 22543.969,
 'learning_rate': 0.7958606}
INFO:tensorflow:Step 4300 per-step time 1.253s
I0112 17:11:28.884701 140015243441984 model_lib_v2.py:705] Step 4300 per-step time 1.253s
INFO:tensorflow:{'Loss/classification_loss': 2621.9905,
 'Loss/localization_loss': 11.7453165,
 'Loss/regularization_loss': 9510.851,
 'Loss/total_loss': 12144.586,
 'learning_rate': 0.79547644}
I0112 17:11:28.884880 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 2621.9905,
 'Loss/localization_loss': 11.7453165,
 'Loss/regularization_loss': 9510.851,
 'Loss/total_loss': 12144.586,
 'learning_rate': 0.79547644}
INFO:tensorflow:Step 4400 per-step time 1.252s
I0112 17:13:34.106339 140015243441984 model_lib_v2.py:705] Step 4400 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 9142.615,
 'Loss/localization_loss': 24.524694,
 'Loss/regularization_loss': 8922.91,
 'Loss/total_loss': 18090.05,
 'learning_rate': 0.79507536}
I0112 17:13:34.106516 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 9142.615,
 'Loss/localization_loss': 24.524694,
 'Loss/regularization_loss': 8922.91,
 'Loss/total_loss': 18090.05,
 'learning_rate': 0.79507536}
INFO:tensorflow:Step 4500 per-step time 1.250s
I0112 17:15:39.126658 140015243441984 model_lib_v2.py:705] Step 4500 per-step time 1.250s
INFO:tensorflow:{'Loss/classification_loss': 13348.365,
 'Loss/localization_loss': 22.179163,
 'Loss/regularization_loss': 8383.519,
 'Loss/total_loss': 21754.062,
 'learning_rate': 0.79465735}
I0112 17:15:39.126846 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 13348.365,
 'Loss/localization_loss': 22.179163,
 'Loss/regularization_loss': 8383.519,
 'Loss/total_loss': 21754.062,
 'learning_rate': 0.79465735}
INFO:tensorflow:Step 4600 per-step time 1.255s
I0112 17:17:44.612639 140015243441984 model_lib_v2.py:705] Step 4600 per-step time 1.255s
INFO:tensorflow:{'Loss/classification_loss': 8698.898,
 'Loss/localization_loss': 9.890597,
 'Loss/regularization_loss': 7879.993,
 'Loss/total_loss': 16588.781,
 'learning_rate': 0.7942225}
I0112 17:17:44.612818 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 8698.898,
 'Loss/localization_loss': 9.890597,
 'Loss/regularization_loss': 7879.993,
 'Loss/total_loss': 16588.781,
 'learning_rate': 0.7942225}
INFO:tensorflow:Step 4700 per-step time 1.250s
I0112 17:19:49.605818 140015243441984 model_lib_v2.py:705] Step 4700 per-step time 1.250s
INFO:tensorflow:{'Loss/classification_loss': 20935.793,
 'Loss/localization_loss': 19.896955,
 'Loss/regularization_loss': 7369.5054,
 'Loss/total_loss': 28325.195,
 'learning_rate': 0.7937706}
I0112 17:19:49.605993 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 20935.793,
 'Loss/localization_loss': 19.896955,
 'Loss/regularization_loss': 7369.5054,
 'Loss/total_loss': 28325.195,
 'learning_rate': 0.7937706}
INFO:tensorflow:Step 4800 per-step time 1.255s
I0112 17:21:55.071570 140015243441984 model_lib_v2.py:705] Step 4800 per-step time 1.255s
INFO:tensorflow:{'Loss/classification_loss': 2717.7703,
 'Loss/localization_loss': 12.026006,
 'Loss/regularization_loss': 6828.674,
 'Loss/total_loss': 9558.47,
 'learning_rate': 0.793302}
I0112 17:21:55.071758 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 2717.7703,
 'Loss/localization_loss': 12.026006,
 'Loss/regularization_loss': 6828.674,
 'Loss/total_loss': 9558.47,
 'learning_rate': 0.793302}
INFO:tensorflow:Step 4900 per-step time 1.251s
I0112 17:24:00.135459 140015243441984 model_lib_v2.py:705] Step 4900 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 16445.629,
 'Loss/localization_loss': 10.181732,
 'Loss/regularization_loss': 6469.9517,
 'Loss/total_loss': 22925.762,
 'learning_rate': 0.79281646}
I0112 17:24:00.135655 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 16445.629,
 'Loss/localization_loss': 10.181732,
 'Loss/regularization_loss': 6469.9517,
 'Loss/total_loss': 22925.762,
 'learning_rate': 0.79281646}
INFO:tensorflow:Step 5000 per-step time 1.253s
I0112 17:26:05.422236 140015243441984 model_lib_v2.py:705] Step 5000 per-step time 1.253s
INFO:tensorflow:{'Loss/classification_loss': 11325.532,
 'Loss/localization_loss': 39.508034,
 'Loss/regularization_loss': 6047.702,
 'Loss/total_loss': 17412.742,
 'learning_rate': 0.7923141}
I0112 17:26:05.422418 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 11325.532,
 'Loss/localization_loss': 39.508034,
 'Loss/regularization_loss': 6047.702,
 'Loss/total_loss': 17412.742,
 'learning_rate': 0.7923141}
INFO:tensorflow:Step 5100 per-step time 1.251s
I0112 17:28:10.537145 140015243441984 model_lib_v2.py:705] Step 5100 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 6021.8306,
 'Loss/localization_loss': 9.935094,
 'Loss/regularization_loss': 5676.9565,
 'Loss/total_loss': 11708.723,
 'learning_rate': 0.79179496}
I0112 17:28:10.537329 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 6021.8306,
 'Loss/localization_loss': 9.935094,
 'Loss/regularization_loss': 5676.9565,
 'Loss/total_loss': 11708.723,
 'learning_rate': 0.79179496}
INFO:tensorflow:Step 5200 per-step time 1.254s
I0112 17:30:15.932446 140015243441984 model_lib_v2.py:705] Step 5200 per-step time 1.254s
INFO:tensorflow:{'Loss/classification_loss': 3318.1829,
 'Loss/localization_loss': 29.817122,
 'Loss/regularization_loss': 5382.4917,
 'Loss/total_loss': 8730.491,
 'learning_rate': 0.79125905}
I0112 17:30:15.932627 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 3318.1829,
 'Loss/localization_loss': 29.817122,
 'Loss/regularization_loss': 5382.4917,
 'Loss/total_loss': 8730.491,
 'learning_rate': 0.79125905}
INFO:tensorflow:Step 5300 per-step time 1.251s
I0112 17:32:21.036635 140015243441984 model_lib_v2.py:705] Step 5300 per-step time 1.251s
INFO:tensorflow:{'Loss/classification_loss': 8942.402,
 'Loss/localization_loss': 19.730413,
 'Loss/regularization_loss': 5201.262,
 'Loss/total_loss': 14163.395,
 'learning_rate': 0.79070634}
I0112 17:32:21.036813 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 8942.402,
 'Loss/localization_loss': 19.730413,
 'Loss/regularization_loss': 5201.262,
 'Loss/total_loss': 14163.395,
 'learning_rate': 0.79070634}
INFO:tensorflow:Step 5400 per-step time 1.252s
I0112 17:34:26.207802 140015243441984 model_lib_v2.py:705] Step 5400 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 900.0243,
 'Loss/localization_loss': 22.880278,
 'Loss/regularization_loss': 4777.2974,
 'Loss/total_loss': 5700.202,
 'learning_rate': 0.79013693}
I0112 17:34:26.208016 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 900.0243,
 'Loss/localization_loss': 22.880278,
 'Loss/regularization_loss': 4777.2974,
 'Loss/total_loss': 5700.202,
 'learning_rate': 0.79013693}
INFO:tensorflow:Step 5500 per-step time 1.272s
I0112 17:36:33.387695 140015243441984 model_lib_v2.py:705] Step 5500 per-step time 1.272s
INFO:tensorflow:{'Loss/classification_loss': 6431.478,
 'Loss/localization_loss': 40.654152,
 'Loss/regularization_loss': 4468.076,
 'Loss/total_loss': 10940.209,
 'learning_rate': 0.7895508}
I0112 17:36:33.387874 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 6431.478,
 'Loss/localization_loss': 40.654152,
 'Loss/regularization_loss': 4468.076,
 'Loss/total_loss': 10940.209,
 'learning_rate': 0.7895508}
INFO:tensorflow:Step 5600 per-step time 1.255s
I0112 17:38:38.869205 140015243441984 model_lib_v2.py:705] Step 5600 per-step time 1.255s
INFO:tensorflow:{'Loss/classification_loss': 11988.964,
 'Loss/localization_loss': 13.829172,
 'Loss/regularization_loss': 4274.1826,
 'Loss/total_loss': 16276.976,
 'learning_rate': 0.788948}
I0112 17:38:38.869393 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 11988.964,
 'Loss/localization_loss': 13.829172,
 'Loss/regularization_loss': 4274.1826,
 'Loss/total_loss': 16276.976,
 'learning_rate': 0.788948}
INFO:tensorflow:Step 5700 per-step time 1.250s
I0112 17:40:43.837655 140015243441984 model_lib_v2.py:705] Step 5700 per-step time 1.250s
INFO:tensorflow:{'Loss/classification_loss': 6806.923,
 'Loss/localization_loss': 14.4461155,
 'Loss/regularization_loss': 17334.375,
 'Loss/total_loss': 24155.742,
 'learning_rate': 0.78832847}
I0112 17:40:43.837828 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 6806.923,
 'Loss/localization_loss': 14.4461155,
 'Loss/regularization_loss': 17334.375,
 'Loss/total_loss': 24155.742,
 'learning_rate': 0.78832847}
INFO:tensorflow:Step 5800 per-step time 1.252s
I0112 17:42:49.074056 140015243441984 model_lib_v2.py:705] Step 5800 per-step time 1.252s
INFO:tensorflow:{'Loss/classification_loss': 17357.617,
 'Loss/localization_loss': 24.213049,
 'Loss/regularization_loss': 207105.69,
 'Loss/total_loss': 224487.53,
 'learning_rate': 0.78769237}
I0112 17:42:49.074233 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 17357.617,
 'Loss/localization_loss': 24.213049,
 'Loss/regularization_loss': 207105.69,
 'Loss/total_loss': 224487.53,
 'learning_rate': 0.78769237}
INFO:tensorflow:Step 5900 per-step time 1.253s
I0112 17:44:54.405689 140015243441984 model_lib_v2.py:705] Step 5900 per-step time 1.253s
INFO:tensorflow:{'Loss/classification_loss': 514.1073,
 'Loss/localization_loss': 17.313797,
 'Loss/regularization_loss': 194318.05,
 'Loss/total_loss': 194849.47,
 'learning_rate': 0.7870397}
I0112 17:44:54.405875 140015243441984 model_lib_v2.py:708] {'Loss/classification_loss': 514.1073,
 'Loss/localization_loss': 17.313797,
 'Loss/regularization_loss': 194318.05,
 'Loss/total_loss': 194849.47,
 'learning_rate': 0.7870397}
^C

Any help would be much appreciated.

Kiran_Sai_Ramineni · January 16, 2023, 11:09am

Hi @Leelaram_Jayaram, this is due to exploding gradients. You can try increasing the amount of regularization or decreasing the learning rate; increasing the batch size also helps. Thank you.
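To make the "decrease the learning rate" suggestion concrete, here is a minimal Python sketch of a linear-warmup plus cosine-decay schedule. The parameter values (base 0.8, warmup start 0.13333, 2000 warmup steps, 50000 total steps) are an assumption inferred from the `learning_rate` values in the log above, not read from the config; with them the sketch reproduces the logged rates at steps 100, 1000 and 3600. The function name and the 0.05 scale factor are hypothetical and only illustrate how much smaller a scaled-down schedule would be.

```python
import math


def cosine_decay_with_warmup(step, base_lr, total_steps, warmup_lr, warmup_steps):
    """Linear warmup from warmup_lr to base_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Assumption: values inferred from the logged learning rates, not from the config.
LOGGED = dict(base_lr=0.8, total_steps=50000, warmup_lr=0.13333, warmup_steps=2000)

# Hypothetical scale factor; scales both the warmup start and the peak rate.
SCALE = 0.05
SCALED = dict(LOGGED, base_lr=LOGGED["base_lr"] * SCALE,
              warmup_lr=LOGGED["warmup_lr"] * SCALE)

if __name__ == "__main__":
    for step in (100, 1000, 3600):
        original = cosine_decay_with_warmup(step, **LOGGED)
        reduced = cosine_decay_with_warmup(step, **SCALED)
        print(f"step {step}: original lr {original:.6f}, scaled lr {reduced:.6f}")
```

In the log, the loss first jumps at step 1000, where the learning rate has already warmed up to about 0.47, so lowering the base and warmup rates in the optimizer section of the config is a reasonable first experiment. The Object Detection API's `train_config` also has a `gradient_clipping_by_norm` field that may be worth trying alongside it.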

mauriciocramos · January 18, 2023, 4:12am

IMHO, it seems you should fix TF’s warning messages before getting to your increasing-loss problem.