Need help with Traffic Sign Detection training

Hi Folks.
I have some queries specific to the TF Object Detection API.
Basically, I am trying to fine-tune an SSD MobileNet V2 from the TF OD API model zoo to detect traffic signs. First I fine-tuned on GTSDB (around 500 samples), and the resulting model wasn't very good.

Then I tried to train on an augmented version of GTSDB (4500 samples, using rotation, translation, shearing, etc.). This time, after about 9k iterations the validation loss starts increasing and never comes back down.

I am assuming that implies overfitting, which I think could be due to one of the following:

  1. Train and eval data being very different – I checked, and this isn't the case.
  2. Learning rate too high – I reduced it by a factor of 10, and overfitting still occurs.
  3. Model might be too complex – the same model trained properly on the un-augmented GTSDB with only 500 training samples, so it shouldn't be too complex for the augmented GTSDB, which has around 4500 samples.
  4. Augmented dataset might not have been created properly – I converted all the annotated images into a video and checked it manually; the dataset seems fine.

I am trying to think of other reasons and would appreciate any help in that regard.
Note: I used the imgaug library for data augmentation (rough sketch below).
For reference, I have attached my loss curves and config file.
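
The augmentation pipeline looked roughly like this – a minimal sketch, where the exact parameter ranges and the augment_image helper are illustrative assumptions rather than my actual script:

import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

# Geometric augmentations of the kind mentioned above (ranges are illustrative).
seq = iaa.Sequential([
    iaa.Affine(rotate=(-15, 15),                      # rotation
               translate_percent={"x": (-0.1, 0.1),   # translation
                                  "y": (-0.1, 0.1)},
               shear=(-10, 10)),                       # shearing
    iaa.Crop(percent=(0, 0.1)),                        # cropping
])

def augment_image(image, boxes):
    """image: HxWx3 uint8 array; boxes: list of (x1, y1, x2, y2) tuples."""
    bbs = BoundingBoxesOnImage([BoundingBox(*b) for b in boxes], shape=image.shape)
    image_aug, bbs_aug = seq(image=image, bounding_boxes=bbs)
    # Drop boxes pushed fully outside the frame and clip the rest to the image border.
    bbs_aug = bbs_aug.remove_out_of_image().clip_out_of_image()
    return image_aug, [(bb.x1, bb.y1, bb.x2, bb.y2) for bb in bbs_aug.bounding_boxes]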




Config file:

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 9
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2_keras'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "./sprint1_ssd_mobilenetv2_try2/pretrained_model/mobilnetv2/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  batch_size: 24
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 50000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .0008
          total_steps: 50000
          warmup_learning_rate: 0.00013333
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "./sprint1_ssd_mobilenetv2_try2/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "./sprint1_ssd_mobilenetv2_try2/gtsdb_stop_train_9.record"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}

eval_input_reader: {
  label_map_path: "./sprint1_ssd_mobilenetv2_try2/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "./sprint1_ssd_mobilenetv2_try2/gtsdb_stop_val_9.record"
  }
}

Have you tried adding another validation dataset built from the non-augmented data while you train on the augmented version?

I augmented the GTSDB in one go, and then split it 90/10 train/test with random shuffling. Is this what you are asking?
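
Roughly, the split was done like this – a sketch using scikit-learn's train_test_split, where the file listing and paths are illustrative:

import glob
from sklearn.model_selection import train_test_split

# All augmented (image, annotation) pairs, produced in one go beforehand.
samples = sorted(glob.glob("gtsdb_augmented/*.png"))

# 90/10 train/test split; train_test_split shuffles before splitting.
train_files, test_files = train_test_split(samples, test_size=0.1, random_state=42)
print(len(train_files), len(test_files))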

Yes. Can you split it e.g. 70/30? Have you shuffled your data correctly?

I tried 80/20 instead of 90/10 but didn't see much difference in the loss curves.
As for shuffling across the test and train sets, I checked the number of samples per class for the 9 classes:

  • For test set:
{'0': 96,
 '1': 81,
 '2': 89,
 '3': 96,
 '4': 97,
 '5': 104,
 '6': 102,
 '7': 125,
 '8': 148}
  • For train set:
{'0': 406,
 '1': 319,
 '2': 316,
 '3': 352,
 '4': 443,
 '5': 408,
 '6': 516,
 '7': 454,
 '8': 512}

This seems appropriately shuffled to me; I am not sure what else I should analyse.
P.S.: The annotation counts per class for the original un-augmented dataset:

4, 79, 81, 30, 68, 53, 41, 57, 32
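
(The per-class counts above were obtained with something like the following – a sketch that assumes a TFRecord using the standard TF OD API feature keys; the record path is illustrative.)

from collections import Counter

import tensorflow as tf

def count_boxes_per_class(record_path):
    """Count ground-truth boxes per class label in a TF OD API TFRecord."""
    counts = Counter()
    for raw in tf.data.TFRecordDataset(record_path):
        example = tf.train.Example.FromString(raw.numpy())
        labels = example.features.feature["image/object/class/label"].int64_list.value
        counts.update(labels)
    return dict(counts)

print(count_boxes_per_class("gtsdb_stop_train_9.record"))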

Can you test the performance on the non-augmented training set?

I tested on around 50 samples from the original un-augmented data, and the model is able to detect most signs properly (usually with a confidence of 90% or above, compared to my earlier models trained on the original dataset, which almost never had a confidence above 50%), at least the ones clearly visible to the naked eye.
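
For reference, I run these checks roughly like this – a sketch assuming a model exported with exporter_main_v2.py; the paths and the 0.5 confidence threshold are illustrative:

import numpy as np
import tensorflow as tf

# Exported SavedModel (path is illustrative).
detect_fn = tf.saved_model.load("exported_model/saved_model")

image = tf.io.decode_image(tf.io.read_file("sample.png"), channels=3)
input_tensor = tf.expand_dims(tf.cast(image, tf.uint8), 0)

detections = detect_fn(input_tensor)
scores = detections["detection_scores"][0].numpy()
classes = detections["detection_classes"][0].numpy().astype(np.int32)

# Keep only confident detections.
keep = scores >= 0.5
print(list(zip(classes[keep], scores[keep])))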

Have you tried to reduce your augmentation variance?

E.g. you could start by creating a first train/val dataset with reduced ranges for the augmentation hyperparameters.

If it works well, you could try extending the ranges a bit, and so on.

That's what I have been trying all day today (I removed certain augmentations like cropping and reduced others like the rotation angle). During the current training, the model accuracy on the training set seems to be increasing quickly, but not as quickly as when I first trained it, so I guess this has slowed down or delayed the overfitting, but I fear it is still going to occur.

I’ll still wait for the current training to complete before testing the model.

Thanks.

You really need to check that you have sampled uniformly between train and eval, so that both cover the same range of augmentation hyperparameters.

I also think that a 500-sample dataset could be a little too small, even with augmentation.
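
E.g. you could split the original images first and only then augment each split with the same pipeline, so that train and eval cover the same augmentation ranges and augmented copies of one source image never end up on both sides. A rough sketch, where load_sample is an assumed helper and augment_image is the hypothetical function sketched earlier in the thread:

import glob
import random

# Split the ORIGINAL images first, then augment each split independently.
originals = sorted(glob.glob("gtsdb_original/*.png"))
random.Random(0).shuffle(originals)

split = int(0.9 * len(originals))
train_originals, val_originals = originals[:split], originals[split:]

def augment_split(files, copies_per_image=9):
    augmented = []
    for path in files:
        image, boxes = load_sample(path)                   # assumed helper
        for _ in range(copies_per_image):
            augmented.append(augment_image(image, boxes))  # hypothetical helper from above
    return augmented

train_set = augment_split(train_originals)
val_set = augment_split(val_originals)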