Hello everyone,

I am currently training an object detection model using an SSD MobileNet V2 configuration in TensorFlow 2.5.

In general this works fine: training finishes at a loss of around 0.1, and loading the resulting model gives good detections that I can work with so far.

The problem is:

My current test cases all run on single images. In the final application the model is supposed to run object detection live on incoming camera images, so the plan was to use a Coral Edge TPU and convert the trained model to TFLite so the Edge TPU compiler can handle it.

For this to work, the TFLite model needs to be 8-bit integer quantized, but the model becomes unusable after being converted to TFLite with 8-bit quantization. If my understanding of my model configuration is correct, I even used quantization-aware training to reduce the accuracy loss, but the converted model still produces nothing but garbage detections.

I use the following configuration for SSD MobileNet V2:

```
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 3
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 514
        width: 614
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2_keras'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 5
        max_total_detections: 15
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  batch_size: 8
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 40000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .13
          total_steps: 40000
          warmup_learning_rate: .026666
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}
train_input_reader: {
  label_map_path: "...\\label_map.pbtxt"
  tf_record_input_reader {
    input_path: "...\\training.tfrecord"
  }
}
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader: {
  label_map_path: "...\\label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "...\\eval.tfrecord"
  }
}
graph_rewriter {
  quantization {
    delay: 1000
    weight_bits: 8
    activation_bits: 8
  }
}
```

I’d like to show some images here, but the forum won’t let me. I will try to figure out a way to post some example images…

As a description of what you’d see: the TFLite model produces about 5 detections with roughly 50% confidence and horrible bounding boxes (2-3 times too large, with the center of the object somewhere on the border line), sometimes even with the wrong label. Meanwhile the original model is always spot on, with at least 99% confidence and perfectly fitting bounding boxes (at most 2-3 pixels off).

I am fully aware that a quantized model will never match the accuracy of a float model, but the TensorFlow documentation led me to believe that the accuracy loss should be somewhere below 3%.

While I do not know exactly how the term “accuracy loss” is defined there, I would not describe my results as a <3% accuracy loss.
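
To put an actual number on the degradation, my plan is to run the float SavedModel and the quantized TFLite model over the same images and compare the top detection of each. A rough sketch of what I mean by "accuracy loss" (it assumes the float model was exported with `exporter_main_v2.py` and returns the usual detection dict, that the TFLite outputs come in boxes/classes/scores order as in my conversion script further down, and that the `*_PATH` constants and `tf_training_helper` are my own placeholders and helper module):

```
import numpy as np
import tensorflow as tf

def iou(a, b):
    # IoU of two boxes in [ymin, xmin, ymax, xmax] normalized coordinates.
    ymin, xmin = max(a[0], b[0]), max(a[1], b[1])
    ymax, xmax = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def dequantize(detail, values):
    # Map raw uint8 outputs back to float; a scale of 0 means the tensor is not quantized.
    scale, zero_point = detail['quantization']
    return values if scale == 0 else (values.astype(np.float32) - zero_point) * scale

detect_fn = tf.saved_model.load(SAVED_MODEL_PATH)                # float model
interpreter = tf.lite.Interpreter(model_path=TFLITE_MODEL_PATH)  # quantized model
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()

images = tf_training_helper.load_images_in_folder_to_numpy_array(IMAGE_BASE_PATH)
for img in images:
    batch = img[np.newaxis, ...].astype(np.uint8)

    # Float reference detection (top box and score).
    det = detect_fn(tf.constant(batch))
    f_box = det['detection_boxes'][0, 0].numpy()
    f_score = det['detection_scores'][0, 0].numpy()

    # Quantized detection (top box and score).
    interpreter.set_tensor(inp['index'], batch)
    interpreter.invoke()
    q_box = dequantize(outs[0], interpreter.get_tensor(outs[0]['index']))[0, 0]
    q_score = dequantize(outs[2], interpreter.get_tensor(outs[2]['index']))[0, 0]

    print(f"IoU {iou(f_box, q_box):.3f}  score float {f_score:.3f}  score int8 {q_score:.3f}")
```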

To complete the picture of my workflow: I convert the model from checkpoints to SavedModel format using the `export_tflite_graph_tf2.py` script from the Object Detection API.
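
For reference, the export call I use looks roughly like this (the paths are placeholder stand-ins for my project layout):

```
python object_detection/export_tflite_graph_tf2.py \
    --pipeline_config_path "...\pipeline.config" \
    --trained_checkpoint_dir "...\checkpoint" \
    --output_directory "...\exported_tflite"
```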

After that I use the following Python script to load the SavedModel, convert it, and save the result:

```
import numpy as np
import tensorflow as tf

import tf_training_helper  # my own helper module for loading images


def representative_dataset_gen():
    # Yield calibration samples as float32 batches of shape (1, H, W, C).
    data = tf_training_helper.load_images_in_folder_to_numpy_array(IMAGE_BASE_PATH)
    (count, x, y, c) = data.shape
    for i in range(count):
        yield [data[i, :, :, :].reshape(1, x, y, c).astype(np.float32)]


input_data = tf_training_helper.load_image_into_numpy_array(IMAGE_PATHS)

# Full-integer quantization of the exported SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_PATH)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter.target_spec.supported_types = [tf.int8]
tflite_model = converter.convert()
print("model conversion finished. Starting validation... ")

# Quick sanity check: run one image through the quantized model.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.set_tensor(input_details[0]['index'], input_data.astype(np.uint8))
interpreter.invoke()
out_boxes = interpreter.get_tensor(output_details[0]['index'])
out_classes = interpreter.get_tensor(output_details[1]['index'])
out_scores = interpreter.get_tensor(output_details[2]['index'])

# Write the quantized model to disk.
with tf.io.gfile.GFile(MODEL_SAVE_PATH, 'wb') as f:
    f.write(tflite_model)
```

(I removed some unnecessary parts, e.g. the path definitions, to keep it a bit shorter.)
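
One thing I am still double-checking on my side is whether the uint8 input and output tensors need explicit (de)quantization: with `inference_input_type` and `inference_output_type` set to uint8, the interpreter reports a scale and zero point per tensor, and the raw uint8 values only become meaningful floats after applying them. A small check I use for debugging (standard interpreter API only; `MODEL_SAVE_PATH` is the path from the script above):

```
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=MODEL_SAVE_PATH)
interpreter.allocate_tensors()

# Print dtype and (scale, zero_point) for every input and output tensor.
for detail in interpreter.get_input_details() + interpreter.get_output_details():
    print(detail['name'], detail['dtype'], detail['quantization'])

def dequantize(detail, values):
    # Map raw quantized values back to float; a scale of 0 means "not quantized".
    scale, zero_point = detail['quantization']
    return values if scale == 0 else (values.astype(np.float32) - zero_point) * scale
```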

If you have any idea how this loss in accuracy can be prevented, or even just why it happens, please let me know. Any ideas or suggestions are welcome (e.g. is SSD MobileNet the right choice here? Are other models more robust to conversion?).

Please also let me know if I forgot to include any information; I’d be happy to provide it.

Have a great weekend,

Cheers

Georg