How can I get better metrics?

Hello, I'm fairly new to TensorFlow. I'm using Colab and trying to apply the TF Model Garden object detection tutorial to another dataset, but my metrics are very low. I'm using ResNet with the configuration below; the other details are nearly the same as in the Model Garden tutorial. Am I using the wrong configuration for transfer learning? Thanks for your help!

Dataset: New pothole detection Object Detection Dataset (v2, 2023-01-20 2:29pm) by Smartathon

ResNet config:
{ 'runtime': { 'all_reduce_alg': None,
'batchnorm_spatial_persistent': False,
'dataset_num_private_threads': None,
'default_shard_dim': -1,
'distribution_strategy': 'mirrored',
'enable_xla': False,
'gpu_thread_mode': None,
'loss_scale': None,
'mixed_precision_dtype': 'bfloat16',
'num_cores_per_replica': 1,
'num_gpus': 1,
'num_packs': 1,
'per_gpu_thread_count': 0,
'run_eagerly': False,
'task_index': -1,
'tpu': None,
'tpu_enable_xla_dynamic_padder': None,
'use_tpu_mp_strategy': False,
'worker_hosts': None},
'task': { 'allow_image_summary': False,
'annotation_file': '',
'differential_privacy_config': None,
'export_config': { 'cast_detection_classes_to_float': False,
'cast_num_detections_to_float': False,
'output_intermediate_features': False,
'output_normalized_coordinates': False},
'freeze_backbone': False,
'init_checkpoint': 'gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080',
'init_checkpoint_modules': 'backbone',
'losses': { 'box_loss_weight': 50,
'focal_loss_alpha': 0.25,
'focal_loss_gamma': 1.5,
'huber_loss_delta': 0.1,
'l2_weight_decay': 0.0001,
'loss_weight': 1.0},
'max_num_eval_detections': 100,
'model': { 'anchor': { 'anchor_size': 4.0,
'aspect_ratios': [0.5, 1.0, 2.0],
'num_scales': 3},
'backbone': { 'resnet': { 'bn_trainable': True,
'depth_multiplier': 1.0,
'model_id': 50,
'replace_stem_max_pool': False,
'resnetd_shortcut': False,
'scale_stem': True,
'se_ratio': 0.0,
'stem_type': 'v0',
'stochastic_depth_drop_rate': 0.0},
'type': 'resnet'},
'decoder': { 'fpn': { 'fusion_type': 'sum',
'num_filters': 256,
'use_keras_layer': False,
'use_separable_conv': False},
'type': 'fpn'},
'detection_generator': { 'apply_nms': True,
'box_coder_weights': None,
'max_num_detections': 100,
'nms_iou_threshold': 0.5,
'nms_version': 'v2',
'pre_nms_score_threshold': 0.05,
'pre_nms_top_k': 5000,
'return_decoded': None,
'soft_nms_sigma': None,
'tflite_post_processing': { 'max_classes_per_detection': 2,
'max_detections': 200,
'nms_iou_threshold': 0.5,
'nms_score_threshold': 0.1,
'normalize_anchor_coordinates': False,
'omit_nms': False,
'use_regular_nms': False},
'use_class_agnostic_nms': False,
'use_cpu_nms': False},
'head': { 'attribute_heads': [],
'num_convs': 4,
'num_filters': 256,
'share_classification_heads': False,
'share_level_convs': True,
'use_separable_conv': False},
'input_size': [512, 512, 3],
'max_level': 7,
'min_level': 3,
'norm_activation': { 'activation': 'relu',
'norm_epsilon': 0.001,
'norm_momentum': 0.99,
'use_sync_bn': False},
'num_classes': 2},
'name': None,
'per_category_metrics': False,
'train_data': { 'apply_tf_data_service_before_batching': False,
'autotune_algorithm': None,
'block_length': 1,
'cache': False,
'cycle_length': None,
'decoder': { 'simple_decoder': { 'attribute_names': [],
'mask_binarize_threshold': None,
'regenerate_source_id': False},
'type': 'simple_decoder'},
'deterministic': None,
'drop_remainder': True,
'dtype': 'float32',
'enable_shared_tf_data_service_between_parallel_trainers': False,
'enable_tf_data_service': False,
'file_type': 'tfrecord',
'global_batch_size': 16,
'input_path': './pothole_coco_tfrecords/train-00000-of-00001.tfrecord',
'is_training': True,
'parser': { 'aug_policy': None,
'aug_rand_hflip': True,
'aug_scale_max': 1.0,
'aug_scale_min': 1.0,
'aug_type': None,
'match_threshold': 0.5,
'max_num_instances': 100,
'num_channels': 3,
'pad': True,
'skip_crowd_during_training': True,
'unmatched_threshold': 0.5},
'prefetch_buffer_size': None,
'seed': None,
'sharding': True,
'shuffle_buffer_size': 10000,
'tf_data_service_address': None,
'tf_data_service_job_name': None,
'tfds_as_supervised': False,
'tfds_data_dir': '',
'tfds_name': '',
'tfds_skip_decoding_feature': '',
'tfds_split': '',
'trainer_id': None,
'weights': None},
'use_coco_metrics': True,
'use_wod_metrics': False,
'validation_data': { 'apply_tf_data_service_before_batching': False,
'autotune_algorithm': None,
'block_length': 1,
'cache': False,
'cycle_length': None,
'decoder': { 'simple_decoder': { 'attribute_names': [],
'mask_binarize_threshold': None,
'regenerate_source_id': False},
'type': 'simple_decoder'},
'deterministic': None,
'drop_remainder': True,
'dtype': 'float32',
'enable_shared_tf_data_service_between_parallel_trainers': False,
'enable_tf_data_service': False,
'file_type': 'tfrecord',
'global_batch_size': 16,
'input_path': './pothole_coco_tfrecords/valid-00000-of-00001.tfrecord',
'is_training': False,
'parser': { 'aug_policy': None,
'aug_rand_hflip': False,
'aug_scale_max': 1.0,
'aug_scale_min': 1.0,
'aug_type': None,
'match_threshold': 0.5,
'max_num_instances': 100,
'num_channels': 3,
'pad': True,
'skip_crowd_during_training': True,
'unmatched_threshold': 0.5},
'prefetch_buffer_size': None,
'seed': None,
'sharding': True,
'shuffle_buffer_size': 10000,
'tf_data_service_address': None,
'tf_data_service_job_name': None,
'tfds_as_supervised': False,
'tfds_data_dir': '',
'tfds_name': '',
'tfds_skip_decoding_feature': '',
'tfds_split': '',
'trainer_id': None,
'weights': None}},
'trainer': { 'allow_tpu_summary': False,
'best_checkpoint_eval_metric': '',
'best_checkpoint_export_subdir': '',
'best_checkpoint_metric_comp': 'higher',
'checkpoint_interval': 100,
'continuous_eval_timeout': 3600,
'eval_tf_function': True,
'eval_tf_while_loop': False,
'loss_upper_bound': 1000000.0,
'max_to_keep': 5,
'optimizer_config': { 'ema': None,
'learning_rate': { 'cosine': { 'alpha': 0.0,
'decay_steps': 2000,
'initial_learning_rate': 0.1,
'name': 'CosineDecay',
'offset': 0},
'type': 'cosine'},
'optimizer': { 'sgd': { 'clipnorm': None,
'clipvalue': None,
'decay': 0.0,
'global_clipnorm': None,
'momentum': 0.9,
'name': 'SGD',
'nesterov': False},
'type': 'sgd'},
'warmup': { 'linear': { 'name': 'linear',
'warmup_learning_rate': 0.05,
'warmup_steps': 100},
'type': 'linear'}},
'preemption_on_demand_checkpoint': True,
'recovery_begin_steps': 0,
'recovery_max_trials': 0,
'steps_per_loop': 100,
'summary_interval': 100,
'train_steps': 2000,
'train_tf_function': True,
'train_tf_while_loop': True,
'validation_interval': 100,
'validation_steps': 100,
'validation_summary_subdir': 'validation'}}
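For context, this is roughly how that config is being built in my notebook, following the tutorial. This is only a sketch: the experiment name is the tutorial's retinanet_resnetfpn_coco, and the tfrecord paths are just my Colab working directory.

import tensorflow_models as tfm

# Start from the RetinaNet + ResNet-50 FPN experiment used in the tutorial
# and override the fields that differ for the pothole data.
exp_config = tfm.core.exp_factory.get_exp_config('retinanet_resnetfpn_coco')

exp_config.task.model.num_classes = 2            # background + pothole
exp_config.task.model.input_size = [512, 512, 3]
exp_config.task.init_checkpoint = (
    'gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080')
exp_config.task.init_checkpoint_modules = 'backbone'

# Input pipelines (paths are from my Colab setup).
exp_config.task.train_data.input_path = (
    './pothole_coco_tfrecords/train-00000-of-00001.tfrecord')
exp_config.task.train_data.global_batch_size = 16
exp_config.task.validation_data.input_path = (
    './pothole_coco_tfrecords/valid-00000-of-00001.tfrecord')
exp_config.task.validation_data.global_batch_size = 16

# Trainer schedule (matches the dump above).
exp_config.trainer.train_steps = 2000
exp_config.trainer.optimizer_config.learning_rate.cosine.decay_steps = 2000
exp_config.trainer.steps_per_loop = 100
exp_config.trainer.validation_interval = 100
exp_config.trainer.checkpoint_interval = 100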

Metrics I got:
eval | step: 300 | steps/sec: 1.2 | eval time: 82.5 sec | output:
{'AP': 0.0005566976,
'AP50': 0.0022050787,
'AP75': 0.00023667366,
'APl': 0.0016834196,
'APm': 2.0558962e-06,
'APs': 0.0,
'ARl': 0.13759145,
'ARm': 0.002629931,
'ARmax1': 0.0020283975,
'ARmax10': 0.017342798,
'ARmax100': 0.06305781,
'ARs': 0.0,
'box_loss': 0.028296838,
'cls_loss': 88.34145,
'model_loss': 89.75631,
'steps_per_second': 1.2119430544669656,
'total_loss': 91.205284,
'validation_loss': 91.205284}
train | step: 300 | training until step 400…
train | step: 400 | steps/sec: 0.5 | output:
{'box_loss': 0.011412046,
'cls_loss': 0.9108507,
'learning_rate': 0.090450846,
'model_loss': 1.481453,
'total_loss': 2.9171762,
'training_loss': 2.9171762}
saved checkpoint to ./trained_model/ckpt-400.
eval | step: 400 | running 100 steps of evaluation…
creating index…
index created!
creating index…
index created!
Running per image evaluation…
Evaluate annotation type bbox
DONE (t=15.45s).
Accumulating evaluation results…
DONE (t=1.59s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.008
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.009
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.035
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.092
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.202
eval | step: 400 | steps/sec: 1.4 | eval time: 73.0 sec | output:
{'AP': 0.0019078274,
'AP50': 0.008207539,
'AP75': 0.00011732795,
'APl': 0.0040353336,
'APm': 4.116707e-05,
'APs': 0.0,
'ARl': 0.20163196,
'ARm': 0.0019411397,
'ARmax1': 0.008924949,
'ARmax10': 0.035496958,
'ARmax100': 0.09163286,
'ARs': 0.0,
'box_loss': 0.011416817,
'cls_loss': 1.1476638,
'model_loss': 1.7185049,
'steps_per_second': 1.3697912413424502,
'total_loss': 3.1407993,
'validation_loss': 3.1407993}

Your dataset has 6091 images, right? If so, at step 300 you hadn't even completed a full epoch: with batch_size=16, one epoch takes about 381 steps. Have you already tried training the model for longer (e.g. dozens of epochs)?
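A rough sizing sketch, where the 50-epoch target is just an assumed illustration, not a tuned value for your data:

import math

num_train_images = 6091   # images in your training split
global_batch_size = 16    # from train_data.global_batch_size
steps_per_epoch = math.ceil(num_train_images / global_batch_size)  # 381

target_epochs = 50        # assumed target, adjust for your dataset
train_steps = steps_per_epoch * target_epochs                      # 19050

# If trainer.train_steps is raised, the cosine schedule's decay_steps
# should usually grow with it, otherwise the learning rate reaches zero
# long before training ends (your config uses decay_steps=2000).
print(steps_per_epoch, train_steps)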

Hi, I trained for 2000 steps and the results are still not good. Should I pick a bigger number of steps, like 6k or so?
eval | step: 2000 | running 100 steps of evaluation…
creating index…
index created!
creating index…
index created!
Running per image evaluation…
Evaluate annotation type bbox
DONE (t=14.10s).
Accumulating evaluation results…
DONE (t=2.50s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.010
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.005
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.012
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.036
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.105
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.005
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.227
eval | step: 2000 | steps/sec: 1.2 | eval time: 82.1 sec | output:
{'AP': 0.0020764563,
'AP50': 0.009713366,
'AP75': 0.00012276521,
'APl': 0.0045901835,
'APm': 0.00016007648,
'APs': 0.0,
'ARl': 0.22712436,
'ARm': 0.0054477146,
'ARmax1': 0.012195741,
'ARmax10': 0.036156185,
'ARmax100': 0.10453854,
'ARs': 0.0,
'box_loss': 0.010982565,
'cls_loss': 0.87656015,
'model_loss': 1.425688,
'steps_per_second': 1.2180639781490166,
'total_loss': 2.6838431,
'validation_loss': 2.6838431}