EfficientDet: unexpected loss behavior depending on the object classes used

Hello everybody!

Can anybody tell me what is going wrong with the training of my object detection model (EfficientDet0-EfficientDet4)?

I am trying to train my model to detect some parts (details) as shown below. When I use a broad (general) class “TYPE A”, I can’t get training to finish with good results. I also have other classes, “TYPE B”, “TYPE C”, and so on.

I get “det_loss: 0.3751” and “val_det_loss: 0.6668”:

Epoch 50/50
20/20 [==============================] - 12s 626ms/step - det_loss: 0.3751 - cls_loss: 0.2400 - box_loss: 0.0027 - reg_l2_loss: 0.0645 - loss: 0.4396 - learning_rate: 3.4627e-06 - gradient_norm: 2.8396 - val_det_loss: 0.6668 - val_cls_loss: 0.4539 - val_box_loss: 0.0043 - val_reg_l2_loss: 0.0645 - val_loss: 0.7313
Used memory:
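
As a quick sanity check on the log above, the reported loss terms appear to combine as sketched here. The box loss weight of 50 is an assumption (it is not stated in the post; it is the value I would expect from the default EfficientDet hyperparameters), so treat the result as indicative only.

```python
# Values copied from the final training epoch in the log above.
cls_loss = 0.2400
box_loss = 0.0027
reg_l2_loss = 0.0645

# Assumption: weight applied to the box loss (not stated in the post).
BOX_LOSS_WEIGHT = 50.0

# Detection loss = classification term + weighted box regression term.
det_loss = cls_loss + BOX_LOSS_WEIGHT * box_loss

# Total loss = detection loss + L2 regularization term.
total_loss = det_loss + reg_l2_loss

# Fraction of the detection loss that comes from classification.
cls_share = cls_loss / det_loss

print(f"det_loss  ~ {det_loss:.4f}  (logged: 0.3751)")
print(f"loss      ~ {total_loss:.4f}  (logged: 0.4396)")
print(f"cls share ~ {cls_share:.2f}")
```

Under that assumption, most of `det_loss` comes from the classification term rather than from box regression, which is consistent with the problem being tied to how the classes are defined.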

{'AP': 1.7964107e-05,
 'AP50': 0.00012474843,
 'AP75': 0.0,
 'APs': -1.0,
 'APm': 0.0,
 'APl': 0.0002174287,
 'ARmax1': 0.0,
 'ARmax10': 0.0,
 'ARmax100': 0.011111111,
 'ARs': -1.0,
 'ARm': 0.0,
 'ARl': 0.011111111,
 'AP_/Beam_frame': 5.389232e-05,
 'AP_/Beam_I_frame': 0.0,
 'AP_/Beam_bent_90_degrees': 0.0}

But when I break my broad class “TYPE A” into specific classes like “TYPE A1”, “TYPE A2”, and so on, my training finishes with decent results!

What is the reason behind this behaviour? Not enough data? Poor classification performance of EfficientDet?


I think the problem here is the annotation: a broad class is too vague for the model to learn. You can expect better results when the annotations and labels are clear and there is a good amount of data.
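
To check whether each class actually has "a good amount of data", it can help to count the annotated boxes per label. The following is only a sketch: the annotation format (a list of dicts with a `label` key) and the sample entries are hypothetical and would need adapting to your own annotation files.

```python
from collections import Counter


def count_boxes_per_class(annotations):
    """Return a Counter mapping class label -> number of annotated boxes."""
    return Counter(ann["label"] for ann in annotations)


# Hypothetical annotations, one entry per bounding box.
annotations = [
    {"image": "img_001.jpg", "label": "TYPE A1"},
    {"image": "img_001.jpg", "label": "TYPE A2"},
    {"image": "img_002.jpg", "label": "TYPE A1"},
    {"image": "img_003.jpg", "label": "TYPE B"},
]

counts = count_boxes_per_class(annotations)

# List classes from most to least frequent to spot under-represented ones.
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

Classes with very few boxes compared to the others are the first candidates for more data or clearer labelling.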

Thank you!

It’s probably worth giving us an example of what the test data you’re running the model over looks like.

Are the beams always grouped together like they are in your first image in your test images?

Like specifically what are you trying to detect? And what is the real world practical use?

If you’re looking to detect each of those individual objects in a range of different scenarios, you’re going to need to train classes for all your sub-objects.
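
If the application only needs the broad class in the end, one option is to train on the sub-classes and merge the predictions back afterwards. This is a minimal sketch, not part of any EfficientDet API: the detection format and the `SUBCLASS_TO_PARENT` mapping are hypothetical.

```python
# Hypothetical mapping from the specific training classes to the broad class.
SUBCLASS_TO_PARENT = {
    "TYPE A1": "TYPE A",
    "TYPE A2": "TYPE A",
}


def merge_to_parent(detections, mapping=SUBCLASS_TO_PARENT):
    """Relabel each detection with its parent class, if it has one."""
    merged = []
    for det in detections:
        parent = mapping.get(det["label"], det["label"])
        # Copy the detection so the original list is left untouched.
        merged.append({**det, "label": parent})
    return merged


# Hypothetical model output: label, confidence score, box in pixels.
detections = [
    {"label": "TYPE A1", "score": 0.91, "box": (10, 20, 110, 220)},
    {"label": "TYPE A2", "score": 0.84, "box": (130, 25, 230, 210)},
    {"label": "TYPE B", "score": 0.77, "box": (300, 40, 380, 160)},
]

merged = merge_to_parent(detections)
print([d["label"] for d in merged])
```

The model still learns the visually distinct sub-classes, while downstream code only sees the broad label.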

If the actual object you’re trying to detect is all four objects next to each other as in the first image, you’re going to need to provide a much larger dataset. There is definitely scope for augmentation here to help do this, but it will greatly depend upon the overall purpose of detection.
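
As one example of the kind of augmentation meant here, a horizontal flip has to mirror the bounding boxes along with the image. The sketch below assumes boxes in pixel coordinates as `(xmin, ymin, xmax, ymax)` and represents the image as a plain list of rows; both are illustrative choices, and whether flipping is appropriate depends on whether orientation matters for your parts.

```python
def hflip(image, boxes):
    """Flip an image left-right and mirror its bounding boxes.

    image: list of rows, each row a list of pixel values.
    boxes: list of (xmin, ymin, xmax, ymax) in pixel coordinates.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Mirroring swaps the roles of the left and right box edges.
    flipped_boxes = [
        (width - xmax, ymin, width - xmin, ymax)
        for (xmin, ymin, xmax, ymax) in boxes
    ]
    return flipped_image, flipped_boxes


# Tiny 2x4 "image" with one box covering the left-most column.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
]
boxes = [(0, 0, 1, 2)]

flipped_image, flipped_boxes = hflip(image, boxes)
print(flipped_image)
print(flipped_boxes)
```

After the flip, the box should cover the right-most column, matching where the original pixels ended up.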