Detecting multiple objects using few_shot_od_training example

I am using the few_shot_od_training example to try to detect multiple objects in a picture (Google Colab).

I am using the default rubber duck images plus an image containing 6 rubber ducks. After labeling all the ducks, converting the annotations into a TFRecord, and loading the TFRecord file into the application, I try to train the model.
If I leave out the image with 6 ducks and stick to 1 duck per image, training works. But if I add the image with 6 ducks, I get the following error:

./custom_training.py:225 train_step_fn  *
    losses_dict = model.loss(prediction_dict, shapes)
/home/hoster/.local/lib/python3.8/site-packages/object_detection/meta_architectures/ssd_meta_arch.py:842 loss  *
    (batch_cls_targets, batch_cls_weights, batch_reg_targets,
/home/hoster/.local/lib/python3.8/site-packages/object_detection/meta_architectures/ssd_meta_arch.py:1044 _assign_targets  *
    groundtruth_boxlists = [
/home/hoster/.local/lib/python3.8/site-packages/object_detection/core/box_list.py:56 __init__  **
    raise ValueError('Invalid dimensions for box data: {}'.format(

ValueError: Invalid dimensions for box data: (1, 6, 4)

Here are the label and bounding box lists:

Labels: [<tf.Tensor: shape=(1, 6), dtype=float32, numpy=array([[1., 0., 0., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: shape=(1, 6), dtype=float32, numpy=array([[1., 0., 0., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: shape=(1, 6), dtype=float32, numpy=array([[1., 1., 1., 1., 1., 1.]], dtype=float32)>]

Bounding boxes: [<tf.Tensor 'groundtruth_boxes_list:0' shape=(1, 6, 4) dtype=float32>, <tf.Tensor 'groundtruth_boxes_list_1:0' shape=(1, 6, 4) dtype=float32>, <tf.Tensor 'groundtruth_boxes_list_2:0' shape=(1, 6, 4) dtype=float32>]

How can I make the model accept multiple labels in an image?
I am using the EfficientDet-D2 model.

Thanks for any help!

Hi @TensorOverflow,

I suggest trying the points below and checking whether they work.

  • Modify the TFRecord generation script: Update the script so that it writes TFRecords in the proper format for multiple objects. Ensure each image has a corresponding entry in the TFRecord file that covers every object in the image, with one label and one bounding box per object.

  • Use a different model: Consider using models explicitly built for detecting and classifying multiple objects in an image, for example Faster R-CNN, Mask R-CNN, YOLOv5, or SSD with multi-box loss.
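On the first point, the traceback suggests the shape of the ground-truth data may matter as much as the TFRecord contents: the error is raised in box_list.py for a box tensor of shape (1, 6, 4), and BoxList appears to expect a rank-2 tensor of shape [N, 4] per image, with no leading batch dimension. Below is a minimal sketch of that reshaping using NumPy. The helper name to_per_image_groundtruth is hypothetical, and it assumes a single class (the duck) and one-hot labels of shape [N, num_classes]; adjust it to your own pipeline.

```python
import numpy as np


def to_per_image_groundtruth(boxes, num_classes=1):
    """Return (boxes [N, 4], one-hot classes [N, num_classes]) for one image.

    Hypothetical helper: assumes every box belongs to class 0 (the duck).
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    # Drop a leading dimension of size 1, e.g. (1, 6, 4) -> (6, 4).
    if boxes.ndim == 3 and boxes.shape[0] == 1:
        boxes = boxes[0]
    # Same kind of check that produces the error in the traceback.
    if boxes.ndim != 2 or boxes.shape[-1] != 4:
        raise ValueError(
            'Invalid dimensions for box data: {}'.format(boxes.shape))
    # One row per box; with a single class every row is [1.].
    class_ids = np.zeros(boxes.shape[0], dtype=np.int64)
    classes_one_hot = np.eye(num_classes, dtype=np.float32)[class_ids]
    return boxes, classes_one_hot


# Six placeholder boxes stored with an extra leading dimension, as in the error.
six_ducks = np.tile(
    np.array([0.1, 0.2, 0.5, 0.6], dtype=np.float32), (1, 6, 1))
boxes, classes = to_per_image_groundtruth(six_ducks)
print(boxes.shape, classes.shape)  # (6, 4) (6, 1)
```

Each result could then be converted with tf.convert_to_tensor and appended to the per-image lists passed to the training step, so that an image with one duck contributes boxes of shape (1, 4) and the image with six ducks contributes boxes of shape (6, 4).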

Please let me know if it works for you.

Thanks