Models not working with Model Maker

I’ve scaled up my data for my Model Maker training Colab process, based on @khanhlvg’s Colabs for the Salad Training Set and Android Figurine datasets. I’ve now run the process from start to finish twice and produced TFLite models that don’t appear to return any results (specifically in the Android test app, as per the Colabs).

I’ve inspected the models with Netron but can’t really make sense of what is going on.

I’ve attached both models to this post. Could someone have a look at them and help me diagnose what I’m doing wrong? Mainly, is this a data problem (i.e. is the data incorrect) or a training process error?

Any help/thoughts would be greatly appreciated.

Not working model 1

Not working model 2

On further inspection, i.e. lowering the detection threshold in the Android app to

.setScoreThreshold(0.01f)

I am now detecting objects, albeit incorrect detections, mainly because the threshold score is so low.

This has led me to believe the models are working, but that something has gone wrong in the transfer training process.

I think it has to do with the ‘transfer’ part of the training: it seems as though the existing EfficientDet layers aren’t being transferred to the new classes and images I’m training on top of them, and instead only the new images are being trained.

Currently I’m using around 19,000 images across 19 labels, with ~65,000 data points (multiple bounding boxes per image).

Is anybody else experiencing this? I think it has to do with how I’m selecting the EfficientDet model. Can anyone see any problems with the code below?

# instead of spec = model_spec.get('efficientdet_lite2')

spec = object_detector.EfficientDetSpec(
  model_name='efficientdet-lite2', 
  uri='https://tfhub.dev/tensorflow/efficientdet/lite2/feature-vector/1', 
  hparams={'max_instances_per_image': 8000})

Really drawing blanks here (I should clarify that I need the max_instances_per_image hyperparameter because my data has many bounding boxes per image).
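
For context, the rest of my pipeline follows the Colab roughly like this; the CSV path, epochs and batch size below are placeholders rather than my exact values:

from tflite_model_maker import object_detector

# Same spec as above, with max_instances_per_image raised for my data.
spec = object_detector.EfficientDetSpec(
    model_name='efficientdet-lite2',
    uri='https://tfhub.dev/tensorflow/efficientdet/lite2/feature-vector/1',
    hparams={'max_instances_per_image': 8000})

# CSV in the same format as the Salad tutorial (placeholder path).
train_data, validation_data, test_data = object_detector.DataLoader.from_csv(
    'gs://data_images_1000/dataset.csv')

# Transfer-train EfficientDet-Lite2 on the new classes.
model = object_detector.create(
    train_data,
    model_spec=spec,
    epochs=50,                 # placeholder
    batch_size=8,              # placeholder
    train_whole_model=True,
    validation_data=validation_data)

print(model.evaluate(test_data))
model.export(export_dir='.')   # writes model.tflite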

Hi,

Did you merge both datasets (salads and androids)? If so, how did you do it?
When you evaluate the resulting model from Model Maker, what do you get?

Hi,

I’ve created a new dataset of my own using images from Open Images V6.

The images are stored in a Google Cloud Storage bucket, and the data is imported via a CSV following the same format as the Salad training tutorial.

Salad Training CSV

I’ve just restarted my training run this morning, but I’ll take a screenshot of the evaluation data when it completes in a few hours.

AP was around 0.001 overall. Does this sound about right for ~68k data points fairly evenly spread across 19 classes?

I’ll compare some of my results to the salad training dataset, but would it make sense for training accuracy to be a bit better with the salad training set, which uses only roughly ~1,000 data points over 5 classes?

Cheers,

Will

Evaluation of EfficientDet-Lite2 trained on ~68k data points (bounding boxes), approximately ~1k labelled images per class.

38/38 [==============================] - 93s 2s/step

{'AP': 0.0008735819,
'AP50': 0.0021586155,
'AP75': 0.00048909895,
'AP_/ball': 0.0,
'AP_/book': 0.00048107677,
'AP_/bowl': 5.509733e-07,
'AP_/bread': 0.004670803,
'AP_/car': 0.002575293,
'AP_/cattle': 1.5878324e-05,
'AP_/chair': 0.0,
'AP_/dog': 0.00094002986,
'AP_/door': 0.0,
'AP_/flower': 5.3904612e-05,
'AP_/fruit': 0.0,
'AP_/house': 0.0036491964,
'AP_/man': 4.7694466e-05,
'AP_/pen': 0.0,
'AP_/plant': 0.0026239387,
'AP_/tree': 0.0003467007,
'AP_/window': 0.0,
'AP_/woman': 0.00031940785,
'APl': 0.0009520919,
'APm': 7.297001e-06,
'APs': 0.0,
'ARl': 0.03195033,
'ARm': 0.0003267974,
'ARmax1': 0.011142323,
'ARmax10': 0.027187686,
'ARmax100': 0.02821953,
'ARs': 0.0}

Your model’s AP (average precision) is very bad, and it explains why it didn’t work when you put it in the Android app. You should aim for a model with AP above 0.3.

There can be several reasons:

  • Bad training data (e.g. incorrect labels).
  • Not training with enough epochs. You can look at the loss during training to see if it’s decreasing.

Rather than training with all 19 classes at once, you can start with 2-3 classes to make sure that your training pipeline is working. Then you can add more data later.
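
For example, one simple way to do that (a minimal sketch, assuming the AutoML-style CSV layout from the Salad tutorial; the file name and label names are placeholders):

import pandas as pd

# Assumed column layout of the AutoML-style CSV: set, image path, label,
# followed by eight (partially blank) coordinate columns.
cols = ['set', 'image', 'label'] + ['coord_%d' % i for i in range(8)]
df = pd.read_csv('dataset.csv', header=None, names=cols)

# Keep just 2-3 labels for a first sanity-check training run.
subset = df[df['label'].isin(['book', 'dog', 'car'])]
subset.to_csv('dataset_subset.csv', header=False, index=False)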

Hope this helps!

Hello,
I have the same problem. I am trying to train a custom model to detect one class (Cup), using around 50 images for training and 5 images for validation.
I followed the Model Maker Object Detection for Android Figurine Colab.
However, when I try to evaluate my model, I always get this:

1/1 [==============================] - 4s 4s/step

{'AP': 0.0,
 'AP50': 0.0,
 'AP75': 0.0,
 'AP_/Cup': 0.0,
 'APl': 0.0,
 'APm': -1.0,
 'APs': -1.0,
 'ARl': 0.0,
 'ARm': -1.0,
 'ARmax1': 0.0,
 'ARmax10': 0.0,
 'ARmax100': 0.0,
 'ARs': -1.0}

I don’t know what the problem is. Is it the way I created the dataset?
Thank you for the help.

Hi Khanh,

Thanks for replying. I did as you recommended and trained with 3 classes, and it returned the same very low AP results.

I’m trying to rule out a problem with my DataLoader CSV at this point.

I’m using Open Images with a YOLO downloader, and I’m then reformatting the output to match the CSV format from the tutorial.

Here is an example of one line of the CSV for one image from the dataset:

TRAIN,gs://data_images_1000/18/00003e2837c7b728.jpg,book,0.6615625,0.7973975,,,0.463125,0.230483,,

To double-check the image coordinates, I’ve loaded the image into MakeSense.ai, drawn a bounding box around the ‘book’ in the image, and exported it in YOLO format, which is as follows:

0 0.657133 0.800399 0.475428 0.245509

This leads me to believe the coordinates are correct (the numbers are very close), but potentially I’m missing something here or have screwed up the reformatting.

Can you see any problems with coordinates or that line of CSV data?

I’ve attached a copy of the image with the label box for reference.

Any help is greatly appreciated!

So further research has helped me figure out the problem (I feel pretty stupid at this point, but maybe it’ll help others as well). My coordinates are wrong: they are YOLO rather than AutoML coordinates.
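
For anyone else hitting this, the giveaway in the CSV line I posted above is that, read as corner coordinates, the numbers can’t describe a valid box:

# The four numbers from my CSV line, read as AutoML corners:
# a valid box needs x_min < x_max and y_min < y_max, which fails here.
x_min, y_min, x_max, y_max = 0.6615625, 0.7973975, 0.463125, 0.230483

if not (x_min < x_max and y_min < y_max):
    print('Not corner coordinates - probably another format '
          '(YOLO stores the box centre plus width/height).')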

Does anyone have a good way to convert from YOLO coordinates to AutoML?

Right now I’m thinking the best way might be to convert from YOLO to Pascal VOC?

What are the differences between the formats? I think it would be great to have that documented somewhere if it isn’t already.

Good point @lgusm

Let’s take a look at the formats that labelImg and Google Vision export, as they are the two main image labelling tools used in the Model Maker tutorials.

labelImg exports the YOLO, Pascal VOC and CreateML formats, while Google Vision exports the VisionML (.csv) format (as used in the Salad Object Detection tutorial).

Here are examples of each format for the image attached below:

YOLO (labelImg)

0 0.518519 0.724306 0.331481 0.384722

The five numbers are the class index, then the normalised box centre (x, y) and the box width and height; the class index maps to a line in the associated labels.txt file.

Google Vision (Google)

TEST,gs://data_images_1000/automl/house.jpg,house,0.36296296,0.52698416,0.66772485,0.52698416,0.66772485,0.9,0.36296296,0.9

Images are annotated and labelled inside the Google Vision dashboard. The eight numbers are the normalised (x, y) coordinates of the four corners of the bounding box, starting at the top-left corner and going clockwise.

Pascal VOC (labelImg)

<annotation>
	<folder>Images</folder>
	<filename>house.jpg</filename>
	<path>/Users/username/images/house.jpg</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>1080</width>
		<height>720</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>house</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>381</xmin>
			<ymin>383</ymin>
			<xmax>739</xmax>
			<ymax>660</ymax>
		</bndbox>
	</object>
</annotation>

This can be used with Model Maker, as in the Object Detection for Android Figurine tutorial.

CreateML (labelImg)

[{"image": "house.jpg", "annotations": [{"label": "house", "coordinates": {"x": 560.8333333333333, "y": 522.2083333333333, "width": 357.99999999999994, "height": 277.0}}]}]

Some Key Points to Note

From what I can tell, Model Maker can use both the Google Vision format (.csv) and the Pascal VOC format (.xml) with its DataLoader.

Here is the code to use the Google Vision format:

train_data, validation_data, test_data = object_detector.DataLoader.from_csv('gs://some_google_bucket/dataset.csv')

Here is the code to use the Pascal VOC format:

train_data = object_detector.DataLoader.from_pascal_voc(
    'android_figurine/train',    # directory containing the images
    'android_figurine/train',    # directory containing the .xml annotations
    ['android', 'pig_android']   # label list
)

val_data = object_detector.DataLoader.from_pascal_voc(
    'android_figurine/validate',
    'android_figurine/validate',
    ['android', 'pig_android']
)

It is important to note that while YOLO coordinates look very similar to Google Vision coordinates (i.e. both are normalised between 0 and 1), they are different: YOLO stores the box centre plus its width and height, while the Google Vision CSV stores the corner coordinates of the box.
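
Here is a minimal sketch of converting a single YOLO annotation to the shortened two-vertex Google Vision CSV row; the split, image URI and label are just the example values from earlier in the thread:

# YOLO stores a normalised box centre plus width/height; the Google Vision
# CSV stores normalised corners (x_min, y_min) and (x_max, y_max).
def yolo_to_automl_row(split, image_uri, label, x_center, y_center, width, height):
    x_min = max(0.0, x_center - width / 2)
    y_min = max(0.0, y_center - height / 2)
    x_max = min(1.0, x_center + width / 2)
    y_max = min(1.0, y_center + height / 2)
    return f'{split},{image_uri},{label},{x_min},{y_min},,,{x_max},{y_max},,'

print(yolo_to_automl_row('TRAIN', 'gs://data_images_1000/18/00003e2837c7b728.jpg',
                         'book', 0.657133, 0.800399, 0.475428, 0.245509))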

A final note: from personal experience, I’ve found the Model Maker DataLoader to be much quicker with the Pascal VOC format when using larger datasets (~17k images or so). The Google Storage bucket of images and .xml files is a bit harder to keep track of, but the DataLoader feels much quicker, at least to me.

Also, don’t forget to set the max_instances_per_image hyperparameter if you have images with multiple bounding boxes. I kept getting errors because of this and it took me a while to figure out.

spec = object_detector.EfficientDetSpec(
  model_name='efficientdet-lite2', 
  uri='https://tfhub.dev/tensorflow/efficientdet/lite2/feature-vector/1', 
  hparams={'max_instances_per_image': 8000})