ObjectDetector's from_csv only partially loads dataset

jimjamslam · July 5, 2022, 8:00am

I’m trying to train an object detection model with tflite-model-maker. I’ve built a training dataset of approximately 2000 samples with 4 different classes across 293 images, which I’ve specified using a CSV file. I’ve manually split the data into training, validation and test.

When I use from_csv() to import the datasets, it only seems to import part of it: 293 training samples are always imported, and around 140-160 validation and test samples are imported (unless I make the validation and test splits so low that there aren’t that many).

This clearly isn’t the full dataset: with an 80/10/10 split, I’m expecting around 1600 training samples to be imported, and around 200 each of the validation and test samples.

I thought this might be a problem with my CSV, so I tried shuffling its rows. But I always get 293 training samples and 140-160 samples in each of the other two datasets. This suggests to me that there’s no problem parsing the CSV. The fact that there are 293 images in the dataset and 293 training samples also suggests that the images are all read successfully, but it seems like from_csv is giving up after importing one or two objects from each image.

I’ve also verified that:

for all samples, xmin < xmax and ymin < ymax (i.e. the correct vertices are specified);
all coordinates are relative in the range of 0 to 1;
no samples are missing a bounding box, path or label.
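
For reference, the checks above can be sketched roughly as follows. This is only a sketch: it assumes the CSV has no header row and uses the column order split, path, label, xmin, ymin, (empty), (empty), xmax, ymax. The function name find_bad_rows and the sample rows are made up for illustration.

```python
import csv
import io

def find_bad_rows(csv_text):
    """Return (row_number, reason) pairs for rows that fail the checks.

    Assumed column order (an assumption, adjust to your file):
    split, path, label, xmin, ymin, <empty>, <empty>, xmax, ymax, ...
    """
    problems = []
    for i, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) < 9:
            problems.append((i, "too few columns"))
            continue
        split, path, label = row[0].strip(), row[1].strip(), row[2].strip()
        if not (split and path and label):
            problems.append((i, "missing split, path or label"))
            continue
        try:
            # An empty or non-numeric cell raises ValueError here.
            xmin, ymin, xmax, ymax = (float(row[j]) for j in (3, 4, 7, 8))
        except ValueError:
            problems.append((i, "missing or non-numeric coordinate"))
            continue
        if not all(0.0 <= v <= 1.0 for v in (xmin, ymin, xmax, ymax)):
            problems.append((i, "coordinate outside [0, 1]"))
        elif not (xmin < xmax and ymin < ymax):
            problems.append((i, "min is not less than max"))
    return problems

# Hypothetical rows: the second has xmin > xmax, the third has no label.
sample = """TRAIN,img/a.jpg,cat,0.1,0.2,,,0.5,0.6,,
TRAIN,img/a.jpg,dog,0.6,0.2,,,0.5,0.6,,
TEST,img/b.jpg,,0.1,0.2,,,0.5,0.6,,
"""
print(find_bad_rows(sample))
```

To run it on a real file, pass open("annotations.csv").read() (a placeholder filename) instead of the sample string; an empty result means every row passed.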

The function doesn’t throw any sort of error or emit any sort of warning; I only know there’s a problem by checking training_data.size after the import. I’m not sure how to debug this. Is there a way to manually check which samples were successfully imported, or to work out what’s going on?

jimjamslam · July 7, 2022, 8:12am

I’ve also tried splitting my data so that all the objects in a single image belong to a single dataset (TRAIN, TEST or VALIDATION). In other words, the images, rather than the objects, are allocated to splits. I’m still only getting as many objects in the dataset as there are images (293), though:

Training set size: 216
Validation set size: 27
Test set size: 50
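
Since these three sizes add up to 293, the number of images, one diagnostic is to count both annotation rows and distinct image paths per split straight from the CSV and see which of the two the reported sizes match. The sketch below does only that; it says nothing about how from_csv works internally. It assumes the split name is in the first column and the image path in the second, and summarize_splits and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

def summarize_splits(csv_text):
    """Map each split name to (annotation_rows, distinct_image_paths).

    Assumes column 0 is the split and column 1 is the image path.
    """
    rows = defaultdict(int)
    images = defaultdict(set)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        split, path = row[0].strip(), row[1].strip()
        rows[split] += 1
        images[split].add(path)
    return {split: (rows[split], len(images[split])) for split in rows}

# Hypothetical rows: three TRAIN objects spread over two images, one TEST object.
sample = """TRAIN,img/a.jpg,cat,0.1,0.2,,,0.5,0.6,,
TRAIN,img/a.jpg,dog,0.2,0.2,,,0.7,0.6,,
TRAIN,img/b.jpg,cat,0.1,0.1,,,0.4,0.4,,
TEST,img/c.jpg,dog,0.3,0.3,,,0.9,0.9,,
"""
for split, (n_rows, n_images) in summarize_splits(sample).items():
    print(f"{split}: {n_rows} rows across {n_images} images")
```

If the second number for each split matches the sizes printed above, the size is probably a count of images rather than of bounding boxes.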

jimjamslam · August 13, 2022, 6:28am

Has anyone else had a problem like this? I don’t know what to do anymore!