How to handle null images (no class) using CSV format for Model Maker

Fredrik_T · January 22, 2022, 11:25pm

Loading via object_detector.DataLoader.from_csv(filename) using the CSV format.

I just can’t figure out how to provide no-class (null) examples via CSV.
Pascal VOC seems to support that, but I put a lot of effort into a notebook and then realized my null images are lost along the way.

Create a fake class with no bounding box? Doesn’t feel correct.

Fredrik_T · January 23, 2022, 10:16am

Trying some alternatives.

Modified one row of my CSV dataset to have an empty class label, and cast all bounding-box coordinate columns to string so that an empty string can be stored.

  # TODO: Remove this; it only mocks up a row that tries to insert a null-class example.
  #       This might fail since the documentation suggests it is not possible.
  mycsv['xmin'] = mycsv['xmin'].astype(str)
  mycsv['ymin'] = mycsv['ymin'].astype(str)
  mycsv['xmax'] = mycsv['xmax'].astype(str)
  mycsv['ymax'] = mycsv['ymax'].astype(str)
  mycsv.at[0,'class'] = '' 
  mycsv.at[0,'xmin'] = ''
  mycsv.at[0,'ymin'] = ''
  mycsv.at[0,'xmax'] = ''
  mycsv.at[0,'ymax'] = ''

Fails during load:

/usr/local/lib/python3.7/dist-packages/tensorflow_examples/lite/model_maker/core/data_util/object_detector_dataloader_util.py in _get_xml_dict_from_csv_lines(images_dir, image_filename, lines)
    335   for line in lines:
    336     label = line[2].strip()
--> 337     xmin, ymin = float(line[3]) * width, float(line[4]) * height
    338     xmax, ymax = float(line[7]) * width, float(line[8]) * height
    339     obj = {

ValueError: could not convert string to float:
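
The failing call can be reproduced without the loader. A minimal sketch, assuming only what the traceback shows (the CSV field is passed straight to float); parse_coordinate is a hypothetical helper, not part of Model Maker:

```python
def parse_coordinate(field):
    """Mimic the float(...) call from the traceback on a single CSV field."""
    try:
        return float(field)
    except ValueError as err:
        # Return the error as text so the behaviour can be printed side by side.
        return f"ValueError: {err}"


print(parse_coordinate("0.27"))  # a normal coordinate parses fine
print(parse_coordinate(""))      # an empty field is rejected by float()
```

So any row whose coordinate fields are empty strings will stop the load at that line, regardless of what the class column contains.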

Leaving the bbox columns as float and inserting None yields the same result:

  mycsv.at[0,'class'] = '' 
  mycsv.at[0,'xmin'] = None
  mycsv.at[0,'ymin'] = None
  mycsv.at[0,'xmax'] = None
  mycsv.at[0,'ymax'] = None

Columns xmin, xmax, ymin and ymax become strings:

split	filename	class	xmin	ymin	xmax	ymax
TEST	file1.jpg
TEST	file2.jpg	red	0.27	0.0	1.0	1.0
TEST	file3.jpg	blue	0.37	0.7866666666666666	1.0	1.0
TEST	file4.jpg	blue	0.426	0.18	0.67	0.59

Fails during load again:

/usr/local/lib/python3.7/dist-packages/tensorflow_examples/lite/model_maker/core/data_util/object_detector_dataloader_util.py in _get_xml_dict_from_csv_lines(images_dir, image_filename, lines)
    335   for line in lines:
    336     label = line[2].strip()
--> 337     xmin, ymin = float(line[3]) * width, float(line[4]) * height
    338     xmax, ymax = float(line[7]) * width, float(line[8]) * height
    339     obj = {

ValueError: could not convert string to float:
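
Both attempts leave empty coordinate fields in the written CSV (pandas writes missing values such as None as empty fields by default), so it can help to locate such rows before calling the loader. A rough sketch, assuming the field positions 3, 4, 7 and 8 that the traceback reads; find_unparseable_rows and the sample rows are made up for illustration and are not the actual dataset:

```python
import csv
import io

# Field positions read by the loader, taken from lines 337-338 of the traceback.
COORD_INDICES = (3, 4, 7, 8)


def find_unparseable_rows(csv_text):
    """Return (row_number, reason) for rows the float() calls would reject."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        # Indexing past the end of a short row would raise IndexError instead.
        if len(row) <= max(COORD_INDICES):
            problems.append((row_number, "too few fields"))
            continue
        for index in COORD_INDICES:
            try:
                float(row[index])
            except ValueError:
                problems.append((row_number, f"field {index} is {row[index]!r}"))
                break  # one reason per row is enough
    return problems


# Hypothetical sample: one null row, one fully labelled row.
sample = (
    "TEST,file1.jpg,,,,,,,,,\n"
    "TEST,file2.jpg,red,0.27,0.0,,,1.0,1.0,,\n"
)
print(find_unparseable_rows(sample))
```

This only reports which rows would fail. It does not make the loader accept null images.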

Fredrik_T · January 23, 2022, 7:09pm

I only have two classes. So maybe place all other objects in an “other” class?

lgusm · January 26, 2022, 3:51pm

Sorry for my dumb question but why do you want these no-class in your training data?

Bhack · January 26, 2022, 4:56pm

Please check:
https://tensorflow-prod.ospodiscourse.com/t/how-can-i-specialize-my-model-for-precision-and-make-it-default-to-an-output-if-it-is-not-certain-enough-to-sacrifice-accuracy/4859/5?u=bhack

Generalized Out-of-Distribution Detection: A Survey:

Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision. This problem first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Then, we conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.

Fredrik_T · January 30, 2022, 7:55pm

I wrote a simple mobile app so that I can move around and run inference on the go. I’ve realized my model has a tendency to simply pick up on the color and the basic shape (blue-round and red-round objects), so my thinking was to find misclassified examples and place them all in an “other” class. My two objects have features besides the color that my model is failing to pick up on. The dataset has reached 10,000 examples, split about 50/50 between my two objects.

The OOD keyword was really valuable!
Found an article where they mention “…we propose an effective algorithm called ALOE, which performs robust training by exposing the model to both adversarially crafted inlier and outlier examples.”

So instead of generating these examples, I will go around and hunt for them with my camera. I have added a label button so that I can upload them straight to my dataset. Will give that a try at least.

Fredrik_T · January 30, 2022, 8:02pm

The other question was really about null-case examples, where there simply isn’t a labeled object present.

Leno · November 1, 2022, 3:55am

Any updates on this? This is a very important question that would be useful for my work as well.

joaozanlorensi · November 21, 2022, 3:47am

I am having the same problem. I also think it is very important and it should work. BTW, even the documentation gives an example of an unlabeled TEST sample.