Color format and bit depth of images for ssd_mobilenet v1 640X640


I need some help. I've been trying to create a working model with my own pictures for a while now, and so far there has been a lot of trouble and little progress. The closest I have gotten is this: when the confidence threshold is under 0.20 the entire screen is more or less filled with bounding boxes, but as soon as I go over that there is nothing.

Well, on to the question:

I was wondering about the images for training an ssd_mobilenet v1 model in TensorFlow. I read on OpenVINO that their model wanted BGR; is it the same for TF? What "bit depth" (I'm not sure if that is the right word for it) should the images have (e.g. 8, 24, or so)? From what I understand they should be in JPG format, right?

I'm trying to trace potential errors, taking it from the top.

I'm trying to train a model that should detect two classes. As the camera will be mounted and the objects move only along one axis, I'm having some trouble: it's hard to get pictures that are not almost identical to each other. The environment is also (color-wise) almost grayscale in nature, so everything sort of blends together. Does anyone know a model for object detection that is good at identifying round shapes in a "gray-on-gray" environment?

Another quick question: would it be OK to post a more specific question here with my settings (command-line arguments for the scripts, general config settings) and my working procedure for training a model, so that (hopefully) someone would read it and find what I'm doing wrong?




Hi Martin,

For the color format, it's usually RGB (I've never read about BGR, but that might just be ignorance on my side).
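One quick way to check what TensorFlow actually works with is to decode a JPEG and inspect the tensor. This is just a sanity-check sketch (the dummy image is made up; in practice you'd read one of your training files): TF's JPEG decoder yields 8 bits per channel (uint8), i.e. 24-bit color over the three RGB channels.

```python
import tensorflow as tf

# Build a tiny dummy JPEG in memory just to see what the decoder returns
# (a sketch; with real data you'd use tf.io.read_file on a training image).
dummy = tf.zeros([4, 4, 3], dtype=tf.uint8)
jpeg_bytes = tf.io.encode_jpeg(dummy)
image = tf.io.decode_jpeg(jpeg_bytes, channels=3)  # force 3 channels (RGB)

print(image.dtype)   # uint8 -> 8 bits per channel, 24 bits per pixel
print(image.shape)   # (4, 4, 3) -> height, width, RGB channels
```

So "bit depth" for these models usually means 24-bit color (8 bits x 3 channels).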

You can of course add more information for people to help.
How are you fine-tuning the model:

  • from command line?
  • on Cloud?
  • using Model Maker?

How many images are you using?

Hi Igusm,

Thank you for your reply!

I have read about fine-tuning but not really understood when and how to do it, so as far as I know I haven't done any fine-tuning, besides manually flipping/rotating/scaling some of the pictures to make my collection of images less monotonous.
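For the manual flipping/rotating mentioned above: the Object Detection API can also do this kind of augmentation on the fly through pipeline.config, so you don't have to edit image files by hand. A sketch of what that might look like (the option names below are from the API's preprocessor options; the batch_size matches what is described later in this post):

```proto
# Hypothetical excerpt of the train_config section of pipeline.config.
train_config {
  batch_size: 2
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
}
```

Brightness/contrast-style augmentations may also help a bit in a "gray-on-gray" environment, though that's just a guess on my part.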

I have followed TensorFlow's tutorial for object detection with custom data:
Training Custom Object Detector — TensorFlow 2 Object Detection API tutorial documentation
with some additional information from other tutorials (mostly from

I'm having a bit of a problem with my data set, as the "motif" of the model is by nature very monotonous. I have just over 200 images, and I can't really see how to collect more without collecting more or less copies of the existing images.

For the latest model I have trained, I have done the following:

Using LabelImg to annotate the pictures (I have two classes).

  • Creating CSV files (training and test) using the script from the TensorFlow tutorial.
    When I create a CSV file there are only the image names in the file, not the images' paths. I have
    read in another tutorial that the full path is needed; do you know if this is needed and, if so, how to get

    When I create a CSV file I get an Excel sheet with all the information in the first column. Should these
    be separated into a separate cell for every piece of information, or does the TFRecord script (also
    from the TensorFlow tutorial) sort this out by itself?

  • The next step is creating a training.record and a test.record with the script from the TensorFlow tutorial.

  • After this I train the model:

    the pipeline is configured with paths to the model checkpoint, my label_map, and my train.record, and
    I had to change the batch size to 2 as I only have 2 GB of GPU RAM on my laptop (training a
    mobilenet v1 640x640 from TensorFlow (I feel an upgrade
    coming :slight_smile: )
    Besides this, the pipeline.config is as it comes.
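On the CSV questions in the list above, here is a small sketch (the row is made up; the column names are the ones the tutorial's xml_to_csv script emits, as far as I remember). A comma-separated file that shows up as "one column" in Excel is usually fine: Excel is just not splitting on commas, while pandas, which the tutorial's TFRecord script uses to read the CSV, splits it correctly.

```python
import io
import pandas as pd

# A made-up example of one annotation row in the tutorial's CSV format.
# "img_001.jpg" is a bare filename, not a full path: the TFRecord script
# is given the image directory separately and joins the two itself.
csv_text = (
    "filename,width,height,class,xmin,ymin,xmax,ymax\n"
    "img_001.jpg,640,640,class_a,120,80,260,210\n"
)
df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))          # eight separate columns, one per field
print(df.iloc[0]["filename"])    # bare image name, no path needed
```

So as long as the values are comma-separated in the raw file, no manual cell-splitting should be needed.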

Is there anything obvious I'm doing wrong? Where would you say the most common mistakes are made? As providing the full information for the entire process might be a bit of overkill, my idea is perhaps to look a bit further into the places where most rookies fail.

Another question: is there a relatively easy way to test the model on a video?

As the images are not mine and could possibly hold information that can't be online, I can't use anything that uploads data anywhere.

Once again, thank you for your help!


I didn't know that tutorial.

I'd suggest you take a look at this one: Object Detection with TensorFlow Lite Model Maker

This is my go-to basic tutorial on object detection.

For testing on a video, what you can do (a naive solution, as I think there might be a much better way of doing this that I don't know about) is use something like ffmpeg: extract every frame of the video, apply your model to each frame, update the frame with the prediction, and recreate a video from the frames.