Measure size of detection boxes

Hello everybody :), I am currently working on a solution for detecting and measuring the size of a melting pool.
I therefore have a question:
Is it possible to save all the sizes or coordinates of the detection boxes when detecting from a video?

Thank you for helping me out :slight_smile:

That depends on many things.

For example, is the camera in the video always in the same position? Even a slight change of perspective would change the measurement of any object in the image.

Imagine that the zoom of the camera changes: the measurement will change, but the Object Detection model doesn’t care; it will probably return a bigger box, and that doesn’t mean the object is bigger.

Does that make sense?

Yes, it makes sense to me :slightly_smiling_face:
Let’s just assume it is possible to always keep the same angle, position, and distance of the camera.
How could I implement this with my already trained TensorFlow model? It would be nice to get the detected class (for example Meltpool) and also the size of the object (for example 12 mm on the y-axis and 10 mm on the x-axis).

Thank you for your time :slight_smile:

If you fix everything as mentioned, then you can work with simple proportions and calculate the size based on what the OD model returns.

You know that roughly X pixels correspond to Y cm, and then you calculate.
I’d do that as a post-processing step that can be easily recalibrated in case something changes.
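As a minimal sketch of that post-processing step (the pixel/cm ratio below is a made-up calibration value; you'd measure your own):

```python
# Hypothetical calibration: at your fixed camera distance, 20 px ≈ 1 cm.
# Re-measure this constant whenever the camera setup changes.
PX_PER_CM = 20.0

def box_size_cm(ymin, xmin, ymax, xmax, px_per_cm=PX_PER_CM):
    """Convert a pixel-space bounding box to (width, height) in cm."""
    width_cm = (xmax - xmin) / px_per_cm
    height_cm = (ymax - ymin) / px_per_cm
    return width_cm, height_cm

# A 200 x 240 px box at this calibration is 10 cm x 12 cm:
print(box_size_cm(100, 300, 340, 500))  # → (10.0, 12.0)
```

Keeping the conversion separate from the detection code means you only touch one constant when you recalibrate.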

More generally, if you have a fully calibrated camera you can project the 4 bounding box corners from screen coordinates back into the 3D world, but you need to know the Z of each bounding box point, which is currently not estimated by your object detection model.

You could also try a stereo camera setup, or there are approaches to estimate monocular depth, but you need to check whether they have enough accuracy for your specific use case:


@lgusm OK! So you would do the object detection first, and after that you would read out the box sizes and convert them to a metric unit with a pixel-to-metric ratio?
Could you describe how to read out the box sizes after detecting them? I am completely new to TensorFlow.

@Bhack Thanks for your time. I just need x and y; the depth of the melting pool can be calculated from that information :slight_smile:

It is not about the depth of the melting pool; it is the distance of the bounding box corners from the camera plane.
Even with a fully calibrated camera, you need to know the Z of these points to backproject into world coordinates (if these points are co-planar, the depth is the same for all the corners).

Take a look at:

E.g. if you know that all these points are co-planar on the floor, you can backproject from the pixel coordinates to the floor with a fully calibrated camera.

OK, I just tried to read both of the articles you sent me, but I am not quite sure I understood them correctly.
Is it possible to get that depth information from a known object in the picture?
For example: a simple line (collinear with the y-axis) which is 5 cm long. If I measure its length in the picture and compare it to the ‘‘real world’’ length, I should get a factor I can use to estimate the size of any given object near the line, right? I mean, there would be some deviation because of the depth effect, but it shouldn’t be too high.

Does this make sense or is this wrong?
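As a sketch, that reference-object idea would look like this (all pixel measurements below are hypothetical; you'd read them off your own frames):

```python
# Derive a pixel-to-cm factor from a reference line of known length,
# then apply it to an object lying near the reference line.
ref_len_cm = 5.0        # known real-world length of the reference line
ref_len_px = 100.0      # its measured length in the image (hypothetical)
px_per_cm = ref_len_px / ref_len_cm        # 20 px per cm

obj_len_px = 240.0      # pixel extent of the object of interest
obj_len_cm = obj_len_px / px_per_cm
print(obj_len_cm)  # → 12.0
```

The factor is only valid close to the reference line and at the same distance from the camera, which is exactly the perspective caveat discussed below.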

If you want to do this without a camera calibration, your error will really depend on your camera position in the world, lens, sensor resolution, etc.

E.g. with a camera plane nearly parallel to the melting pool and a fairly high pixel/cm ratio, your error could probably be low enough (this really depends on your required accuracy).

If you have the camera correctly calibrated (intrinsics + extrinsics) and you know that all the 2D points are almost co-planar in the world, you can easily backproject these points into world coordinates (e.g. Z = 0) and then measure the real distance between these points on the plane.
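A minimal sketch of that backprojection, assuming you already have intrinsics K and extrinsics R, t (the numbers below are invented example values; in practice they come from a calibration procedure such as OpenCV's `calibrateCamera`):

```python
import numpy as np

# Made-up calibration: 1000 px focal length, principal point at (640, 360),
# camera 50 cm from the world plane Z = 0, axes aligned with the world.
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])   # intrinsics
R = np.eye(3)                              # rotation (world -> camera)
t = np.array([0.0, 0.0, 50.0])             # translation, in cm

def backproject_to_ground(u, v):
    """Intersect the viewing ray through pixel (u, v) with the plane Z = 0."""
    cam_center = -R.T @ t                           # camera center in world coords
    ray = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])
    s = -cam_center[2] / ray[2]                     # scale that puts the point on Z = 0
    return cam_center + s * ray

# Real distance on the plane between two backprojected bounding-box corners:
p1 = backproject_to_ground(640.0, 360.0)
p2 = backproject_to_ground(840.0, 360.0)
print(round(float(np.linalg.norm(p1 - p2)), 3))  # → 10.0 (cm, given the units of t)
```

The same function applied to all four box corners gives you the real-world width and height directly, with no separate px/cm ratio needed.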

@Bhack OK, I guess I’ll have to test how accurate I can get and what accuracy I need.
Maybe I will have to calibrate the camera to get the results I need…

Do you know how I can save the coordinates of the corner points of each detected object’s box?

You can save them however you like, e.g. JSON, NumPy, pickle, etc.
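For example, with JSON (the record layout and file name are just one possible choice; in practice the `box_px` values would come from your detection loop):

```python
import json

# Hypothetical per-frame results: one dict per detection,
# boxes in pixel coordinates [ymin, xmin, ymax, xmax].
results = [
    {"frame": 0, "class": "Meltpool", "score": 0.95,
     "box_px": [72.0, 256.0, 432.0, 896.0]},
]

with open("detections.json", "w") as f:
    json.dump(results, f, indent=2)

# Reload later for post-processing, e.g. the size calculation:
with open("detections.json") as f:
    loaded = json.load(f)
print(loaded[0]["box_px"])
```

Appending one such record per frame inside your video loop gives you the full history of box coordinates to analyze afterwards.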

OK, but I don’t know where to put the code to save those boxes… Could you give me an example?
Sorry for asking all these questions, but I am completely new to TensorFlow.

You can retrieve the coordinates like this:
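A sketch assuming the TF2 Object Detection API output format, where the model returns a dict with normalized `[ymin, xmin, ymax, xmax]` boxes (the arrays below are faked with NumPy so the example is self-contained; with a real model they come from `detections['detection_boxes'][0].numpy()` and friends after running inference on a frame):

```python
import numpy as np

# Faked model output in the TF2 Object Detection API layout:
# boxes are normalized to [0, 1] as [ymin, xmin, ymax, xmax].
detection_boxes = np.array([[0.10, 0.20, 0.60, 0.70],
                            [0.30, 0.30, 0.50, 0.40]])
detection_scores = np.array([0.95, 0.30])

img_h, img_w = 720, 1280   # frame size in pixels (example values)

keep = detection_scores > 0.5                     # confidence threshold
boxes_px = detection_boxes[keep] * [img_h, img_w, img_h, img_w]
print(boxes_px)   # each row: [ymin, xmin, ymax, xmax] in pixels
```

From `boxes_px` you have the corner coordinates in pixels, ready for the ratio calculation or backprojection, and ready to be saved per frame.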

Then it is just Python.