Hello everybody :), I am currently working on a solution for detecting and measuring the size of a melting pool.
Therefore I have a question:
Is it possible to save all the sizes or coordinates of the detection boxes when detecting from a video?
For example, is the camera in the video always in the same position? Even a slight change of perspective would change the measurement of any object in the image.
Imagine that the zoom of the camera is changed: the measurement will change, but for the object detection model that doesn't matter; it will probably return a bigger box, but that doesn't mean the object is bigger.
Yes, that makes sense to me.
Let's just assume it would always be possible to get the same angle, position and distance of the camera.
How could I implement this into my already trained TensorFlow model? It would be nice to get the detected class (for example Meltpool) and also the size of the object (for example 12 mm on the y-axis and 10 mm on the x-axis).
More generally, if you have a fully calibrated camera you can project the 4 bounding box corners from screen coordinates back into the 3D world, but you need to know the Z of each bounding box point, which is currently not estimated by your object detection model.
You could also try a stereo camera setup, or there are approaches to estimate monocular depth, but you need to check whether they could have enough accuracy for your specific use case:
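As a rough illustration of the backprojection idea: with the camera intrinsics and a known depth for the (assumed co-planar) box corners, each corner can be lifted into metric camera coordinates and the box measured in millimeters. All numbers here (the intrinsic matrix, the pixel coordinates, the 500 mm depth) are invented for the sketch, not values from this setup:

```python
import numpy as np

# Hypothetical pinhole intrinsics (focal lengths and principal point in pixels)
fx, fy, cx, cy = 1200.0, 1200.0, 640.0, 360.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def pixel_to_camera(u, v, depth_mm, K):
    """Backproject pixel (u, v) to 3D camera coordinates at a known depth (mm)."""
    return depth_mm * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

# Two opposite bounding-box corners, assumed co-planar at the same 500 mm depth
p1 = pixel_to_camera(600, 330, 500.0, K)
p2 = pixel_to_camera(700, 390, 500.0, K)

width_mm = abs(p2[0] - p1[0])   # extent along the x-axis
height_mm = abs(p2[1] - p1[1])  # extent along the y-axis
```

The depth value is exactly the missing Z mentioned above: without a calibration and a depth estimate, `depth_mm` is unknown and the result is meaningless.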
@lgusm OK! So you would do the object detection first, and after that read out the box sizes and convert them with a pixel-to-metric ratio into a metric unit?
Could you describe how to read out the box sizes after detecting them? I am completely new to the whole TensorFlow topic.
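One way to read the box sizes, sketched under the assumption that the model follows the TF Object Detection API output convention (normalized `[ymin, xmin, ymax, xmax]` boxes); the detection values, frame size, and `mm_per_pixel` factor below are all made up for illustration:

```python
import numpy as np

# Hypothetical detection output in TF Object Detection API format,
# where boxes are normalized [ymin, xmin, ymax, xmax] in [0, 1]
detections = {
    "detection_boxes": np.array([[0.40, 0.35, 0.55, 0.50]]),
    "detection_classes": np.array([1]),   # e.g. 1 = "Meltpool"
    "detection_scores": np.array([0.92]),
}
image_h, image_w = 720, 1280   # frame size in pixels (assumed)
mm_per_pixel = 0.1             # assumed pixel-to-metric calibration factor

sizes_mm = []
for box, cls, score in zip(detections["detection_boxes"],
                           detections["detection_classes"],
                           detections["detection_scores"]):
    if score < 0.5:            # skip low-confidence detections
        continue
    ymin, xmin, ymax, xmax = box
    width_px = (xmax - xmin) * image_w    # scale normalized box to pixels
    height_px = (ymax - ymin) * image_h
    sizes_mm.append((int(cls), width_px * mm_per_pixel, height_px * mm_per_pixel))

print(sizes_mm)   # class id with x- and y-size in mm
```

If this runs once per video frame, appending `sizes_mm` to a list (or writing it to a CSV) also answers the earlier question about saving all box sizes over the course of a video.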
@Bhack Thanks for your time. I just need the x and y; the depth of the melting pool can be calculated from that information.
It is not about the depth of the melting pool; it is the distance from the camera plane to the bounding box corners.
With a fully calibrated camera you need to know the Z of these points to backproject them into world coordinates (if the points are co-planar, the depth is the same for all the corners).
OK, I just tried to read both of the articles you sent me, but I am not quite sure if I understood them correctly.
Is it possible to get this depth information from a known object in the picture?
For example: a simple line (collinear with the y-axis) which is 5 cm long. If I measure its length in the picture and compare it to the "real world", I should get a factor that I can use to estimate the size of any given object near the line, right? I mean there would be some deviations because of the depth effect, but they shouldn't be too high.
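That reference-object idea boils down to a single ratio. A minimal sketch, with all the numeric values (pixel length of the line, pixel length of the object) invented for the example:

```python
# Reference line of known real-world length, measured once in the image
ref_real_cm = 5.0      # known length of the line in the real world
ref_pixels = 250.0     # its measured length in the image (assumed value)
cm_per_pixel = ref_real_cm / ref_pixels   # scale factor: 0.02 cm per pixel

# Apply the factor to an object lying near the reference line
object_pixels = 180.0                     # measured object extent in pixels
object_cm = object_pixels * cm_per_pixel  # estimated real-world size in cm
```

This only holds if the object is at (roughly) the same distance from the camera as the reference line; as noted in the replies below, the error grows with perspective and distance differences.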
If you want to do this without camera calibration, your error will really depend on your camera position in the world, lens, sensor resolution, etc.
E.g., with a camera plane roughly parallel to the melting pool and with a fairly high pixel/cm ratio, your error could probably be low enough (this really depends on your required accuracy).
If you have the camera correctly calibrated (intrinsics + extrinsics) and you know that all the 2D points are almost co-planar in the world, you can easily backproject these points into world coordinates (e.g. Z = 0) and then measure the real distance between these points on the plane.
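A minimal sketch of that backprojection, assuming the intrinsics `K` and the pose `(R, t)` already come from a calibration (e.g. a checkerboard with OpenCV) and that the melting pool lies on the world plane Z = 0; the calibration values below are invented for the example:

```python
import numpy as np

def backproject_to_plane(u, v, K, R, t):
    """Intersect the viewing ray of pixel (u, v) with the world plane Z = 0."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_world = R.T @ ray_cam                           # ray rotated into world frame
    cam_center = -R.T @ t                               # camera position in world frame
    s = -cam_center[2] / ray_world[2]                   # solve Z = 0 along the ray
    return cam_center + s * ray_world

# Invented calibration: camera 5 units above the plane, looking straight down
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])
t = np.array([0.0, 0.0, 5.0])

p = backproject_to_plane(700.0, 100.0, K, R, t)  # a box corner on the plane
```

Backprojecting two box corners this way and taking `np.linalg.norm(p1 - p2)` gives their real distance on the plane, which is exactly the measurement asked about above.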