Running multiple inference models in parallel on 1 GPU

Hello,

I am working on a use-case where I need to perform object detection using TensorFlow on multiple (3 or 4) camera streams simultaneously, all on one GPU. I can’t seem to find any resources on how doable this might be, except for the documentation page for tf.config.experimental.set_memory_growth (TensorFlow Core v2.8.0).

Could anyone guide me on where to start please?

Thanks,
Ahmad

Hi @ahmadchalhoub99, you can consider stacking the frames from your camera streams into a single array and running inference for all of the images in one GPU call. It may help you. For reference, please refer to this link. Thanks!
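
A minimal sketch of what that batching could look like, assuming a detection SavedModel at a hypothetical path "path/to/saved_model" that accepts a batched uint8 input (not every exported detector does, so check your model's signature first):

```python
import numpy as np
import tensorflow as tf

# Hypothetical path; replace with your exported detection model.
detector = tf.saved_model.load("path/to/saved_model")

def detect_all_cameras(frames):
    """Run one GPU inference call over frames from all cameras.

    frames: list of same-sized HxWx3 uint8 arrays, one per camera.
    """
    # Stack into a single (num_cameras, H, W, 3) batch so the GPU
    # processes every stream in one forward pass.
    batch = tf.constant(np.stack(frames))
    return detector(batch)

# Example with three dummy 480x640 frames standing in for camera captures.
frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(3)]
detections = detect_all_cameras(frames)
```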

Hi @Kiran_Sai_Ramineni

Thank you for your thoughts.
I see how this would work if my goal is to run inference from multiple cameras using the same trained model. But what if I would like to simultaneously run inference using different models (2 or 3 models, for example)?

Thanks.

Hi @ahmadchalhoub99, by default TensorFlow maps nearly all of the GPU memory visible to the process. To run multiple models simultaneously, you can limit the GPU memory allocated to each process by using tf.config.experimental.set_memory_growth or tf.config.set_logical_device_configuration, but this may increase the time to execute the process. Thanks.
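
A minimal sketch of both options, which must run before anything touches the GPU (put it at the very start of the program). The 2048 MB figures in the second option are only illustrative, not a recommendation:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Option 1: allocate GPU memory on demand rather than all at once,
    # so several processes (e.g., one per model) can share the same GPU.
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option 2 (use instead of option 1, not together with it): split the
    # GPU into fixed-size logical devices and place one model on each.
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=2048),
    #      tf.config.LogicalDeviceConfiguration(memory_limit=2048)])
```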