Running multiple inference models in parallel on 1 GPU


I am working on a use-case where I need to perform object detection using TensorFlow on multiple (3 or 4) camera streams simultaneously and on one GPU. I can’t seem to find any resources on how doable this might be, except for the following page: tf.config.experimental.set_memory_growth  |  TensorFlow Core v2.8.0

Could anyone guide me on where to start please?


Hi @ahmadchalhoub99, you can consider stacking the images from your camera streams into a single batch array and running inference for all of them in one pass on the GPU. It may help you. For reference, please refer to this link. Thanks!
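A minimal sketch of the batching idea, assuming the frames have already been resized to the detector's input size. The tiny Keras model here is a hypothetical stand-in for your trained object detector:

```python
import numpy as np
import tensorflow as tf

# Hypothetical: three camera frames, already resized to the model's input size.
frames = [np.random.rand(224, 224, 3).astype(np.float32) for _ in range(3)]

# Stack the frames into one (batch, height, width, channels) array.
batch = np.stack(frames, axis=0)  # shape: (3, 224, 224, 3)

# Stand-in model; in practice this would be your trained detector.
model = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(input_shape=(224, 224, 3)),
    tf.keras.layers.Dense(10),
])

# A single forward pass produces one prediction per camera frame.
predictions = model.predict(batch, verbose=0)
```

Because the GPU processes the whole batch in one kernel launch, this is usually much cheaper than running three separate single-image inferences.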

Hi @Kiran_Sai_Ramineni

Thank you for your thoughts.
I see how this would work if my goal is to run inference from multiple cameras using the same trained model. But what if I would like to simultaneously run inference using different models (2 or 3 models, for example)?


Hi @ahmadchalhoub99, by default TensorFlow maps all of the GPU memory visible to the process. In order to run multiple models simultaneously, you can limit the GPU memory allocated to each process by using tf.config.experimental.set_memory_growth or tf.config.set_logical_device_configuration. But it may increase the time it takes to execute the process. Thanks.
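A short sketch of both options mentioned above (the 2048 MB cap is an arbitrary example value, not a recommendation). Note that both calls must be made before any GPU has been initialized, and you would pick one option or the other for a given device, not both:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Option 1: allocate GPU memory on demand instead of
    # reserving all of it up front.
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # Option 2 (alternative): hard-cap the memory this process
    # may use on the first GPU, e.g. 2048 MB.
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])
```

With a cap in place, each model-serving process claims only its slice of GPU memory, so several processes can share the one GPU.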