C API tensor pinned memory for faster GPU transfer?

I’m working with the C API on an application where I want to maximize inference throughput using a model on a GPU. Data arrives continuously, is copied into a TF_Tensor, and is then run through TF_SessionRun to get the results. I re-use the same input TF_Tensor object for each call, copying in new data beforehand. My model is fairly small, but the input tensor size is large. I believe a large portion of my TF_SessionRun call consists of copying the input data from the host to the device. (If I profile a Python version, ~30% of my operation time is spent on “_arg_input_1_0_0/_1:_Recv”.)
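For context, my loop looks roughly like this (a stripped-down sketch; `have_more_data` and `next_batch` are stand-ins for my actual data source):

```c
#include <stdint.h>
#include <string.h>
#include <tensorflow/c/c_api.h>

// Hypothetical stand-ins for however the incoming data arrives.
extern int have_more_data(void);
extern const void* next_batch(void);

void run_loop(TF_Session* session, TF_Output input_op, TF_Output output_op,
              const int64_t* dims, int num_dims, size_t byte_len) {
  // Allocate the input tensor once and re-use it for every call.
  TF_Tensor* input = TF_AllocateTensor(TF_FLOAT, dims, num_dims, byte_len);
  TF_Status* status = TF_NewStatus();
  while (have_more_data()) {
    // Host-side copy of the fresh data into the tensor's buffer.
    memcpy(TF_TensorData(input), next_batch(), byte_len);
    TF_Tensor* output = NULL;
    TF_SessionRun(session, NULL,
                  &input_op, &input, 1,    // feed: the reused input tensor
                  &output_op, &output, 1,  // fetch: model output
                  NULL, 0, NULL, status);
    // ... check status, consume output ...
    TF_DeleteTensor(output);
  }
  TF_DeleteStatus(status);
  TF_DeleteTensor(input);
}
```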

I know with CUDA you can use pinned memory to help speed up copying from host to device. Does TensorFlow do something equivalent by default? Is there something I can do while creating the tensor to pin the memory? Any other thoughts are welcome as well (on CentOS with NVIDIA GPUs).
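Something like the following is what I had in mind: allocating the backing buffer with cudaHostAlloc and wrapping it with TF_NewTensor. This is an untested sketch (`make_pinned_tensor` is my own helper), and I don’t know whether the runtime’s host-to-device copy path would actually take advantage of a pinned buffer it didn’t allocate itself:

```c
#include <stdint.h>
#include <cuda_runtime.h>
#include <tensorflow/c/c_api.h>

// Deallocator invoked by TensorFlow when the tensor is released;
// frees the page-locked allocation made below.
static void pinned_dealloc(void* data, size_t len, void* arg) {
  cudaFreeHost(data);
}

TF_Tensor* make_pinned_tensor(const int64_t* dims, int num_dims,
                              size_t byte_len) {
  void* buf = NULL;
  // Page-locked (pinned) host allocation, eligible for async DMA transfers.
  if (cudaHostAlloc(&buf, byte_len, cudaHostAllocDefault) != cudaSuccess)
    return NULL;
  // TF_NewTensor wraps the buffer without copying it; the deallocator
  // runs once TensorFlow is done with the tensor.
  return TF_NewTensor(TF_FLOAT, dims, num_dims, buf, byte_len,
                      pinned_dealloc, NULL);
}
```

I’d then memcpy new data into TF_TensorData on that tensor each iteration, exactly as before. Is this a reasonable approach, or does the runtime ignore the pinned-ness of a caller-provided buffer?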
