Intermediate results kept for longer than needed on device

slai-nick · November 10, 2022, 8:50am

Hi,

I am running a model on a PluggableDevice plugin that I am developing for a new inference device.
When tracing the calls into the plugin, I see TensorFlow keeping layer intermediate results on the device for longer than they are needed, so bigger models end up running out of device memory.
To be clear, I am talking about intermediate results that are used exactly once and never again.

Is this expected behaviour?
Why doesn’t TensorFlow free intermediate results as soon as they have been used?

In the kernels' compute functions, I use TF_ForwardInputOrAllocateOutput whenever possible and TF_AllocateOutput otherwise.
Should I force reuse of the input by calling TF_SetOutput on the input TF_Tensor *?
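
To illustrate what I mean, here is a toy model in plain C. It does not use the TensorFlow C API: Buffer, forward_or_allocate and run_chain are hypothetical names made up for this sketch, and it assumes that forwarding can only succeed when the kernel holds the sole reference to the input buffer. It shows how a single extra holder of an intermediate result doubles the peak device memory of a two-kernel chain.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device buffer; not a TensorFlow type. */
typedef struct {
    int refcount;  /* number of holders still referencing the buffer */
    size_t bytes;  /* size of the buffer on the "device" */
} Buffer;

static size_t live_bytes = 0;  /* bytes currently allocated */
static size_t peak_bytes = 0;  /* highest value live_bytes has reached */

static Buffer *buffer_alloc(size_t bytes) {
    Buffer *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->refcount = 1;
    b->bytes = bytes;
    live_bytes += bytes;
    if (live_bytes > peak_bytes) peak_bytes = live_bytes;
    return b;
}

static void buffer_unref(Buffer *b) {
    if (--b->refcount == 0) {
        live_bytes -= b->bytes;
        free(b);
    }
}

/* Assumed rule: reuse the input as the output only when the caller holds
 * the sole reference; otherwise allocate a fresh output and drop the
 * caller's reference to the input. */
static Buffer *forward_or_allocate(Buffer *input, int *forwarded) {
    if (input->refcount == 1) {
        *forwarded = 1;
        return input;
    }
    *forwarded = 0;
    Buffer *out = buffer_alloc(input->bytes);
    buffer_unref(input);
    return out;
}

/* Runs a two-kernel chain and returns the peak device usage in bytes.
 * extra_holder != 0 models something else that still references the
 * first kernel's output while the second kernel runs. */
static size_t run_chain(int extra_holder) {
    int forwarded;
    live_bytes = 0;
    peak_bytes = 0;

    Buffer *x = buffer_alloc(1024);      /* intermediate result */
    if (extra_holder) x->refcount++;     /* second holder of x */

    Buffer *y = forward_or_allocate(x, &forwarded);
    if (!forwarded) buffer_unref(x);     /* the extra holder lets go late */
    buffer_unref(y);
    return peak_bytes;
}
```

In this model, run_chain(0) peaks at one buffer because the input is reused in place, while run_chain(1) peaks at two buffers because the lingering reference blocks forwarding. That is the pattern my traces seem to show, which is why I am asking whether something in the runtime keeps an extra reference to these intermediates.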