I am running a model on a PluggableDevice I am developing for a new inference accelerator.
When tracing the calls to the pluggable device, I see TensorFlow keeping layers' intermediate results on the device longer than necessary, and bigger models end up running out of device memory.
To be clear, I am talking about intermediate results that are consumed exactly once and never needed again.
Is this expected behaviour?
Why doesn't TensorFlow free intermediate results as soon as they have been consumed?
In the kernels' compute functions I already use TF_ForwardInputOrAllocateOutput whenever possible.
Should I instead force reuse of the input buffer by calling TF_SetOutput with the input tensor?