Clarification on when kernels should allocate new tensors, forward input tensors, or reuse them outright


I would like some clarification on how/when/why tensors should be allocated when implementing plugin kernels.

The API provided has:

  • TF_NewTensor
  • TF_AllocateTensor
  • TF_AllocateOutput
  • TF_SetOutput
  • TF_ForwardInputOrAllocateOutput

It seems that TF_ForwardInputOrAllocateOutput is meant to cover most of these needs; however, I've seen it consistently allocating instead of forwarding in cases that look trivially "forwardable". For example, in input → Reshape → Dense, the Reshape could forward its input buffer instead of allocating a new one.

What are the general guidelines for kernel implementations? What is possible and what is not? Specifically:

  • Should every kernel allocate a new output tensor?
  • Is a kernel allowed to reuse its input tensor by simply calling TF_SetOutput on it?
  • How does TF_ForwardInputOrAllocateOutput decide when to forward and when to allocate?
  • Why don't I see it forwarding the input in a simple example like the one above?