Input Forwarding not happening

I am working on a pluggable device and use TF_ForwardInputOrAllocateOutput in some of my kernels, but I don’t see any forwarding happen in practice.

Here is a simple example:

import tensorflow as tf

def run_graph(model, x):
    return model(x)

shape = [2, 2]
x = tf.keras.Input(shape, batch_size=1)
y = x
y = tf.keras.layers.Add()([y, y])
y = tf.keras.layers.ReLU()(y)
y = tf.keras.layers.Add()([y, y])
y = tf.keras.layers.ReLU()(y)

model = tf.keras.Model(inputs=x, outputs=y)
# Integer input of shape (1, 2, 2), matching batch_size=1 and shape=[2, 2]
x = tf.constant([[[1, 2],
                  [3, 4]]])

y = run_graph(model, x)

On the plugin side, I log the device memory addresses of each input and output tensor in each kernel:

Op:     Cast (   1) Output: mem0005  Input(s): mem0004
Op:      Mul (   1) Output: mem0006  Input(s): mem0002 mem0005
Op:     Relu (   1) Output: mem0007  Input(s): mem0006
Op:      Mul (   2) Output: mem0008  Input(s): mem0007 mem0003
Op:     Relu (   2) Output: mem0009  Input(s): mem0008

I would expect the output address of each operation (except the first, so as not to overwrite x) to match one of its input addresses, since I use forwarding in each of these operations. Instead, every output gets a new address.
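
To make that expectation concrete, here is a minimal sketch that checks the trace mechanically. It assumes the log format shown above; the names `TRACE`, `LINE_RE` and `forwarded_ops` are made up for illustration and are not part of TensorFlow or the plugin.

```python
import re

# Log lines copied from the trace above.
TRACE = """\
Op:     Cast (   1) Output: mem0005  Input(s): mem0004
Op:      Mul (   1) Output: mem0006  Input(s): mem0002 mem0005
Op:     Relu (   1) Output: mem0007  Input(s): mem0006
Op:      Mul (   2) Output: mem0008  Input(s): mem0007 mem0003
Op:     Relu (   2) Output: mem0009  Input(s): mem0008
"""

# Assumed line layout: op name, per-op counter, output address, input addresses.
LINE_RE = re.compile(
    r"Op:\s+(\w+)\s+\(\s*(\d+)\)\s+Output:\s+(\S+)\s+Input\(s\):\s+(.*)"
)

def forwarded_ops(trace):
    """Map (op, counter) to True when the output address equals an input address."""
    result = {}
    for line in trace.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed format
        op, index, output, inputs = match.groups()
        result[(op, int(index))] = output in inputs.split()
    return result

for (op, index), forwarded in forwarded_ops(TRACE).items():
    status = "forwarded" if forwarded else "new buffer"
    print(f"{op} ({index}): {status}")
```

On the trace above, all five operations are reported as "new buffer". If forwarding worked as expected, the Mul and Relu lines would report "forwarded".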