Hi, thank you for taking the time to answer.
I am making a plugin for a new type of hardware accelerator.
Currently I am not talking to the accelerator directly; instead, the kernels send instructions to a simulation via RPC, where I can trace everything that happens.
When I run a SavedModel with the plugin, I get a meaningful trace. Here is the script:
import tensorflow as tf
shape = (1, 32, 32, 3)
model = tf.keras.models.load_model('./model_zoo/retinanet_resnet50_v1_fpn_640x640_1')
x = tf.random.uniform(shape)
y = model(x)
The trace output from the RPC client looks like:
(...)
Send op Relu ( 3) Inputs: mem05eb Outputs: mem05ee
Send op Conv2D ( 4) Inputs: mem05ee mem05ef Outputs: mem05f0
Send op BiasAdd ( 4) Inputs: mem05f0 mem05f1 Outputs: mem05f2
Send op Conv2D ( 5) Inputs: mem05d6 mem05f3 Outputs: mem05f4
Send op BiasAdd ( 5) Inputs: mem05f4 mem05f5 Outputs: mem05f6
Send op FusedBat ( 4) Inputs: mem05f6 mem05f7 mem05f8 mem05f9 mem05fa Outputs: mem05fb mem05fc mem05fd
Send op FusedBat ( 5) Inputs: mem05f2 mem05fe mem05ff mem0600 mem0601 Outputs: mem0602 mem0603 mem0604
Send op AddV2 ( 55) Inputs: mem05fb mem0602 Outputs: mem0605
(...)
I also see a bunch of Fill and other operations, but I will skip the details.
When I run a Hub model, as in this script:
import tensorflow as tf
import tensorflow_hub as hub
model = hub.load('https://tfhub.dev/tensorflow/retinanet/resnet50_v1_fpn_640x640/1')
shape = (1, 32, 32, 3)
x = tf.random.normal(shape)
y = model(x)
I see no Conv2D… It starts with a bunch of Identity, AssignVariableOp, then one Mul and one AddV2, and that's it.
(...)
Send op Identity ( 460) Inputs: mem03a6 Outputs: mem03a7
(...)
RandomStandardNormal in device /job:localhost/replica:0/task:0/device:CPU:0
Send op Mul ( 1) Inputs: mem03b0 mem03af Outputs: mem03b1
Send op AddV2 ( 1) Inputs: mem03b1 mem03ae Outputs: mem03b2
The kernels registered in the plugin send instructions directly to the RPC client.
What I would expect to see from the second test is the same trace as the first one (with the Conv2D, BiasAdd, etc.).
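To compare the two traces more systematically than by eye, something like the following could tally the op names in each run and list the ones missing from the second. This is only a sketch: the regular expression assumes the "Send op <name> ( <n>) Inputs: ... Outputs: ..." layout seen in the excerpts above, and `count_ops`, `missing_ops`, and the two sample strings (copied from those excerpts) are hypothetical names used for illustration, not part of the plugin.

```python
import re
from collections import Counter

# Assumed layout of one trace line, based on the excerpts in this post:
#   Send op <name> ( <n>) Inputs: <buffers> Outputs: <buffers>
TRACE_LINE = re.compile(r"Send op\s+(\S+)\s*\(\s*\d+\)\s*Inputs:")


def count_ops(trace_text):
    """Return a Counter mapping op name -> number of 'Send op' lines."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = TRACE_LINE.search(line)
        if match:  # lines that are not 'Send op' records are ignored
            counts[match.group(1)] += 1
    return counts


def missing_ops(reference, other):
    """Op names present in the reference trace but absent from the other."""
    return sorted(set(reference) - set(other))


# Sample lines copied from the SavedModel run.
saved_model_trace = """\
Send op Relu ( 3) Inputs: mem05eb Outputs: mem05ee
Send op Conv2D ( 4) Inputs: mem05ee mem05ef Outputs: mem05f0
Send op BiasAdd ( 4) Inputs: mem05f0 mem05f1 Outputs: mem05f2
Send op Conv2D ( 5) Inputs: mem05d6 mem05f3 Outputs: mem05f4
"""

# Sample lines copied from the hub.load run.
hub_trace = """\
Send op Identity ( 460) Inputs: mem03a6 Outputs: mem03a7
RandomStandardNormal in device /job:localhost/replica:0/task:0/device:CPU:0
Send op Mul ( 1) Inputs: mem03b0 mem03af Outputs: mem03b1
Send op AddV2 ( 1) Inputs: mem03b1 mem03ae Outputs: mem03b2
"""

saved_counts = count_ops(saved_model_trace)
hub_counts = count_ops(hub_trace)
print(missing_ops(saved_counts, hub_counts))  # ['BiasAdd', 'Conv2D', 'Relu']
```

Running this on the full trace files instead of the short samples would show exactly which op types never reach the RPC client in the second run.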
I might just be using the Hub API incorrectly.
Side note about the environment: I am running everything in an ARM Docker container (I am developing on Apple silicon). The image is the latest one from Docker Hub, and it uses TensorFlow 2.8.
It shouldn’t be the issue though.