TF-Serving with TensorRT seems to compress the batch size to 1

I have a model that runs successfully under TensorFlow Serving. I then convert it with saved_model_cli; the detailed command line is below:

docker run --rm --user 3004 --gpus all -it \
    -v /path/to/tensorflow_serving:/work/tf_model \
    -e CUDA_VISIBLE_DEVICES=1 \
    harbor.private.com/dev/tf:1.15.5-gpu /usr/local/bin/saved_model_cli convert \
    --dir /work/tf_model/buyer_sent_model_pb_02/01 \
    --output_dir /work/tf_model/buyer_sent_model_trt/02 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 16 --is_dynamic_op True
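
For reference, the converted model's declared input shapes can be inspected with saved_model_cli show (a sketch, assuming the same container and mount as the convert step so the /work/tf_model path resolves):

/usr/local/bin/saved_model_cli show \
    --dir /work/tf_model/buyer_sent_model_trt/02 \
    --tag_set serve --signature_def serving_default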

Then I serve it with TensorFlow Serving:

docker run -d --gpus all -p 8501:8501 --mount type=bind,source=/path/to/tensorflow_serving/my_model_dir,target=/models/my_model_dir \
-e MODEL_NAME=my_model_name -e CUDA_VISIBLE_DEVICES=1 \
-e TF_FORCE_GPU_ALLOW_GROWTH='true' \
-t harbor.private.com/dev/tf-serving:2.4.1-gpu
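
To confirm the model is loaded, the TensorFlow Serving model status endpoint can be queried (a sketch, assuming port 8501 is reachable on localhost per the -p mapping above):

curl http://localhost:8501/v1/models/my_model_name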

My input:

{
    "inputs": {
             "Input-Token": data1,
             "Input-Segment": data2
        }
}

data1 and data2 are both lists of length 16.
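
The request is POSTed to the Serving REST API roughly like this (a sketch; request.json is a hypothetical file holding the JSON body shown above):

curl -X POST http://localhost:8501/v1/models/my_model_name:predict \
    -H "Content-Type: application/json" \
    -d @request.json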

data1:

[
    [101, 3766, 752, 8024, 6814, 3341, 6760, 6760, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 8238, 697, 1259, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 6206, 743, 2643, 5948, 1947, 6163, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 930, 702, 6963, 3221, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 2769, 6821, 6804, 791, 1921, 1157, 2802, 2458, 3341, 4500, 749, 671, 833, 6230, 2533, 679, 1916, 3265, 102, 0, 0, 0],
    [101, 2769, 3221, 6206, 2864, 4706, 5296, 3890, 5011, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1962, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1355, 749, 1557, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6843, 3819, 4706, 3344, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 872, 1962, 6435, 7309, 6821, 702, 743, 671, 6843, 671, 3221, 2582, 720, 702, 6843, 3791, 102, 0, 0, 0, 0, 0, 0, 0],
    [101, 1119, 3247, 2458, 1993, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6716, 7770, 8725, 8175, 1408, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 743, 749, 6821, 702, 121, 119, 8146, 4638, 4385, 1762, 4684, 2970, 4802, 6371, 3119, 6573, 2218, 1377, 809, 749, 511, 1968, 102],
    [101, 4692, 1168, 928, 2622, 1726, 1908, 678, 1521, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 155, 4772, 7027, 7481, 1377, 809, 3022, 679, 6585, 6716, 4638, 3688, 6132, 720, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2571, 6853, 4157, 3766, 3300, 2571, 6853, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]

data2:

[
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
]

This set of data works fine.

But when I change data2 to:

[
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

This combination of data1 and data2 runs into trouble.

On the server side, the log shows:

2021-07-01 08:29:53.363285: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.734463: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.863914: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,30,768], [1,30,768]]

On the client side, I got:

{'error': 'Timed out waiting for notification'}

It seems TensorFlow compresses data2 from a list of length 16 down to length 1?

What is the problem in my case? Am I missing something?

Environment

NVIDIA driver version: 455.38 on the host
GPU type: 2080 Ti, used for both converting and serving

tensorflow:1.15.5-gpu for converting
tensorflow-serving:2.4.1-gpu for serving
Both Docker images were pulled from the official images on Docker Hub.