TensorFlow Serving: how to filter the output?

Hello all.

I’m using SSD MobileNet V2 320x320 with TensorFlow Serving:latest-gpu on an Nvidia GTX 1060 6 GB card,
sending a lot of requests via the REST API.

output_body = {"inputs": [{"b64": input_image}]}

latency: ~1966 ms
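For reference, this is roughly how I send the request. A minimal client sketch, assuming the model is mounted under the name `ssd_mobilenet` on the default REST port 8501 (both names are assumptions):

```python
import base64
import json
import urllib.request

# Assumed model name and port; adjust to your deployment.
SERVER_URL = "http://localhost:8501/v1/models/ssd_mobilenet:predict"

def build_request_body(image_bytes: bytes) -> dict:
    """Wrap raw JPEG/PNG bytes in the b64 envelope the TF Serving REST API expects."""
    return {"inputs": [{"b64": base64.b64encode(image_bytes).decode("utf-8")}]}

def predict(image_bytes: bytes) -> dict:
    """POST one image and return the parsed JSON response (requires a running server)."""
    data = json.dumps(build_request_body(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```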

And each response contains a lot of data (~17 MB per request!): detection_multiclass_scores, detection_anchor_indices, raw_detection_scores, detection_scores, raw_detection_boxes, detection_boxes, detection_classes, etc. All of these arrays have length 100.

How can I trim or filter the output from TensorFlow Serving?

I tried a lot of combinations, like:

output_body = {"inputs": [{"b64": input_image}], "outputs": [{"detection_scores": False}], "output_filter": 'detection_classes'}

But nothing worked.

I found these parameters at: serving/predict.proto at 5369880e9143aa00d586ee536c12b04e945a977c · tensorflow/serving · GitHub

So can I limit the output length to 5, 10, or 20? Or drop some output tensors, like raw_detection_scores?
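One workaround I have seen suggested is to re-export the SavedModel with a wrapper signature that returns only the tensors you need, so the server never serializes the rest. A sketch under assumptions: the model's signature key is `serving_default`, its input is a `[1, H, W, 3]` uint8 tensor named `input_tensor`, and the kept keys are my choice:

```python
import tensorflow as tf

# Output tensors to keep; everything else is dropped from the response.
KEEP = ("detection_boxes", "detection_scores", "detection_classes")

def export_filtered(saved_model_dir: str, out_dir: str, keep=KEEP):
    """Re-save a detection SavedModel so its serving signature returns
    only the tensors in `keep` (shrinks the REST payload)."""
    model = tf.saved_model.load(saved_model_dir)
    infer = model.signatures["serving_default"]

    @tf.function(input_signature=[tf.TensorSpec([1, None, None, 3], tf.uint8)])
    def serve(input_tensor):
        outputs = infer(input_tensor=input_tensor)  # full dict of ~10 tensors
        return {k: v for k, v in outputs.items() if k in keep}

    tf.saved_model.save(model, out_dir, signatures={"serving_default": serve})
```

The filtered SavedModel can then be pointed at by TensorFlow Serving in place of the original, with no retraining needed.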


Hello again, I found one solution. In pipeline.config I changed this line:

max_number_of_boxes: 50
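For context, a sketch of where these fields sit in a typical SSD pipeline.config (field names are from the TF Object Detection API; the exact values here are assumptions). Note that `max_total_detections` under `post_processing` is what usually caps the length of the `detection_*` arrays the exported model returns, while `max_number_of_boxes` lives in `train_config`:

```proto
# Fragments of a typical SSD pipeline.config (values are assumptions)
model {
  ssd {
    post_processing {
      batch_non_max_suppression {
        max_detections_per_class: 50
        max_total_detections: 50   # length of the detection_* output arrays
      }
    }
  }
}
train_config {
  max_number_of_boxes: 50   # the line changed above
}
```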

And retrained the model. But the latency is still high:

  1. Local model loaded with tf.saved_model.load: ~110 ms (60 ms loading the image, 1-5 ms converting to a tensor, 60 ms prediction)
  2. Same model served by TensorFlow Serving in Docker: ~1100 ms total

I will continue researching, because it would be a SUPER solution to have only one server with the model and 10 PCs getting results at ~110 ms latency. 1100 ms is too high.