Embedding the TensorFlow model vs. asking a TensorFlow Serving server

Hi, I want to use a TensorFlow model in my application server.
I have to decide how to use it.

[My question]
My question is whether asking a TensorFlow Serving server is always better than embedding the TensorFlow model in my application server.

[In detail]
I found that there are two ways to use a TensorFlow model in a production environment:

(1) Load the saved model in my application server and use it for inference there (see the first sketch after this list).

(2) Run a dedicated model server such as TensorFlow Serving, and have my application server send inference requests to it (see the second sketch below).
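
For reference, here is a minimal sketch of option (1): loading a SavedModel directly into the application process and running inference in-process. The model path and the input name "inputs" are placeholders for whatever your model actually uses.

```python
import tensorflow as tf

# Load the SavedModel once at application startup.
# "/models/my_model/1" is a placeholder path.
model = tf.saved_model.load("/models/my_model/1")
infer = model.signatures["serving_default"]

# The input name "inputs" is a placeholder; find the real name with
# `saved_model_cli show --dir /models/my_model/1 --all`.
result = infer(inputs=tf.constant([[1.0, 2.0, 3.0]]))
print(result)
```

Inference here is just a local function call, so there is no network hop, but the application server now pays the model's memory and CPU/GPU cost.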
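And a minimal sketch of option (2), assuming a TensorFlow Serving instance is already running on its default REST port 8501 and serving a model named "my_model" (both placeholders):

```python
import requests

# TensorFlow Serving's REST predict endpoint:
# POST /v1/models/<model_name>:predict with an "instances" payload.
payload = {"instances": [[1.0, 2.0, 3.0]]}
response = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    json=payload,
    timeout=5.0,
)
response.raise_for_status()
print(response.json()["predictions"])
```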

When I googled it, I found that most people use option (2).

I think that's because option (2) has some advantages:

a. Acceleration by dedicated hardware
b. Easier CI/CD for TensorFlow models by leveraging TensorFlow Serving features
c. Easy scale-out when the TensorFlow Serving server runs low on resources
d. … and so on

[My question again]
But I think (1) is also good, because (2) adds extra network latency, even though it can use efficient transports such as gRPC.
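
For completeness, here is a sketch of what the gRPC path might look like, assuming the tensorflow-serving-api package is installed and the server listens on its default gRPC port 8500; the model name and input name are placeholders:

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Reuse one channel per process: gRPC keeps a persistent HTTP/2
# connection, which amortizes part of the per-request network cost.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"  # placeholder model name
request.inputs["inputs"].CopyFrom(  # placeholder input name
    tf.make_tensor_proto([[1.0, 2.0, 3.0]], dtype=tf.float32)
)

response = stub.Predict(request, timeout=5.0)
print(response.outputs)
```

Even with gRPC, though, each request still crosses the network, so the latency question stands.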

Could you share your experience using TensorFlow models in production?
