Blog post about load-testing with TFServing and FastAPI on k8s

Hi all!

Excited to share a blog post by @Sayak_Paul and me comparing load-testing results for TFServing and FastAPI on k8s (GKE). The content is based on findings from our previous project.

Both load tests were conducted on CPU-only machines, so we further optimized the model with ONNX for the FastAPI deployment and built the TFServing image from source with CPU optimization flags. The blog post covers our technical setup, considerations, and experimental results comparing the two deployments. It includes much more detailed, in-depth descriptions of our project, so please read it here:
:point_right:t3: Load-testing TensorFlow Serving and FastAPI on GKE | by Chansung Park and Sayak Paul | Google Developers Experts
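
For context, here is a rough sketch of the two optimizations mentioned above; the paths, opset, and compiler flags are placeholders, not the exact values from our setup. Exporting the model to ONNX for the FastAPI deployment can be done with the tf2onnx CLI:

```bash
# Convert a TensorFlow SavedModel to ONNX for CPU inference
# (paths and opset are placeholders).
python -m tf2onnx.convert \
  --saved-model ./saved_model \
  --opset 13 \
  --output model.onnx
```

And a CPU-optimized TFServing image can be built from source via the official Docker-based build, passing instruction-set flags through TF_SERVING_BUILD_OPTIONS:

```bash
# Build a TFServing development image with CPU optimization flags
# (pick flags matching the target VM's instruction sets, e.g. AVX2/FMA).
git clone https://github.com/tensorflow/serving
cd serving
docker build --pull \
  -t tensorflow-serving-devel-cpu-opt \
  --build-arg TF_SERVING_BUILD_OPTIONS="--copt=-mavx2 --copt=-mfma" \
  -f tensorflow_serving/tools/docker/Dockerfile.devel .
```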

In short, we experimented with both deployments across various VM sizes (2 to 8 vCPUs, 4 to 64 GB RAM), different numbers of nodes (2 to 8), and different parallelism parameters (number of uvicorn workers, number of gunicorn workers, inter_op_parallelism_threads, intra_op_parallelism_threads); a sketch of these knobs follows below. Both deployments have their own strengths and weaknesses, so we hope this helps you get a sense of which one might suit your situation.
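
As a hedged illustration of those parallelism knobs (the app module, model name, and specific values here are placeholders, not our tuned settings):

```bash
# FastAPI side: scale gunicorn workers, each running a uvicorn worker.
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000

# TFServing side: thread pools controlling op-level parallelism
# (inter = across independent ops, intra = within a single op).
tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=resnet \
  --model_base_path=/models/resnet \
  --tensorflow_inter_op_parallelism=2 \
  --tensorflow_intra_op_parallelism=8
```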

Thanks!
