I’m trying to load a model quickly to make predictions in a REST API. Loading with `tf.keras.models.load_model` takes ~1 s, which is too slow for what I’m trying to do.
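For reference, the loading I’m timing looks roughly like this (the tiny model built here is just a placeholder so the snippet is self-contained; in my real code the model is loaded from disk at request time):

```python
import time
import tensorflow as tf

# Build and save a tiny placeholder model so the timing is reproducible.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
model.save("tiny_model.h5")

# This is the call that is too slow for per-request use.
t0 = time.perf_counter()
loaded = tf.keras.models.load_model("tiny_model.h5")
elapsed = time.perf_counter() - t0
print(f"load_model took {elapsed:.3f}s")
```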
What is the fastest way to load a model for inference only?
I know TensorFlow Serving (TFX) exists to do exactly this efficiently, but I already have a REST API that handles other things, and setting up a specialised server just for predictions feels like overkill. How does the TFX server handle this?
Thanks in advance,