I’m deploying a large Keras model to production, and it isn’t clear to me whether I should do anything to it to make inference more efficient. In TF1 I used to prepare frozen graphs, but as far as I understand that approach is deprecated in TF2.
Right now, I’m just loading the model and calling its `.predict()` method to run inference on `.tfrec` files:
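```python
from tensorflow import keras

model = keras.models.load_model('path/to/location')
# get_dataset() is my own helper that builds a tf.data pipeline from the TFRecord files
predictions = model.predict(get_dataset(tfrecord_list, batch_size), verbose=0)
```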
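For context, `get_dataset` is not shown above. The sketch below is a hypothetical version of such a helper, not my actual code: the feature name `"x"` and its fixed length of 4 floats are assumptions standing in for the real record schema. It batches the serialized records before parsing them with `tf.io.parse_example` (vectorized parsing), and prefetches so input preparation overlaps with the model’s execution. Files are read sequentially so the prediction order matches the record order.

```python
import tensorflow as tf

# ASSUMPTION: hypothetical record layout with one fixed-length float feature "x";
# replace this spec with the schema actually used to write the .tfrec files.
FEATURE_SPEC = {"x": tf.io.FixedLenFeature([4], tf.float32)}


def get_dataset(tfrecord_list, batch_size):
    """Hypothetical sketch of an inference input pipeline over TFRecord files."""
    # Read files one after another so outputs stay in record order.
    ds = tf.data.TFRecordDataset(tfrecord_list)
    # Batch the serialized strings first, then parse each whole batch at once.
    ds = ds.batch(batch_size)
    ds = ds.map(
        lambda serialized: tf.io.parse_example(serialized, FEATURE_SPEC)["x"],
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    # Prepare the next batch while the model is busy with the current one.
    return ds.prefetch(tf.data.AUTOTUNE)
```

With a helper like this, the call stays the same: `model.predict(get_dataset(tfrecord_list, batch_size), verbose=0)`.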
This model won’t be trained any further, so I’d like to know whether there are any production-specific steps I should take to improve its inference performance (if possible).
Thanks!