How to properly deploy Keras models for inference in Python?

apcamargo · March 24, 2022, 1:27am

I’m deploying a big Keras model to production and it’s not very clear to me if I should do anything to it to make it more efficient for inference. In tf-v1 I used to prepare frozen graphs, but this is being deprecated in tf-v2 (as far as I understand).

Right now, I’m just loading the model and using the .predict() method to perform inference on .tfrec files.

from tensorflow import keras
model = keras.models.load_model('path/to/location')
model.predict(get_dataset(tfrecord_list, batch_size), verbose=0)

This model won’t be trained anymore, so I’d like to understand whether there are any production-specific steps I should take to improve the model’s computational performance (if possible).

Thanks!

Kiran_Sai_Ramineni · March 25, 2022, 8:35am

Hi @apcamargo, please refer to this documentation to learn how to serve your model using TensorFlow Serving.

apcamargo · March 25, 2022, 3:55pm

Hi @Kiran_Sai_Ramineni. TensorFlow Serving is aimed at web services, right? My model will be distributed within a Python package, so I’m not sure that’s the way to go.

Bhack · March 25, 2022, 5:07pm

Other than optimizing the model itself, you can try to jit_compile your model.
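As a rough illustration, here is a minimal sketch of enabling XLA through the jit_compile argument of Model.compile. The tiny Sequential model and the random input are hypothetical stand-ins for a model loaded with keras.models.load_model and a real dataset. Whether compilation succeeds, and whether it speeds anything up, depends on the layers involved and on the installed TensorFlow version.

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-in for the real model, which would normally come
# from keras.models.load_model('path/to/location').
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])

# Re-compile the model and ask Keras to use XLA for its compiled
# functions. No loss is needed because the model is only used for inference.
model.compile(jit_compile=True)

# Hypothetical input batch; the original post feeds a tf.data dataset instead.
x = np.random.rand(32, 8).astype("float32")
preds = model.predict(x, batch_size=16, verbose=0)
print(preds.shape)
```

The first call to predict includes the compilation time, so any timing comparison should be made on later calls.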

apcamargo · March 29, 2022, 9:43pm

Thanks, @Bhack! jit_compile doesn’t work with my model. I still haven’t figured out why.

Bhack · March 29, 2022, 9:54pm

Generally there is a message in the log about an operation that is not supported.

apcamargo · March 30, 2022, 11:34pm

I’m getting an InvalidArgumentError.

InvalidArgumentError: Graph execution error:

Detected at node 'StatefulPartitionedCall' defined at (most recent call last):
…

I’m not posting the whole log here because that would be off topic. I’m using a complex custom layer that is probably causing this. I’ll try to figure out the root of the problem.

Bhack · March 31, 2022, 12:05am

You can limit the compilation scope to critical functions with @tf.function(jit_compile=True).
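A minimal sketch of that idea, assuming the expensive part of the computation can be isolated in its own function. The name dense_block and the tensor shapes are hypothetical; in a real model this would be the XLA-compatible part of a custom layer, with any unsupported operations left outside the decorated function.

```python
import tensorflow as tf

@tf.function(jit_compile=True)
def dense_block(x, w, b):
    # Only this function is compiled with XLA; code that calls it
    # keeps running through the regular TensorFlow runtime.
    return tf.nn.relu(tf.matmul(x, w) + b)

# Hypothetical inputs, used only to exercise the function.
x = tf.random.normal([4, 8])
w = tf.random.normal([8, 3])
b = tf.zeros([3])

out = dense_block(x, w, b)
print(out.shape)
```

Scoping the compilation this way makes it easier to find which operation XLA rejects, since the error then points at a much smaller function.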

But I also suggest profiling your model to understand what is happening.
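One possible way to do this, sketched under the assumption that the TensorFlow Profiler is an acceptable tool here, is to wrap a few predict calls between tf.profiler.experimental.start and tf.profiler.experimental.stop. The model, the input, and the temporary log directory below are hypothetical placeholders.

```python
import tempfile

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Hypothetical stand-in for the deployed model.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
x = np.random.rand(64, 8).astype("float32")

# Warm-up call so that tracing and graph building are not profiled.
model.predict(x, batch_size=16, verbose=0)

# Hypothetical log directory; any writable path would do.
logdir = tempfile.mkdtemp(prefix="keras_profile_")

tf.profiler.experimental.start(logdir)
try:
    preds = model.predict(x, batch_size=16, verbose=0)
finally:
    # Always stop the profiler so the collected trace is written out.
    tf.profiler.experimental.stop()

print("profile written under", logdir)
```

The resulting trace can be inspected with TensorBoard and its profiler plugin, which shows how much time is spent in the input pipeline compared with the model's operations.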