Question about serving a TensorFlow saved model locally

Hi. I'm new to TensorFlow and deep learning. I'm working on a Windows application that does OCR locally.
I've learned a bit about how to train and test a model, but little about deploying a model to production. For now, I just load the trained model and call model.predict(), then use PyInstaller to build my code into an exe package.
The problem is that the package is rather large: the TensorFlow library takes up a lot of space.
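
To show what I mean by "takes up a lot of space", here is a rough sketch of how the build folder could be measured. It uses only the standard library and assumes PyInstaller's one-folder output; the path dist/myapp and the helper names folder_sizes and print_largest are placeholders I made up, not anything from PyInstaller itself.

```python
import os
from collections import defaultdict


def folder_sizes(root):
    """Return {top-level entry name: total bytes} for everything under root."""
    sizes = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in filenames:
            # Files directly in root are keyed by their own name;
            # files deeper down are keyed by their top-level folder.
            top = name if rel == os.curdir else rel.split(os.sep)[0]
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                sizes[top] += os.path.getsize(path)
    return dict(sizes)


def print_largest(root, limit=10):
    """Print the largest top-level entries under root, biggest first."""
    ranked = sorted(folder_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ranked[:limit]:
        print(f"{size / 1e6:10.1f} MB  {name}")


if __name__ == "__main__":
    # "dist/myapp" is a hypothetical path; point it at your own build folder.
    if os.path.isdir("dist/myapp"):
        print_largest("dist/myapp")
```

Running this against the build folder lists the largest top-level entries first, which makes it easy to confirm how much of the package is TensorFlow and how much is everything else.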

Is there any way to make it smaller? Should I learn TensorFlow Extended or TFLite, and would either of them solve my problem?

Thanks in advance to anyone who can help.

I am using tf.keras to build the model.