Understanding TF Serving for inference

Hi all. I am trying to run inference on a single image using a QKeras-quantized YOLOv4 model. The original code can be found here. However, I have changed the model-loading line to load my quantized model instead of the original one:

        print("Load model...")
        saved_model_loaded = qkeras_utils.load_qmodel('yolo_quantized.h5')
        print("Model loaded!")

        batch_data = tf.constant(images_data) 
        infer = saved_model_loaded.signatures['serving_default']
        pred_bbox = infer(batch_data) #SHOWS AN ERROR

        for key, value in pred_bbox.items():
            boxes = value[:, :, 0:4]
            pred_conf = value[:, :, 4:]

Currently, I am getting this error:

Load model...
Model loaded!
Traceback (most recent call last):
  File "/mnt/beegfs/gap/laumecha/conda-qkeras/tensorflow-yolov4-tflite/detect.py", line 117, in <module>
    app.run(main)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/qkeras_env/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/qkeras_env/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/mnt/beegfs/gap/laumecha/conda-qkeras/tensorflow-yolov4-tflite/detect.py", line 85, in main
    infer = saved_model_loaded.signatures['serving_default']
AttributeError: 'Functional' object has no attribute 'signatures'

I have been trying to learn about TF Serving, and my understanding is that I need to convert my model to a TensorFlow SavedModel in order to use the serving signatures. However, I assume that converting my QKeras model to a plain TensorFlow model will raise an exception for the QConv layers or something similar.
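
For reference, this is roughly the SavedModel round trip I have in mind (just a sketch; 'yolo_quantized_savedmodel' is a placeholder path I made up, and I do not know whether the QKeras layers serialize cleanly this way):

        # Sketch of the SavedModel round trip I have in mind.
        # 'yolo_quantized_savedmodel' is a placeholder path, and I am not
        # sure the QKeras layers (QConv2D, QActivation, ...) survive it.
        saved_model_loaded.save('yolo_quantized_savedmodel', save_format='tf')

        reloaded = tf.saved_model.load('yolo_quantized_savedmodel')
        infer = reloaded.signatures['serving_default']  # this attribute should exist now
        pred_bbox = infer(tf.constant(images_data))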

Is there a way to translate this TF Serving inference function into one that does not need TF Serving? Also, any explanation of the original code (how inference is run through this serving signature) is welcome.
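
In case it helps, this is my attempt at a signature-free version that simply calls the loaded Keras model directly (a sketch; I am not sure the output has the same dict-of-tensors structure that infer(batch_data) returns, so the .items() loop would need adjusting):

        # Sketch: call the Keras Functional model directly instead of going
        # through a SavedModel signature. Unlike infer(batch_data), this
        # returns the model outputs (a tensor or list), not a {name: tensor}
        # dict, so I am not sure the downstream slicing is equivalent.
        batch_data = tf.constant(images_data)
        pred = saved_model_loaded(batch_data, training=False)

        # Assuming a single output tensor shaped [batch, num_boxes, 5 + classes]:
        boxes = pred[:, :, 0:4]
        pred_conf = pred[:, :, 4:]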