Error when using TFLite interpreter in Flask

@Bhack As you suggested, I tried to follow the TF Serving technique you provided, but ran into some problems there. I then implemented TF Serving with Flask following another website. As far as I've learned, we can't use TFLite with TF Serving, so I converted my h5 model (42.0 MB) to pb (and the required format), which worked fine. But it is still slow. Do you think my PC needs to be more powerful?
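(For reference, a minimal sketch of the h5 → SavedModel export that TF Serving loads; the paths here are hypothetical, not the ones I actually used:)

import tensorflow as tf

# Load the trained Keras model (path is hypothetical)
model = tf.keras.models.load_model('model.h5')

# TF Serving expects a SavedModel inside a numbered version directory
model.save('models/model_name/1')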

NB: Current PC config: 8 GB RAM, 1 TB HDD, 4 GB (graphics)!

I modified the frame-generation code for prediction as below. Is there any problem with it?

import cv2
import numpy as np
import requests

def generate_frames(frame):
    # Resize to the model's expected input size and convert BGR -> RGB
    img_face = cv2.resize(frame, (256, 256))
    img_face = cv2.cvtColor(img_face, cv2.COLOR_BGR2RGB)

    # Normalize to [0, 1] and convert to float32
    img = (img_face / 255.0).astype(np.float32)

    payload = {
        "instances": [{'input_1': img.tolist()}]
    }

    # Query the TF Serving REST endpoint
    r = requests.post('http://localhost:8501/v1/models/model_name:predict', json=payload)
    mask = np.array(r.json()['predictions'])[0]

    # Scale the mask back to uint8 and encode as JPEG bytes
    final_result = (mask * 255).astype(np.uint8)
    ret, buffer = cv2.imencode('.jpg', final_result)
    return buffer.tobytes()
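(For context, a hedged sketch of how such a function is typically wired into a Flask MJPEG streaming route; the route name and camera source are assumptions, not from this thread:)

from flask import Flask, Response
import cv2

app = Flask(__name__)

def stream():
    cap = cv2.VideoCapture(0)  # hypothetical camera source
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        jpg = generate_frames(frame)
        # multipart/x-mixed-replace lets the browser render an MJPEG stream
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + jpg + b'\r\n')

@app.route('/video')
def video():
    return Response(stream(), mimetype='multipart/x-mixed-replace; boundary=frame')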

Yes, it is correct.

Do you think my PC needs to be more powerful?

What is your GPU?

NVIDIA- GeForce-940MX (4 GB DDR3 dedicated)

Have you followed the TF Serving for GPU steps?

Also, even if it is running correctly on the GPU, this specific model could still be relatively slow if your model is too heavy. See:

Yes, I think so! I tried with the tensorflow/serving:latest-gpu image.

Check with nvidia-smi that your GPU is occupied.

@Bhack I think you got the right point!! Somehow it’s not utilizing my GPU!!!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 471.41       Driver Version: 471.41       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A    0C    P8    N/A /  N/A |     40MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This is after running the program! But what’s wrong with it?!

I think I missed the NVIDIA Docker point!
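(For the record, a hedged sketch of the GPU-enabled launch, assuming the NVIDIA Container Toolkit is installed; the model path and name are placeholders:)

docker run --gpus all -p 8501:8501 \
  --mount type=bind,source=/path/to/models/model_name,target=/models/model_name \
  -e MODEL_NAME=model_name -t tensorflow/serving:latest-gpu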


@Bhack Apart from this, I noticed that when I use TFLite normally on my PC it doesn’t utilize the GPU, but the normal model does. What’s the reason?

Yes, TFLite currently has no NVIDIA/CUDA GPU delegate.
On that GPU you need to use regular TF. See:
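(A quick way to confirm that regular TF actually sees the CUDA GPU; a minimal check, not from the thread:)

import tensorflow as tf

# An empty list here means TF will silently fall back to the CPU
print(tf.config.list_physical_devices('GPU'))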

@Bhack I’m thinking about MediaPipe’s hair segmentation, but it’s available for Android & C++… Is there any way to use it in Python?

With serving you need to use a “regular” TF model:

It is probably experimental, but if you need a specific TFLite model you could try to convert your model with:

Then you could probably write your own service with TF.js Node GPU:

@Bhack Thanks for all the suggestions. Can you please take a look at this issue? Thank you.

Isn’t that the same issue?

No, this time the normal model is working fine, but the first prediction after starting the server takes a long time!!
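(That first-call latency is usually one-time graph/runtime initialization; a common workaround is to fire a dummy warm-up request right after startup. A minimal sketch, reusing the endpoint from the snippet above:)

import numpy as np
import requests

# Dummy warm-up request so the first real prediction doesn't pay
# the initialization cost; the input shape matches the snippet above
dummy = np.zeros((256, 256, 3), dtype=np.float32)
payload = {"instances": [{'input_1': dummy.tolist()}]}
requests.post('http://localhost:8501/v1/models/model_name:predict', json=payload)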

@Bhack I’m going through some confusion. Almost every article recommends TF Serving for deployment, but when should I avoid it?

You have TensorFlow Serving, or you can experiment with TF.js Node.

@Bhack @George_Soloupis Is there any solution that helps with Flask/FastAPI serving (cannot reload/refresh the model)? Please help.

I tried removing all possible references to interpreter-related objects (input_details, output_details), etc.
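(One pattern worth trying: drop every reference to the old interpreter, force garbage collection, then build a fresh one. A minimal sketch, with a hypothetical model path:)

import gc
import tensorflow as tf

def load_interpreter(path):
    # A fresh interpreter must allocate tensors before it can run
    interpreter = tf.lite.Interpreter(model_path=path)
    interpreter.allocate_tensors()
    return interpreter

interpreter = load_interpreter('model.tflite')

# To refresh the model: delete all references (interpreter, input_details,
# output_details, ...) before constructing the replacement
del interpreter
gc.collect()
interpreter = load_interpreter('model.tflite')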