Error when using TFLite interpreter in Flask

I have a model quantized with float32. After converting it to a TFLite model it predicts perfectly on a single image, but when I use it inside a while loop it throws an error. I tried to follow TensorFlow's instructions here but didn't understand their approach.

CODE:

def generate_frames(frame):
    while True:

        image = cv2.resize(frame,(256,256))

        #converting into float32
        image = tf.image.convert_image_dtype((image/255.0), dtype=tf.float32).numpy()

        image = run_inference(np.expand_dims(image[:,:,:3], axis=0)) 

        final_result = (image*255).astype(np.uint8)
        
        ret,buffer=cv2.imencode('.jpg',final_result)

        frame=buffer.tobytes()

        return frame


#load model
def load_trained_model():
    global interpreter, input_details, output_details
    interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()


def run_inference(image):
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]

    return outputs

if __name__ == '__main__':
    load_trained_model()    
    app.run(debug=True)

ERROR:

RuntimeError: There is at least 1 reference to internal data in the
interpreter in the form of a NumPy array or slice. Be sure to only
hold the function returned from tensor() if you are using raw data
access.


Hi @MSI

Maybe tf.expand_dims instead of np.expand_dims?
Also inside generate_frames maybe you want to return image instead of frame?

Best

@George_Soloupis I edited that part (now returning as uint8), and with tf.expand_dims or np.expand_dims nothing changes. The same problem keeps happening.

So, without the while loop, is it working OK?

@George_Soloupis If I simply use it like this,

cv2.namedWindow("preview")
cap = cv2.VideoCapture(0)
while (True):
    _, frame = cap.read()
    image  = cv2.resize(frame,(256,256)) 
    image  = cv2.cvtColor(image , cv2.COLOR_BGR2RGBA)
    image  = (image /255.0).astype(np.float32)
    final_result  = run_inference(np.expand_dims(image[:,:,:3], axis=0))
    cv2.imshow("preview", final_result)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyWindow("preview")

It works fine, but when I use it in Flask it shows the error!
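
For context, a generator like this is typically hooked into Flask with a multipart streaming route along these lines (just a sketch; the route name, camera handling and boundary string are assumptions, not shown anywhere in this thread):

# Sketch only: how generate_frames() from above is usually wired into a
# Flask MJPEG streaming route. Route name and camera setup are assumptions.
from flask import Flask, Response
import cv2

app = Flask(__name__)
cap = cv2.VideoCapture(0)

def stream():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # generate_frames() resizes the frame, runs the TFLite model and
        # returns the JPEG-encoded result as bytes (see the code above).
        jpg = generate_frames(frame)
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + jpg + b'\r\n')

@app.route('/video')
def video():
    return Response(stream(), mimetype='multipart/x-mixed-replace; boundary=frame')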

I just debugged some random things! If I run the TFLite prediction directly, without any external function, then it works fine.

def generate_frames(frame):
    image = cv2.resize(frame,(256,256))

    #converting into float32
    image = tf.image.convert_image_dtype((image/255.0), dtype=tf.float32).numpy()
    
    #prediction
    # -----------------
    interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    interpreter.set_tensor(input_details[0]['index'], np.expand_dims(image[:,:,:3], axis=0))
    interpreter.invoke()
    image = interpreter.get_tensor(output_details[0]['index'])[0]
    # -----------------
    
    final_result = (image*255).astype(np.uint8)
        
    ret,buffer=cv2.imencode('.jpg',final_result)

    frame=buffer.tobytes()

    return frame

Isn’t it too memory-consuming and a bad way to run model predictions?


Improve from the point where it works… e.g. take the initialization of the interpreter out of the function.

But why is it happening? Why doesn’t TFLite support being called like below?

interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def generate_frames(frame):

    image= cv2.resize(frame,(256,256))
    image= cv2.cvtColor(image, cv2.COLOR_BGR2RGBA)

    #converting into float32
    image= (image/255.0).astype(np.float32)

    #prediction
    image= run_inference(np.expand_dims(image[:,:,:3], axis=0)) # <<< problem happens here

            
    final_result = (image*255).astype(np.uint8)
            
    ret,buffer=cv2.imencode('.jpg',final_result)

    frame=buffer.tobytes()

    return frame



def run_inference(image):
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]
    return outputs

I suppose it is caused by:

@Bhack Updated the question title.

I had seen that. The point is that we must not have any NumPy arrays pointing to internal buffers; we have to clear them.

Their solutions are reloading the notebook or reloading the model, and neither of those is practical in my case.

Have you checked the internal test:

@Bhack Thanks for the source. As far as I understood, we need to delete the internal buffer references after each iteration. From interpreter_test.py it looks like we need to perform a “del in0” operation, but I am confused about how to do that. Can you give me a hint?

interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def run_inference(image):
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]

    # I think I need to perform the buffer-delete operation here (but how?)

    return outputs
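
For reference, in interpreter_test.py the del is applied to a NumPy view obtained by calling interpreter.tensor(...)(), not to the copies that get_tensor() returns. A rough sketch of that pattern (indices and values are placeholders):

# Sketch of the pattern from interpreter_test.py; values are placeholders.
# interpreter.tensor(index) returns a callable; calling it yields a NumPy
# view that points directly at the interpreter's internal buffer.
in0 = interpreter.tensor(input_details[0]['index'])()  # zero-copy view
in0[:] = 0.0                                           # raw data access

# While in0 is alive, allocate_tensors() and invoke() raise the
# "at least 1 reference to internal data" RuntimeError, because they may
# reallocate the buffers and invalidate the view.
del in0                         # drop the reference to the internal buffer
interpreter.allocate_tensors()  # safe again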

If you look, many of these operations have a safety guard; you can find the description of the check here:

I don’t think the problem is with set_tensor and get_tensor, as they are the slow (copying) API, unlike tensor().

Have you checked whether holding input_details and output_details ends up being similar to the WRONG pattern explained at:

https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter#wrong_2

This could also explain why it was probably working when you put all of the code into a single function, as those references were confined to the function scope.
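
Paraphrasing that doc page, the unsafe vs. safe usage of tensor() looks roughly like this (a sketch, not the exact snippet from the API reference; some_image is a placeholder):

# WRONG (per the API docs): calling tensor()() and keeping the returned
# NumPy arrays holds references to the interpreter's internal buffers, so a
# later invoke()/allocate_tensors() raises the RuntimeError quoted above.
input_arr = interpreter.tensor(input_details[0]['index'])()
output_arr = interpreter.tensor(output_details[0]['index'])()
input_arr[:] = some_image       # some_image is a placeholder input
interpreter.invoke()            # RuntimeError: reference to internal data

# SAFE: hold only the callables and dereference them per call, or stick to
# the copying API (set_tensor / get_tensor) as run_inference() already does.
input_fn = interpreter.tensor(input_details[0]['index'])
output_fn = interpreter.tensor(output_details[0]['index'])
input_fn()[:] = some_image
interpreter.invoke()
result = output_fn().copy()     # copy the result out; don't keep the view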

@MSI Can you modify this very minimal Colab gist for your use case? I cannot reproduce your error in this minimal context:

I’ve just commented out the GPU lines as I don’t have a spare GPU currently:

physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)  

And uncommented:

# interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
# interpreter.allocate_tensors()
# input_details = interpreter.get_input_details()
# output_details = interpreter.get_output_details()

But I don’t see any error message with TF 2.6

@Bhack Did you comment out the lines in run_inference()? I updated the GitHub gist. If you run it now, you will see the error.

def run_inference(image):
#     interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
#     interpreter.allocate_tensors()
#     input_details = interpreter.get_input_details()
#     output_details = interpreter.get_output_details()
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]

    return outputs

Oh, now I see. I suppose the problem is that you’re thinking in terms of a standard Python script, but it doesn’t work the same way in Flask.

You need to use something like this to “store” your global objects (interpreter, input_details, output_details) in the app context:

P.S. If it is still slow, because you need to load and recreate the interpreter since its lifecycle ends with each request, you could try running a TF Serving instance and consuming it from Flask:

https://medium.com/analytics-vidhya/serving-ml-with-flask-tensorflow-serving-and-docker-compose-fe69a9c1e369
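
For reference, the Flask side of such a setup is just a thin HTTP client around TF Serving’s REST predict endpoint; a sketch, assuming a model is served under the name my_model on the default REST port 8501 (model name, port and payload shape all depend on your actual TF Serving setup):

# Sketch of a Flask route that delegates inference to a TF Serving instance.
# Model name "my_model" and port 8501 are assumptions, not from this thread.
import numpy as np
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
SERVING_URL = "http://localhost:8501/v1/models/my_model:predict"

@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON with an "image" field shaped like the model's input.
    image = np.asarray(request.json["image"], dtype=np.float32)
    payload = {"instances": [image.tolist()]}
    resp = requests.post(SERVING_URL, json=payload)
    resp.raise_for_status()
    return jsonify(resp.json()["predictions"][0])

The interpreter lifecycle then lives entirely inside TF Serving, so Flask no longer needs to create or reload it per request.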


@Bhack That’s a great hint! It worked.

def run_inference(image):
    g.interpreter.set_tensor(g.input_details[0]['index'], image)
    g.interpreter.invoke()
    outputs = g.interpreter.get_tensor(g.output_details[0]['index'])[0]
    return outputs

@app.before_request
def load_model():
    g.interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
    g.interpreter.allocate_tensors()
    g.input_details = g.interpreter.get_input_details()
    g.output_details = g.interpreter.get_output_details()

But to be honest, it seems to take just as long as loading the model on every request.

Yes, it is better to interface Flask with a TF Serving instance as in the example.