MSI
October 11, 2021, 8:51am
#1
I have a model quantized with float32. After converting it to TFLite, it predicts perfectly on a single image, but when I use it in a while loop it raises an error. I tried to follow the TensorFlow instructions here but didn’t understand their approach.
CODE :
def generate_frames(frame):
    while True:
        image = cv2.resize(frame, (256, 256))
        # converting into float32
        image = tf.image.convert_image_dtype((image/255.0), dtype=tf.float32).numpy()
        image = run_inference(np.expand_dims(image[:,:,:3], axis=0))
        final_result = (image*255).astype(np.uint8)
        ret, buffer = cv2.imencode('.jpg', final_result)
        frame = buffer.tobytes()
        return frame

# load model
def load_trained_model():
    global interpreter, input_details, output_details
    interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

def run_inference(image):
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]
    return outputs

if __name__ == '__main__':
    load_trained_model()
    app.run(debug=True)
ERROR:
RuntimeError: There is at least 1 reference to internal data in the
interpreter in the form of a NumPy array or slice. Be sure to only
hold the function returned from tensor() if you are using raw data
access.
Hi @MSI
Maybe tf.expand_dims instead of np.expand_dims?
Also inside generate_frames maybe you want to return image instead of frame?
Best
MSI
October 11, 2021, 1:15pm
#4
@George_Soloupis I edited that part (it now returns uint8), but nothing changes with either tf.expand_dims or np.expand_dims. The same problem still happens.
So, without the while loop, does it work OK?
MSI
October 11, 2021, 2:56pm
#6
@George_Soloupis If I simply use it like this:
cv2.namedWindow("preview")
cap = cv2.VideoCapture(0)
while (True):
    _, frame = cap.read()
    image = cv2.resize(frame, (256, 256))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGBA)
    image = (image/255.0).astype(np.float32)
    final_result = run_inference(np.expand_dims(image[:,:,:3], axis=0))
    cv2.imshow("preview", final_result)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyWindow("preview")
It works fine, but when I use it in Flask it raises an error!
MSI
October 11, 2021, 4:19pm
#8
I just tried a few things while debugging! If I run the TFLite prediction directly, without any external function, it works fine.
def generate_frames(frame):
    image = cv2.resize(frame, (256, 256))
    # converting into float32
    image = tf.image.convert_image_dtype((image/255.0), dtype=tf.float32).numpy()
    # prediction
    # -----------------
    interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.set_tensor(input_details[0]['index'], np.expand_dims(image[:,:,:3], axis=0))
    interpreter.invoke()
    image = interpreter.get_tensor(output_details[0]['index'])[0]
    # -----------------
    final_result = (image*255).astype(np.uint8)
    ret, buffer = cv2.imencode('.jpg', final_result)
    frame = buffer.tobytes()
    return frame
Isn’t this too memory-consuming, and a bad way to run model predictions?
Now that it works, improve from that point: for example, move the interpreter initialization out of the function.
MSI
October 11, 2021, 6:46pm
#10
But why is this happening? Why doesn’t TFLite support calling it like below?
interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def generate_frames(frame):
    image = cv2.resize(frame, (256, 256))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGBA)
    # converting into float32
    image = (image/255.0).astype(np.float32)
    # prediction
    image = run_inference(np.expand_dims(image[:,:,:3], axis=0))  # <<< problem happens here
    final_result = (image*255).astype(np.uint8)
    ret, buffer = cv2.imencode('.jpg', final_result)
    frame = buffer.tobytes()
    return frame

def run_inference(image):
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]
    return outputs
Bhack
October 11, 2021, 11:01pm
#11
I suppose it is caused by:
MSI
October 12, 2021, 2:47am
#12
@Bhack Updated the question title.
I had seen that. The point is that no NumPy arrays may point to internal buffers, so we have to clear them.
Their solutions are to reload the notebook or the model, and neither is practical in my case.
Bhack
October 12, 2021, 3:16am
#13
Have you checked the internal test:
    """Check that tensor returns a reference."""
    array_ref = self.interpreter.tensor(self.input0)
    np.copyto(array_ref(), self.initial_data)
    self.assertAllEqual(array_ref(), self.initial_data)
    self.assertAllEqual(
        self.interpreter.get_tensor(self.input0), self.initial_data)

def testGetTensorAccessor(self):
    """Check that get_tensor returns a copy."""
    self.interpreter.set_tensor(self.input0, self.initial_data)
    array_initial_copy = self.interpreter.get_tensor(self.input0)
    new_value = np.add(1., array_initial_copy)
    self.interpreter.set_tensor(self.input0, new_value)
    self.assertAllEqual(array_initial_copy, self.initial_data)
    self.assertAllEqual(self.interpreter.get_tensor(self.input0), new_value)

def testBase(self):
    self.assertTrue(self.interpreter._safe_to_run())
    _ = self.interpreter.tensor(self.input0)
    self.assertTrue(self.interpreter._safe_to_run())
    in0 = self.interpreter.tensor(self.input0)()
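The distinction these tests draw, a copy from get_tensor() versus a live reference from tensor(), can be illustrated without TensorFlow. The sketch below is only an analogy that uses Python's built-in bytearray and memoryview as stand-ins; it is not TFLite's implementation. A copy is independent of the buffer, while a view pins it, so the buffer cannot be resized until the view is released.

```python
# Stand-in for an interpreter-owned tensor buffer (analogy only, not TFLite code).
buffer = bytearray(b"\x01\x02\x03\x04")

# Analogue of get_tensor(): an independent copy of the data.
snapshot = bytes(buffer)

# Analogue of tensor()(): a view that shares memory with the buffer.
view = memoryview(buffer)

# Writing through the view changes the buffer; the earlier copy is unaffected.
view[0] = 9
assert buffer[0] == 9
assert snapshot[0] == 1

# While the view is alive, the buffer refuses to be resized, much like
# invoke() refusing to run while a NumPy array points at internal memory.
try:
    buffer.extend(b"\x05")
    resize_blocked = False
except BufferError:
    resize_blocked = True

# Releasing the view (the analogue of deleting the array) makes resizing safe again.
view.release()
buffer.extend(b"\x05")
```

The copy costs an allocation on every call, which is why it is the slower of the two access paths, but it can never trip the guard.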
MSI
October 12, 2021, 8:43am
#15
@Bhack Thanks for the source. As far as I understand, we need to delete the internal buffer after each iteration. Going by interpreter_test.py, we need to perform a “del in0” operation, but I am confused about how to do that. Can you give me a hint?
interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def run_inference(image):
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]
    # I think the buffer delete operation needs to be performed here (but how?)
    return outputs
Bhack
October 12, 2021, 2:01pm
#17
As you can see, many of these operations have a safety guard. You can find the description of the check here:
    """Returns true if there exist no numpy array buffers.

    This means it is safe to run tflite calls that may destroy internally
    allocated memory. This works, because in the wrapper.cc we have made
    the numpy base be the self._interpreter.
    """
    # NOTE, our tensor() call in cpp will use _interpreter as a base pointer.
    # If this environment is the only _interpreter, then the ref count should be
    # 2 (1 in self and 1 in temporary of sys.getrefcount).
    return sys.getrefcount(self._interpreter) == 2

def _ensure_safe(self):
    """Makes sure no numpy arrays pointing to internal buffers are active.

    This should be called from any function that will call a function on
    _interpreter that may reallocate memory e.g. invoke(), ...

    Raises:
      RuntimeError: If there exist numpy objects pointing to internal memory
        then we throw.
(This file has been truncated.)
I don’t think the problem is in set_tensor and get_tensor, as they are the slow (copy) API, unlike tensor().
Have you checked whether holding input_details and output_details is similar to the WRONG pattern explained at:
https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter#wrong_2
This could also explain why it probably worked when you put all the code in a single function: these references were then confined to the function scope.
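The reference-count guard quoted above, and the effect of scope on it, can be mimicked in plain Python. The following is a toy sketch, not TFLite's real class: `_Core`, `_View`, and `MockInterpreter` are hypothetical names, and the check compares against a baseline recorded at construction rather than the literal 2 used in the quoted code. It relies on CPython's reference counting.

```python
import sys


class _Core:
    """Hypothetical stand-in for the native interpreter object."""


class _View:
    """Hypothetical stand-in for a NumPy array whose base is the core."""

    def __init__(self, base):
        self.base = base  # keeps the core alive, like ndarray.base


class MockInterpreter:
    """Toy model of the reference-count guard; not TFLite's real class."""

    def __init__(self):
        self._core = _Core()
        # Record the count when no views exist (the quoted code compares to 2).
        self._baseline = sys.getrefcount(self._core)

    def tensor(self):
        # Return a view that holds an extra reference to the core.
        return _View(self._core)

    def _safe_to_run(self):
        return sys.getrefcount(self._core) == self._baseline

    def invoke(self):
        if not self._safe_to_run():
            raise RuntimeError("There is at least 1 reference to internal data")
        return "ok"


interp = MockInterpreter()


def scoped_use():
    # The view lives only inside this function, so it is gone on return.
    view = interp.tensor()
    return view is not None


scoped_use()
result_after_scoped_use = interp.invoke()

held = interp.tensor()  # a reference that outlives the call, as a global would
try:
    interp.invoke()
    blocked = False
except RuntimeError:
    blocked = True

del held  # dropping the reference makes invoke() safe again
result_after_del = interp.invoke()
```

In this toy model, anything that keeps an extra reference to the core alive across calls blocks invoke(), while references confined to a function body disappear when the function returns.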
Bhack
October 12, 2021, 2:49pm
#18
@MSI Can you modify this very minimal Colab gist for your use case as I cannot reproduce your error in this minimal context:
Bhack
October 12, 2021, 9:01pm
#20
I’ve just commented out the GPU lines, as I don’t currently have a spare GPU:
physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
And uncommented:
# interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
# interpreter.allocate_tensors()
# input_details = interpreter.get_input_details()
# output_details = interpreter.get_output_details()
But I don’t see any error message with TF 2.6
MSI
October 13, 2021, 3:02am
#21
@Bhack Did you comment out the lines in run_inference()? I updated the GitHub repo. If you run it now you will see the error.
def run_inference(image):
    # interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
    # interpreter.allocate_tensors()
    # input_details = interpreter.get_input_details()
    # output_details = interpreter.get_output_details()
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]
    return outputs
Bhack
October 13, 2021, 3:25am
#22
Oh, now I see. I suppose the problem is that you’re thinking of this as a standard Python file, but it doesn’t behave the same way in Flask.
You need to use something like this to “store” your global objects in the app context (interpreter, input_details, output_details):
Bhack
October 13, 2021, 4:09am
#23
P.S. If it is still slow because the interpreter has to be loaded and recreated on each request (its lifecycle ends with the request), you could try running a TF Serving instance and consuming it from Flask:
https://medium.com/analytics-vidhya/serving-ml-with-flask-tensorflow-serving-and-docker-compose-fe69a9c1e369
MSI
October 13, 2021, 10:18am
#25
@Bhack That’s a great hint! It worked.
def run_inference(image):
    g.interpreter.set_tensor(g.input_details[0]['index'], image)
    g.interpreter.invoke()
    outputs = g.interpreter.get_tensor(g.output_details[0]['index'])[0]
    return outputs

@app.before_request
def load_model():
    g.interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
    g.interpreter.allocate_tensors()
    g.input_details = g.interpreter.get_input_details()
    g.output_details = g.interpreter.get_output_details()
But honestly, it seems to take as much time as loading the model every time.
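One possible middle ground, sketched here under the unconfirmed assumption that the original error comes from several request threads touching one interpreter at once, is to load the model once per process and serialize access to it with a lock. `SharedModel`, `dummy_loader`, and `dummy_infer` are hypothetical names used only for illustration; the dummy loader stands in for building the interpreter.

```python
import threading


class SharedModel:
    """Load a model once per process and serialize calls to it."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that builds the model
        self._model = None
        self._lock = threading.Lock()

    def run(self, fn, *args):
        """Call fn(model, *args) while holding the lock."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()  # happens on first use only
            return fn(self._model, *args)


# Demonstration with a dummy loader that counts how often it is called.
load_calls = []


def dummy_loader():
    load_calls.append(1)
    return {"scale": 2}


def dummy_infer(model, x):
    return model["scale"] * x


shared = SharedModel(dummy_loader)

results = []
threads = [
    threading.Thread(target=lambda v=v: results.append(shared.run(dummy_infer, v)))
    for v in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With TFLite, the loader could build `tf.lite.Interpreter(model_path="quant_model.tflite")` and call `allocate_tensors()`, and the function passed to `run` could perform the `set_tensor`, `invoke`, `get_tensor` sequence from run_inference. Whether this avoids the RuntimeError in a given Flask setup would need to be tested.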
Bhack
October 13, 2021, 11:16am
#26
Yes, it is better to interface Flask with a TF Serving instance, as in the example.