Hang when using the ModifyGraphWithDelegate() function with the GPU delegate option

Hello, everybody!
I use the ModifyGraphWithDelegate() function to accelerate inference on the GPU. The first time I launch the application it works fine, but after I close the application, two issues appear:
+> Memory is not freed; I suspect the GPU memory is not released, but I have not yet found a way to verify that.
+> When I launch the application again, it hangs when the interpreter calls the Invoke() function.
Does anyone know why?
Thanks a lot!
Below is a code snippet from my application:

#include <cstdio>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

#include <opencv2/opencv.hpp>
#include "tensorflow/lite/delegates/gpu/delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main()
{
	std::unique_ptr<tflite::FlatBufferModel> m_model;
	std::unique_ptr<tflite::Interpreter> interpreter;
	std::string modelFileName = "Path to model.tflite";
	m_model = tflite::FlatBufferModel::BuildFromFile(modelFileName.c_str());
	if (m_model == nullptr) {
		fprintf(stderr, "Failed to load model\n");
		exit(EXIT_FAILURE);
	}

	TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();

	options.inference_priority1 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY;
	options.inference_priority2 = TFLITE_GPU_INFERENCE_PRIORITY_AUTO;
	options.inference_priority3 = TFLITE_GPU_INFERENCE_PRIORITY_AUTO;
	options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT;
	options.inference_preference = TFLITE_GPU_INFERENCE_PREFERENCE_FAST_SINGLE_ANSWER;
	options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_CL_ONLY;

	auto theGpuDelegate = tflite::Interpreter::TfLiteDelegatePtr(TfLiteGpuDelegateV2Create(&options), TfLiteGpuDelegateV2Delete);

	tflite::ops::builtin::BuiltinOpResolver resolver;
	tflite::InterpreterBuilder(*m_model.get(), resolver)(&interpreter);

	if (interpreter->ModifyGraphWithDelegate(theGpuDelegate.get()) != kTfLiteOk) throw std::runtime_error("Failed to modify graph with GPU delegate");
	
	while(1) {
		cv::Mat screenImage = cv::imread(imageFilePath);
		memcpy(interpreter->typed_input_tensor<float>(0), screenImage.data, screenImage.total() * screenImage.elemSize());
		
		interpreter->Invoke();
		
		TfLiteTensor* scores = interpreter->output_tensor(0);
		int rows = scores->dims->data[1];
		int columns = scores->dims->data[2];

		float* data_scores = new float[rows * columns];
		float* data_geometry = new float[5 * rows * columns];

		memcpy(data_scores, interpreter->typed_output_tensor<float>(0), rows * columns * sizeof(float));
		memcpy(data_geometry, interpreter->typed_output_tensor<float>(1), 5 * rows * columns * sizeof(float));

		// Decoding data
		decode(data_scores, data_geometry, rows, columns);
		delete[] data_scores;
		delete[] data_geometry;
	}

	return 0;
}

Deployment environment:

  • TF-Lite version: 2.65
  • OS: Ubuntu 18.04
  • Chip: Arm SoC with a Mali-G series GPU