The ModifyGraphWithDelegate() function in TF-lite takes too much time

Hello everybody!
When I use the ModifyGraphWithDelegate() function to enable GPU acceleration, it takes too much time, about 3-4 s. Is there any way to replace the ModifyGraphWithDelegate() function, or to reduce the time it takes?
I used:

  • TF-Lite 2.65

  • OS: ubuntu 18.04

  • Chip: ARM-(GPU) Mali-G series
This is my code snippet:

```cpp
// Assumes "using namespace std;" and the TF-Lite headers
// (interpreter.h, kernels/register.h, model.h, delegates/gpu/delegate.h).
unique_ptr<tflite::FlatBufferModel> m_model;
unique_ptr<tflite::Interpreter> interpreter;
string modelFileName = "Path to model.tflite";
m_model = tflite::FlatBufferModel::BuildFromFile(modelFileName.c_str());
if (m_model == nullptr) {
  fprintf(stderr, "Failed to load model\n");
  exit(EXIT_FAILURE);
}

TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
options.inference_priority1 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY;
options.inference_priority2 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_MEMORY_USAGE;
options.inference_priority3 = TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION;
options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT;
options.inference_preference = TFLITE_GPU_INFERENCE_PREFERENCE_FAST_SINGLE_ANSWER;
options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_CL_ONLY;

auto theGpuDelegate = tflite::Interpreter::TfLiteDelegatePtr(
    TfLiteGpuDelegateV2Create(&options), TfLiteGpuDelegateV2Delete);

tflite::ops::builtin::BuiltinOpResolver resolver;
tflite::InterpreterBuilder(*m_model, resolver)(&interpreter);

// Time only the delegate hand-off, which is the slow step.
auto start = chrono::steady_clock::now();
if (interpreter->ModifyGraphWithDelegate(theGpuDelegate.get()) != kTfLiteOk)
  throw std::runtime_error("Failed to modify graph with GPU delegate");
auto end = chrono::steady_clock::now();

if (interpreter->AllocateTensors() != kTfLiteOk)
  throw std::runtime_error("Failed to allocate tensors");

cout << "Init time in milliseconds: "
     << chrono::duration_cast<chrono::milliseconds>(end - start).count()
     << " ms" << endl;
```

Thank you.

@Rung_Ga,

Welcome to the TensorFlow Forum!

Can you try with only one inference priority flag, MIN_LATENCY, to avoid optimization conflicts?

options.inference_priority1 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY;
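As a minimal sketch of that suggestion (a configuration fragment, not a complete program): set only the first priority and leave the other two at the AUTO value they receive from TfLiteGpuDelegateOptionsV2Default().

```cpp
TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
options.inference_preference = TFLITE_GPU_INFERENCE_PREFERENCE_FAST_SINGLE_ANSWER;
options.inference_priority1 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY;
// inference_priority2 / inference_priority3 keep their default,
// TFLITE_GPU_INFERENCE_PRIORITY_AUTO, so only latency is optimized for.

auto theGpuDelegate = tflite::Interpreter::TfLiteDelegatePtr(
    TfLiteGpuDelegateV2Create(&options), TfLiteGpuDelegateV2Delete);
```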

Thank you!

Hi @chunduriv,
Thank you!