TensorFlow Lite model with select ops in Android Studio app (Kotlin)

Hi :) I am trying to load a tflite model in my Android app, but I get these two error messages:


This is how I converted the model:

tf_model = tf.saved_model.load('Models/MoViNet/models/movinet_freez10_3')
converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS,
  tf.lite.OpsSet.SELECT_TF_OPS
]
tflite_model = converter.convert()
open('Models/MoViNet/lite/model.tflite', 'wb').write(tflite_model)

This is how I try to load my model when it is located in the assets folder:

    try {
        val tfliteModel = FileUtil.loadMappedFile(context, "modelCopy.tflite")
        val tflite = Interpreter(tfliteModel)
    } catch (e: IOException) {
        Log.e("tfliteSupport", "Error reading model", e)
    }

I have also tried this, where the model is in an ml package (got the same error):

import com.example.slr.ml.Model
val model = Model.newInstance(context)

And I have these dependencies in my Gradle file:

implementation("org.tensorflow:tensorflow-lite:0.0.0-nightly-SNAPSHOT")
// This dependency adds the necessary TF op support.
implementation("org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly-SNAPSHOT")
implementation("org.tensorflow:tensorflow-lite-support:0.0.0-nightly-SNAPSHOT")
implementation("org.tensorflow:tensorflow-lite-metadata:0.0.0-nightly-SNAPSHOT")

Do any of you have an idea of how to include the AvgPool3D operation, or know another way to fix this issue? (It is listed in the supported core ops here.)

Hello, I have also been struggling with MoViNet for quite a while now and haven’t gotten as far as you have. Is there any way I could contact you with a few questions on how you trained the model and converted it to tflite?

Hi @I_M

Can you check with the Netron app to see the inputs and outputs of your .tflite model?
Can you paste the result here?

Regards

The model is very long, so I took a screenshot of the input and output description.

This is the whole model

So, from the Netron result I see that your .tflite file was generated with a single input of dimensions 1,1,1,1,3. I think this is wrong, don’t you? There is probably an issue during the conversion. You have to check the inputs/outputs of the MoViNet model again and verify that they are the same after conversion.
You can do the same with Netron and the MoViNet saved model (use the .zip or the .pb file).
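
If it helps, you can also dump the same information without Netron by loading both models in Python. This is just a quick sketch, assuming the paths from your earlier posts:

import tensorflow as tf

# Inputs/outputs of the original SavedModel (path assumed from your post)
saved = tf.saved_model.load('Models/MoViNet/models/movinet_freez10_3')
sig = saved.signatures['serving_default']
print('SavedModel inputs :', sig.structured_input_signature)
print('SavedModel outputs:', sig.structured_outputs)

# Inputs/outputs the converter actually baked into the .tflite file
interpreter = tf.lite.Interpreter(model_path='Models/MoViNet/lite/model.tflite')
print('TFLite inputs :', [d['shape'] for d in interpreter.get_input_details()])
print('TFLite outputs:', [d['shape'] for d in interpreter.get_output_details()])

The two input shapes should match; if the TFLite one already shows 1,1,1,1,3 here, the problem is in the conversion and not on the Android side.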

I agree that the input dimensions look wrong; they are supposed to be [batch_size, frames, resolution, resolution, 3], not just 1.

I tried to look at the saved model using the .pb file (it did not work with the zip), but the .pb file looked very different from the tflite model and did not really make sense to me.
This is a screenshot of the far left, and it looks quite similar all the way to the end.

Also, this is how I saved the model; is there perhaps something wrong with this?

input_shape = (batch_size, frames, resolution, resolution, 3)
input_shape_concrete = [1 if s is None else s for s in input_shape]
print(input_shape_concrete)
model.build(input_shape_concrete)
_ = model(tf.ones(input_shape_concrete))
tf.saved_model.save(model, f'Models/MoViNet/models/{name}')

I see!

That leaves us with no other option than for you to create a Colab notebook and share it here. Somewhere in the process there is an error. If it is not under NDA, copy-paste a link to the notebook with the code so I can take a look.

Regards

Okay, here is the colab.
Let me know if you also need to see the code for processing the input videos.
Thanks so much for trying to help me!!

Since I cannot run and debug it (the Colab does not contain links to the model), the only suggestion I can make right now is to test the model with an input before converting.
I think the error is somewhere here:

input_shape = (batch_size, frames, resolution, resolution, 3)
input_shape_concrete = [1 if s is None else s for s in input_shape]
print(input_shape_concrete)
model.build(input_shape_concrete)

_ = model(tf.ones(input_shape_concrete))
tf.saved_model.save(model, f'Models/MoViNet/models/{name}')

So before saving it, run an inference with an input to see whether the result is OK. Then adjust the code and save it before you convert it.
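
Something like this is enough (a rough sketch; batch_size, frames and resolution are the values from your notebook, and model is your MoViNet instance):

import tensorflow as tf

# Values assumed from your notebook -- adjust to your own settings
batch_size, frames, resolution = 1, 20, 172
dummy_input = tf.ones([batch_size, frames, resolution, resolution, 3])

# The output should be one score per class, i.e. shape [batch_size, num_classes].
# If this already looks wrong, the problem is in the build/save step,
# not in the TFLite conversion.
output = model(dummy_input)
print(output.shape)
print(tf.argmax(output, axis=-1))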

Ping me when you have updates.

So I did what you said, and the inference results were the same before and after saving/loading the saved model. I therefore googled around some more and finally managed to get the right input shape for my tflite model by using this code for saving and converting:

# export_saved_model is the helper from the MoViNet tools in the TensorFlow Model Garden
from official.projects.movinet.tools import export_saved_model

input_shape = [1, frames, resolution, resolution, 3]
export_saved_model.export_saved_model(
    model=model,
    input_shape=input_shape,
    export_path=f'Models/MoViNet/models/{name}')

model = tf.saved_model.load(f"Models/MoViNet/models/{name}")
concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
concrete_func.inputs[0].set_shape([1, frames, resolution, resolution, 3])
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS,
  tf.lite.OpsSet.SELECT_TF_OPS
]
tflite_model = converter.convert()

It currently looks like this at the start:

However, I am now back to struggling with Android Studio. Do you have any tips on how to run my model there, more specifically how I can get the right input format? I can’t seem to find code examples where the input is video.

So far this is what I have: I convert the video to a list of frames that are Bitmaps. I also can’t decide whether I should use the Interpreter API or the other method that I have started with.

fun predict(context: Context, mmr: MediaMetadataRetriever): Pair<List<Float>, Long> {

    val frames = videoFrames(mmr)
    
    // Option 1: One way to use the model, but unsure how to create the input
    val model = MovinetA1Base04052024091340.newInstance(context)
    val input = TensorBuffer.createFixedSize(intArrayOf(1,4), DataType.FLOAT32) // not sure about this

    // Option 2: interpreter
    try {
        val tfliteModel = FileUtil.loadMappedFile(context, "movinet_a1_base_04042024_170320Copy.tflite")
        val tflite = Interpreter(tfliteModel)
    } catch (e: IOException) {
        Log.e("tfliteSupport", "Error reading model", e)
    }
    
    // Unrelated to the options
    val startTime = SystemClock.elapsedRealtime()
    
    val output = emptyList<Float>()

    val inferenceTime = SystemClock.elapsedRealtime()-startTime
    return Pair(output, inferenceTime)
}


private fun videoFrames(mmr: MediaMetadataRetriever): List<Bitmap> {
    val frames = mutableListOf<Bitmap>()
    val fps = 2
    var durationMs = 0.0

    val duration = mmr.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION)
    if (duration !=null) {
        durationMs = duration.toDouble()
    }
    val durationInSec = ceil(durationMs/1000).toInt()
    Log.i("Duration", durationInSec.toString())

    for (i in 0 until fps*durationInSec){
        val timeMs = (i*durationMs/(fps*durationInSec)).toInt()
        // getFrameAtTime expects the position in microseconds, not milliseconds
        val bitmap = mmr.getFrameAtTime(timeMs * 1000L)
        if (bitmap!=null) {
            val resized = Bitmap.createScaledBitmap(bitmap, 172,172,false)
            frames.add(resized)
            Log.i("Bitmap", "Registered bitmap frame at $timeMs ms")
        } else{
            Log.i("No bitmap", "Found no bitmap at frame: $timeMs ms")
        }

    }
    return frames
}

I would go with option #2 and use the Interpreter. Check this file, which has a method to convert a bitmap to a ByteBuffer.
With the ByteBuffer you can feed the Interpreter. Inside the above file there is the bitmapToByteBuffer method, which is a standard way to turn a bitmap into a [1, width, height, 3] ByteBuffer. Pay attention to the buffer allocation that multiplies by 4, which is the byte size of a float.

You will have to adjust it, though, to include the [frames] dimension your .tflite expects.

Come back if you have more questions.

Thank you, I used the file for converting the bitmaps to a ByteBuffer, and then created a TensorBuffer as the final input. However, I got a new error when trying to run the model, and I find it very hard to understand the errors. Do you perhaps have any recommendations for debugging this type of code when I don’t have much experience in Android Studio?

Here is the error:

And this is the current code:

const val RESOLUTION = 172
const val BATCH_SIZE = 1
const val CHANNELS = 3
const val NUM_FRAMES = 20
fun predict(context: Context, mmr: MediaMetadataRetriever): Pair<FloatArray, Long> {
    val frames = videoFrames(mmr)
    // Define the shape and data type of the TensorBuffer
    val shape = intArrayOf(BATCH_SIZE, frames.size, RESOLUTION, RESOLUTION, CHANNELS)
    // Create an empty TensorBuffer with the desired shape and data type
    val tensorBuffer = TensorBuffer.createFixedSize(shape, DataType.FLOAT32)
    // Calculate the size of a single slice based on the shape of the TensorBuffer
    val sliceSize = tensorBuffer.buffer.limit() / tensorBuffer.shape[1]
    // Iterate over each ByteBuffer in the list and load it into the appropriate slice of the TensorBuffer
    for (i in frames.indices) {
        val byteBuffer = frames[i]
        // Calculate the offset for the current slice
        val offset = i * sliceSize
        // Copy the contents of the ByteBuffer to the appropriate slice of the TensorBuffer
        byteBuffer.position(0)
        tensorBuffer.buffer.position(offset)
        tensorBuffer.buffer.put(byteBuffer)
    }

    val output = TensorBuffer.createFixedSize(intArrayOf(1, 100), DataType.FLOAT32)
    var inferenceTime = 0.toLong()

    try {
        val tfliteModel = FileUtil.loadMappedFile(context, "movinet_a1_base_04052024_091340.tflite")
        val tflite = Interpreter(tfliteModel)
        Log.i("signature keys",tflite.signatureKeys.toString())
        // Inference
        val startTime = SystemClock.elapsedRealtime()
        tflite.run(tensorBuffer.buffer, output.buffer); // This is the line causing the error
        inferenceTime = SystemClock.elapsedRealtime()-startTime
    } catch (e: IOException) {
        Log.e("tfliteSupport", "Error reading model", e)
    }

    return Pair(output.floatArray, inferenceTime)
}

private fun videoFrames(mmr: MediaMetadataRetriever): List<ByteBuffer> {
    val frames = mutableListOf<ByteBuffer>()
    var durationMs = 0.0

    val duration = mmr.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION)
    if (duration !=null) {
        durationMs = duration.toDouble()
    }

    val frameStepMs = (durationMs/ NUM_FRAMES).toBigDecimal().setScale(2, RoundingMode.DOWN).toDouble()
    for (i in 0 until NUM_FRAMES){
        val timeMs = i * frameStepMs
        // getFrameAtTime expects the position in microseconds, not milliseconds
        var bitmap = mmr.getFrameAtTime((timeMs * 1000).toLong())
        if (bitmap!=null) {
            bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true)
            val resized = Bitmap.createScaledBitmap(bitmap, RESOLUTION, RESOLUTION,false)
            val inputImage = bitmapToByteBuffer(resized, RESOLUTION, RESOLUTION)
            frames.add(inputImage)
            Log.i("Bitmap", "Registered bitmap frame at $timeMs ms")
        } else{
            Log.i("No bitmap", "Found no bitmap at frame: $timeMs ms")
        }
    }
    return frames
}

I guess the ByteBuffer you are creating is wrong. You have to check that again.
If this is extremely difficult, you can feed the interpreter directly with a FloatArray, which in your case would have shape [1, 20, 172, 172, 3]. That will be somewhat slower than feeding a ByteBuffer, but it will give you a head start and an alternative until you revisit the ByteBuffer creation.

I created an updated version of the above code snippet:

fun bitmapArrayToByteBuffer(
    bitmaps: Array<Bitmap>,
    width: Int,
    height: Int,
    mean: Float = 0.0f,
    std: Float = 255.0f
): ByteBuffer {
    val totalBytes = bitmaps.size * width * height * 3 * 4 // Check your case for 20 Bitmaps
    val inputImage = ByteBuffer.allocateDirect(totalBytes)
    inputImage.order(ByteOrder.nativeOrder())

    for (bitmap in bitmaps) {
        val scaledBitmap = scaleBitmapAndKeepRatio(bitmap, width, height)
        val intValues = IntArray(width * height)
        scaledBitmap.getPixels(intValues, 0, width, 0, 0, width, height)

        // Normalize and add pixels for each Bitmap
        for (y in 0 until height) {
            for (x in 0 until width) {
                val value = intValues[y * width + x]
                inputImage.putFloat(((value shr 16 and 0xFF) - mean) / std)
                inputImage.putFloat(((value shr 8 and 0xFF) - mean) / std)
                inputImage.putFloat(((value and 0xFF) - mean) / std)
            }
        }

        scaledBitmap.recycle()  // Free memory after processing
    }

    inputImage.rewind()
    return inputImage
}

Check whether this fixes your error, and then see if the result is OK.

I tried your code and also a couple of other things, but I still got the same error.
So I suspect there is something wrong further down in the tflite model :/

If you suspect something is wrong with your tflite file, you can first perform inference with the TensorFlow Lite Interpreter API in Python. With that you can verify that your model was converted OK.
Then you can jump back into Android.
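
A minimal sketch of that check (the file name is just a placeholder; the random input only verifies that the graph runs and produces a sensibly shaped output):

import numpy as np
import tensorflow as tf

# Placeholder path -- point it at your converted file
interpreter = tf.lite.Interpreter(model_path='movinet_a1_base_04052024_091340.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
print('expects input :', input_details['shape'], input_details['dtype'])
print('returns output:', output_details['shape'], output_details['dtype'])

# Feed random data with exactly the shape/dtype the model reports
dummy = np.random.rand(*input_details['shape']).astype(np.float32)
interpreter.set_tensor(input_details['index'], dummy)
interpreter.invoke()
scores = interpreter.get_tensor(output_details['index'])
print('output shape:', scores.shape)

If this runs and the output shape is what you expect (e.g. [1, num_classes]), the .tflite file itself is fine and the problem is on the Android side; if it fails here as well, the conversion is the culprit. Note that because the model was converted with SELECT_TF_OPS, you need the full tensorflow pip package for this (it bundles the Flex delegate), not the standalone tflite_runtime.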