Prepare .wav file for yamnet.tflite model

Hi developers,

How do I prepare .wav and .amr files for the yamnet.tflite model in Kotlin or Java? I have checked the example project on GitHub, but it only does real-time classification using the mic. I need to know how to prepare WAV and AMR files for this model. Thanks!

Hi @Rufan_Khokhar

Take a look at this article, which explains how to use the YAMNet model on Android. There is also a GitHub link at the end. I hope you find it useful.



Sir, thanks for your answer. I also tried the source you provided, but it also uses only the mic and is very hard to understand. I'm using this project as an example instead.

This project uses the TFLite Task Library and is very easy to follow.

I'm using this code to prepare the WAV file; please check it out:

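```kotlin
import java.io.BufferedInputStream
import java.io.DataInputStream
import java.io.File
import java.io.FileInputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder

object AudioConverter {
    // YAMNet takes windows of 15600 samples (0.975 s of 16 kHz mono audio).
    private const val WINDOW = 15600
    private const val WAV_HEADER_SIZE = 44

    fun readAudioSimple(path: File): FloatArray {
        val buff = ByteArray(path.length().toInt())
        DataInputStream(BufferedInputStream(FileInputStream(path))).use { dis ->
            dis.readFully(buff)
        }
        // Remove the WAV header (first 44 bytes) so only the PCM samples remain.
        return floatMe(shortMe(buff.sliceArray(WAV_HEADER_SIZE until buff.size)))
    }

    fun FloatArray.sliceTo(step: Int): List<FloatArray> {
        val slicedAudio = arrayListOf<FloatArray>()
        var startAt = 0
        var endAt = WINDOW
        // Overlap between consecutive windows, in samples; 0 means no overlap.
        val stepSize = if (step != 0) (WINDOW * (1f / (2 * step))).toInt() else 0
        while (startAt + WINDOW <= this.size) {
            if (startAt != 0) {
                startAt = endAt - stepSize
                endAt = startAt + WINDOW
            }
            slicedAudio.add(this.copyOfRange(startAt, endAt))
            startAt = endAt
        }
        return slicedAudio
    }

    private fun shortMe(bytes: ByteArray): ShortArray {
        val out = ShortArray(bytes.size / 2)
        // WAV stores 16-bit PCM samples in little-endian byte order.
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(out)
        return out
    }

    private fun floatMe(pcms: ShortArray): FloatArray {
        val floats = FloatArray(pcms.size)
        pcms.forEachIndexed { index, sh ->
            // The input must be normalized to floats between -1 and +1.
            // To normalize it, divide every value by 2^15, i.e. MAX_ABS_INT16 = 32768.
            floats[index] = sh.toFloat() / 32768.0f
        }
        return floats
    }
}
```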
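The code above assumes a canonical 44-byte WAV header, which is not always true: many encoders add extra chunks (for example a `LIST` metadata chunk) before the samples. A minimal sketch, assuming a RIFF/WAVE file, that walks the chunk list to read the format fields and find where the sample data actually starts. `WavInfo` and `parseWavHeader` are hypothetical names, not part of any library.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical type: format fields from the "fmt " chunk plus the location of the samples.
data class WavInfo(
    val audioFormat: Int,   // 1 = integer PCM
    val channels: Int,
    val sampleRate: Int,
    val bitsPerSample: Int,
    val dataOffset: Int,    // byte offset of the first sample
    val dataSize: Int       // number of sample bytes
)

// Hypothetical helper: walks the RIFF chunk list instead of assuming a fixed 44-byte header.
fun parseWavHeader(bytes: ByteArray): WavInfo {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    fun tag(at: Int) = String(bytes, at, 4, Charsets.US_ASCII)
    require(bytes.size >= 12 && tag(0) == "RIFF" && tag(8) == "WAVE") { "Not a RIFF/WAVE file" }

    var fmt: WavInfo? = null
    var pos = 12
    while (pos + 8 <= bytes.size) {
        val id = tag(pos)
        val size = buf.getInt(pos + 4).toLong() and 0xFFFFFFFFL // chunk sizes are unsigned
        val body = pos + 8
        if (id == "fmt ") {
            // Layout: format(2) channels(2) sampleRate(4) byteRate(4) blockAlign(2) bitsPerSample(2)
            fmt = WavInfo(
                audioFormat = buf.getShort(body).toInt() and 0xFFFF,
                channels = buf.getShort(body + 2).toInt() and 0xFFFF,
                sampleRate = buf.getInt(body + 4),
                bitsPerSample = buf.getShort(body + 14).toInt() and 0xFFFF,
                dataOffset = 0,
                dataSize = 0
            )
        } else if (id == "data") {
            val f = fmt ?: error("'data' chunk appears before 'fmt ' chunk")
            val available = (bytes.size - body).toLong()
            return f.copy(dataOffset = body, dataSize = minOf(size, available).toInt())
        }
        // Skip this chunk's body; odd-sized chunks carry one padding byte.
        val next = body + size + (size and 1L)
        if (next > bytes.size) break
        pos = next.toInt()
    }
    error("No 'data' chunk found")
}
```

With this, `sampleRate`, `channels`, and `bitsPerSample` can be checked against what YAMNet expects (16 kHz, mono) before decoding, and `bytes.sliceArray(info.dataOffset until info.dataOffset + info.dataSize)` can replace the fixed 44-byte slice.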

I'm a student, please help me. I really need a solution.


  1. Put your file inside assets folder.
  2. Create an input stream:
    java - InputStream from Assets folder on Android returning empty - Stack Overflow
  3. Create a list of shorts as in the accepted answer here, which uses an InputStream and Guava:
    java - Mix two files audio wav on android use short array - Stack Overflow
  4. If you do not have Guava, add the dependency as described here:
    GitHub - google/guava: Google core libraries for Java
  5. Once you have the list of shorts, create a float array and continue from this line in my project to see what I did next:
    Yamnet_classification_project/ListeningFragmentViewmodel.kt at master · farmaker47/Yamnet_classification_project · GitHub

So basically, the idea is to convert the .wav file to a list of shorts, then to a FloatArray, and then feed it to the interpreter.
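The steps above can also be done without Guava, since Kotlin's `InputStream.readBytes()` and `java.nio.ByteBuffer` cover the byte-to-short conversion. A minimal sketch, assuming the file is already 16 kHz, mono, 16-bit PCM with a canonical 44-byte header, and assuming the model takes 15600-sample windows; `wavStreamToFloats`, `toWindows`, and `YAMNET_WINDOW` are hypothetical names.

```kotlin
import java.io.InputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Assumption: the YAMNet TFLite model takes 15600 samples (0.975 s at 16 kHz) per inference.
val YAMNET_WINDOW = 15600

// Hypothetical helper: reads a 16 kHz, mono, 16-bit PCM WAV stream (for example
// context.assets.open("sound.wav") on Android) and returns samples scaled to [-1, 1).
// Assumes a canonical 44-byte header.
fun wavStreamToFloats(input: InputStream, headerSize: Int = 44): FloatArray {
    val bytes = input.use { it.readBytes() }
    if (bytes.size <= headerSize) return FloatArray(0)
    // Little-endian 16-bit samples -> shorts, without Guava.
    val shorts = ByteBuffer.wrap(bytes, headerSize, bytes.size - headerSize)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asShortBuffer()
    // Shorts -> floats, dividing by 32768 (2^15).
    return FloatArray(shorts.remaining()) { i -> shorts.get(i) / 32768f }
}

// Hypothetical helper: splits samples into non-overlapping windows of YAMNET_WINDOW
// samples; the last partial window is zero-padded to full length.
fun toWindows(samples: FloatArray, window: Int = YAMNET_WINDOW): List<FloatArray> =
    (samples.indices step window).map { start ->
        samples.copyOfRange(start, minOf(start + window, samples.size)).copyOf(window)
    }
```

Each window could then be passed to the TFLite `Interpreter.run(input, output)`, with the output array sized to whatever `interpreter.getOutputTensor(0).shape()` reports for your model file.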

I hope my post helps you.



Hello sir, thanks for your solution.

The above solution only works with specific WAV files (ones that match the model's input specification, such as byte rate and channel count). My question is how to process WAV files that do not match the required input specification. How can I feed such a file to the model? I have tried many code samples and libraries, but I'm lost.

Please help me with this.


Check the specifications of the YAMNet model to see whether it accepts alternative input formats:

If there is no alternative, you have to convert your WAV files to the correct format (16 kHz, mono).
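When converting the files beforehand is not practical, the conversion can also be done in code after decoding. A minimal sketch, assuming the samples are already decoded to interleaved 16-bit PCM (for example the `data` chunk of a WAV file; AMR would first need a decoder such as Android's `MediaCodec` or FFmpeg): it averages the channels to mono and resamples with linear interpolation. `downmixToMono` and `resampleLinear` are hypothetical names.

```kotlin
// Hypothetical helper: averages interleaved channels into one mono track scaled to [-1, 1).
fun downmixToMono(pcm: ShortArray, channels: Int): FloatArray {
    require(channels >= 1)
    val frames = pcm.size / channels
    return FloatArray(frames) { f ->
        var sum = 0f
        for (c in 0 until channels) sum += pcm[f * channels + c] / 32768f
        sum / channels
    }
}

// Hypothetical helper: resamples by linear interpolation. Simple, but not band-limited:
// when lowering the rate, content above the new Nyquist frequency aliases instead of
// being filtered out.
fun resampleLinear(input: FloatArray, fromRate: Int, toRate: Int): FloatArray {
    require(fromRate > 0 && toRate > 0)
    if (fromRate == toRate || input.isEmpty()) return input.copyOf()
    val outLen = (input.size.toLong() * toRate / fromRate).toInt()
    val ratio = fromRate.toDouble() / toRate
    return FloatArray(outLen) { i ->
        val pos = i * ratio                     // position in the input timeline
        val i0 = pos.toInt()
        val i1 = minOf(i0 + 1, input.size - 1)  // clamp at the last sample
        val frac = (pos - i0).toFloat()
        input[i0] * (1 - frac) + input[i1] * frac
    }
}
```

For example, `resampleLinear(downmixToMono(pcm, channels), sampleRate, 16000)` would give a mono 16 kHz FloatArray. Linear interpolation is the simplest option; when lowering the sample rate, filtering first or using a dedicated resampler gives cleaner audio.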


Hello sir, I hope you're well. I found a solution for the YAMNet model and wrote an article about it on Medium.

Please check it out and give me suggestions to improve it.

Nice work @Rufan_Khokhar !

I read your article. I think you could explain a bit more about the FFmpegKit library and provide some links so readers can decide whether to use it. The issue with third-party libraries is that their authors may someday stop supporting them, and they may not work with future Android APIs.
I see that you are using the TensorFlow Lite AudioClassifier. Have you tried using the conversion the library itself provides?


I am also facing a similar problem with the ESRGAN (image super-resolution) model: it accepts 50x50 images and outputs 200x200 images.

My question is how to train the model for a custom input size, such as 150x150 or 240x240.

Here is the link.

Hello sir, how do I resample WAV audio from 16000 Hz to 8000 Hz? I need to preprocess the audio to classify it with my TFLite model. In the Jupyter notebook I use librosa before predicting; how do I do that on Android? I tried the approach from your Medium post and changed the sample-rate parameter passed to FFmpegKit's execute call from 16000 to 8000, but I don't think it worked well. Is there any solution?
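Going from 16000 Hz to 8000 Hz means the signal must be low-pass filtered before every second sample is dropped; otherwise content between 4 kHz and 8 kHz folds back (aliases) into the lower band. A minimal sketch of integer-factor decimation with a Hann-windowed sinc FIR filter; `lowPassTaps` and `decimate` are hypothetical helpers, and the 101-tap length and the cutoff at 90% of the new Nyquist frequency are assumptions to tune.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.sin

// Hypothetical helper: Hann-windowed sinc low-pass FIR. cutoff is a fraction of the
// input sample rate (e.g. 0.225 = 3600 Hz at 16 kHz); taps must be odd and >= 3.
fun lowPassTaps(cutoff: Double, taps: Int = 101): DoubleArray {
    require(taps >= 3 && taps % 2 == 1)
    val mid = taps / 2
    val h = DoubleArray(taps) { n ->
        val x = (n - mid).toDouble()
        val sinc = if (x == 0.0) 2 * cutoff else sin(2 * PI * cutoff * x) / (PI * x)
        val hann = 0.5 - 0.5 * cos(2 * PI * n / (taps - 1))
        sinc * hann
    }
    val sum = h.sum()
    return DoubleArray(taps) { h[it] / sum } // normalize to unity gain at DC
}

// Hypothetical helper: low-pass below the new Nyquist frequency, then keep every
// factor-th sample. decimate(samples16k, 2) yields 8 kHz audio.
fun decimate(input: FloatArray, factor: Int, taps: Int = 101): FloatArray {
    require(factor >= 1)
    if (factor == 1) return input.copyOf()
    // New Nyquist is 0.5 / factor of the input rate; cut at 90% of it.
    val h = lowPassTaps(0.45 / factor, taps)
    val mid = taps / 2
    val out = FloatArray((input.size + factor - 1) / factor)
    for (o in out.indices) {
        val center = o * factor
        var acc = 0.0
        for (k in h.indices) {
            val idx = center + k - mid
            if (idx in input.indices) acc += h[k] * input[idx] // zero-pad at the edges
        }
        out[o] = acc.toFloat()
    }
    return out
}
```

librosa uses its own resampler (which one depends on the version and the `res_type` argument), so this will not reproduce the notebook's output sample for sample; ideally the training and on-device pipelines should resample the same way.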