Prepare .wav file for yamnet.tflite model

Hi developers,

How do I prepare a .wav or .amr file for the yamnet.tflite model in Kotlin or Java? I have checked the example project on GitHub, but it only covers real-time classification using the microphone. I need to know how to prepare a .wav or .amr file for this model. Thanks.

Hi @Rufan_Khokhar

Take a look at this article, which explains how to use the YAMNet model on Android. There is also a GitHub link at the end. I hope you find it useful.

Best


Sir, thanks for your answer. I also tried the source you provided, but it too uses only the microphone and is hard to follow. I'm using this project as an example.

That project uses the TFLite Task Library and is very easy to follow.

I’m using this code to prepare the .wav file; please check it out:

import java.io.BufferedInputStream
import java.io.DataInputStream
import java.io.File
import java.io.FileInputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder

object AudioConverter {

    private const val WAV_HEADER_SIZE = 44

    fun readAudioSimple(path: File): FloatArray {
        val input = BufferedInputStream(FileInputStream(path))
        val buff = ByteArray(path.length().toInt())
        DataInputStream(input).use { it.readFully(buff) }
        // Skip the 44-byte canonical WAV header so only the PCM data remains.
        return floatMe(shortMe(buff.sliceArray(WAV_HEADER_SIZE until buff.size)))
    }

    fun FloatArray.sliceTo(step: Int): List<FloatArray> {
        val slicedAudio = arrayListOf<FloatArray>()
        var startAt = 0
        var endAt = 15600
        val stepSize = if (step != 0) (15600 * (1f / (2 * step))).toInt() else 0
        while ((startAt + 15600) < this.size) {
            if (startAt != 0) {
                startAt = endAt - stepSize
                endAt = startAt + 15600
            }
            slicedAudio.add(this.copyOfRange(startAt, endAt))
            startAt = endAt
        }
        return slicedAudio
    }

    private fun shortMe(bytes: ByteArray): ShortArray {
        // Interpret the raw bytes as little-endian 16-bit PCM samples.
        val out = ShortArray(bytes.size / 2)
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(out)
        return out
    }

    private fun floatMe(pcms: ShortArray): FloatArray {
        val floats = FloatArray(pcms.size)
        pcms.forEachIndexed { index, sh ->
            // The input must be normalized to floats between -1 and +1,
            // so divide each 16-bit sample by MAX_ABS_INT16 = 32768.
            floats[index] = sh.toFloat() / 32768.0f
        }
        return floats
    }
}
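Since the question asks for Kotlin or Java, here is a minimal Java sketch of the same conversion, assuming a canonical 44-byte WAV header and 16-bit little-endian PCM. The class and method names are my own for illustration, not from any linked project:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class PcmToFloat {
    // Common WAV files have a 44-byte canonical header; robust code should
    // really locate the "data" chunk, since extra chunks can shift the offset.
    static final int WAV_HEADER_SIZE = 44;

    // Convert little-endian 16-bit PCM bytes (header already stripped)
    // to floats normalized to [-1, 1), as the model expects.
    public static float[] pcm16ToFloat(byte[] pcm) {
        short[] shorts = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts);
        float[] floats = new float[shorts.length];
        for (int i = 0; i < shorts.length; i++) {
            floats[i] = shorts[i] / 32768.0f;
        }
        return floats;
    }

    // Strip the header, then convert the remaining PCM payload.
    public static float[] wavBytesToFloat(byte[] wav) {
        byte[] pcm = Arrays.copyOfRange(wav, WAV_HEADER_SIZE, wav.length);
        return pcm16ToFloat(pcm);
    }
}
```

For example, the three samples `0`, `-32768`, `32767` map to `0.0f`, `-1.0f`, and roughly `0.99997f`.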

I’m a student, please help me. I really need this solution.

Hi,

  1. Put your file inside the assets folder.
  2. Create an input stream:
    java - InputStream from Assets folder on Android returning empty - Stack Overflow
  3. Create a list of shorts, as in the accepted answer here, which uses an input stream and Guava:
    java - Mix two files audio wav on android use short array - Stack Overflow
  4. If you do not have Guava, add the dependency as described here:
    https://github.com/google/guava
  5. Once you have the list of shorts, create a float array and continue from this line inside my project to see what I did next:
    https://github.com/farmaker47/Yamnet_classification_project/blob/master/app/src/main/java/com/soloupis/yamnet_classification_project/viewmodel/ListeningFragmentViewmodel.kt#L88

So basically the idea is to convert the .wav file to a list of shorts, then to a FloatArray, and then feed the interpreter.
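To make the "feed the interpreter" step concrete: YAMNet's TFLite model consumes fixed windows of 15600 samples (0.975 s at 16 kHz), so a longer decoded waveform has to be cut into frames first, much like the `sliceTo` helper above. A Java sketch of that framing (the names are illustrative, not from the linked project):

```java
import java.util.ArrayList;
import java.util.List;

public class YamnetFraming {
    // YAMNet's TFLite model takes 15600 samples (0.975 s at 16 kHz) per call.
    static final int FRAME_SIZE = 15600;

    // Split a decoded float waveform into FRAME_SIZE windows.
    // hop == FRAME_SIZE gives back-to-back frames; a smaller hop overlaps them.
    public static List<float[]> frame(float[] audio, int hop) {
        List<float[]> frames = new ArrayList<>();
        for (int start = 0; start + FRAME_SIZE <= audio.length; start += hop) {
            float[] f = new float[FRAME_SIZE];
            System.arraycopy(audio, start, f, 0, FRAME_SIZE);
            frames.add(f);
        }
        return frames;
    }
}
```

Each returned frame can then be passed to the interpreter one at a time; overlapping frames (a smaller hop) trade extra inference calls for smoother classification over time.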

I hope my post helps you.

Best


Hello sir, thanks for your solution.

The above solution only works with specific .wav files (those that match the model's input specification, such as byte rate and channel count). My question is how to process .wav files that do not match the required input specification. How can I feed such a file to the model? I have tried many code snippets and libraries, but I'm lost.

Please help me with this.

Thanks.

Check the specifications of the YAMNet model to see if it accepts alternative inputs:

If there is no alternative, you will have to convert your .wav files to the correct format.
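As a hedged sketch of what that conversion can look like, assuming the samples are already decoded to floats: average the channels to downmix stereo to mono, and use naive linear interpolation to resample to 16 kHz. This is illustrative only; a production app should use a proper resampler (e.g. a windowed-sinc or polyphase filter), since linear interpolation aliases on real audio. The class and method names here are my own:

```java
public class AudioResampler {
    // Downmix interleaved stereo (L, R, L, R, ...) to mono by averaging.
    public static float[] stereoToMono(float[] interleaved) {
        float[] mono = new float[interleaved.length / 2];
        for (int i = 0; i < mono.length; i++) {
            mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2f;
        }
        return mono;
    }

    // Naive linear-interpolation resampling from srcRate to dstRate.
    // Good enough for a demo; use a real filter for quality.
    public static float[] resample(float[] input, int srcRate, int dstRate) {
        if (srcRate == dstRate) return input;
        int outLen = (int) ((long) input.length * dstRate / srcRate);
        float[] out = new float[outLen];
        for (int i = 0; i < outLen; i++) {
            double pos = (double) i * srcRate / dstRate;  // position in input
            int i0 = (int) pos;
            int i1 = Math.min(i0 + 1, input.length - 1);
            double frac = pos - i0;
            out[i] = (float) (input[i0] * (1 - frac) + input[i1] * frac);
        }
        return out;
    }
}
```

With this, a 44.1 kHz stereo file would go through `stereoToMono`, then `resample(mono, 44100, 16000)`, and only then through the 15600-sample framing and the interpreter.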

Best