Custom audio classification with Teachable Machine

Hi all,

I am starting a new project and want to build a custom audio classifier using the great Teachable Machine website and its speech-commands model.

I have collected audio samples from a personal user interface, and I now have many wav files available.

My goal is to upload them to the Teachable Machine website, following its audio zip format, which is just a zip file containing:

  • all the sample sound files for a class, concatenated into a single webm file
  • a JSON file describing the audio characteristics of each sound sample.

Using a simple Node.js script, I managed to concatenate my files and create the JSON file.
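For illustration, here is a rough sketch of that kind of script (it shells out to ffmpeg for the webm concatenation; the file names and every samples.json field except frequencyFrames are placeholders, not the verified Teachable Machine format):

// Sketch: build the two pieces of a Teachable Machine class zip
// (webm concatenation via ffmpeg, plus a samples.json skeleton).
const fs = require("fs");
const path = require("path");
const { execFileSync } = require("child_process");

const classDir = "./le"; // folder of wav files for one class (placeholder)
const wavFiles = fs.readdirSync(classDir).filter((f) => f.endsWith(".wav"));

// 1. Concatenate all samples of the class into a single webm file.
const listFile = path.join(classDir, "list.txt");
fs.writeFileSync(
    listFile,
    wavFiles.map((f) => `file '${path.resolve(classDir, f)}'`).join("\n")
);
execFileSync("ffmpeg", [
    "-f", "concat", "-safe", "0",
    "-i", listFile,
    "-c:a", "libopus",
    "le.webm",
]);

// 2. Write samples.json; frequencyFrames still has to be filled in per sample.
const samples = wavFiles.map((f) => ({
    name: f, // placeholder field
    frequencyFrames: [], // the missing attribute this post is about
}));
fs.writeFileSync("samples.json", JSON.stringify(samples, null, 2));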

The last missing piece of information for each file is an attribute called “frequencyFrames”. For a sample I downloaded from TM (and created online using the microphone), it is an array of arrays.

Does anyone know how I can get this information for each wav file?

Thanks

Regards


Hi Vincent,

Can you share the link where you read those instructions?
I think frequencyFrames might be the frame rate of the audio files. This is defined when you record them (e.g. 16 kHz, 24 kHz, …).

The test project is linked to my Google Drive, so I cannot share it on the web.

When I export it, the zip file contains a samples.json file with an array of the 8 samples for the first class, “le”. For each sample, frequencyFrames is an array of arrays, as in the extract below:

[
{
    "frequencyFrames": [
        [
            -100.85028076171875, -95.83499908447266, -88.74774169921875,
            -88.04293823242188, -82.76179504394531, -78.17805480957031,
            -78.35382080078125, -84.32926940917969, -90.46082305908203,
            -87.03742980957031, -81.06280517578125, -71.03707122802734,
            -70.29259490966797, -79.76622009277344, -84.61842346191406,
            -70.43887329101562, -65.5788803100586, -68.34820556640625

Maybe it is the data used for the spectrograms in the app?

Sorry for the delay, but are you following a tutorial? Which one? Just so I can understand the steps you are taking.

Not really. I am trying to use the Teachable Machine interface with wav files I have collected for a side project, and to train a model with these sounds.
There is only one way to import sounds into Teachable Machine: sounds previously recorded with the interface and then downloaded to your computer, for example.
I have tried (as explained in my first message) to reproduce the structure of the zip file I get when I download a Teachable Machine class.
I found some interesting classes in the tfjs speech-commands source code (tfjs-models/speech-commands/src at master · tensorflow/tfjs-models · GitHub), but did not find a way to recreate the frequencyFrames data.
I have all my wav files offline, and a Node.js script to loop over them and recreate the needed Teachable Machine zip file.
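For illustration, the packing part of such a loop could look roughly like this (it assumes the archiver npm package, and the file names inside the zip are placeholders rather than a verified Teachable Machine layout):

// Sketch: pack one class into a zip from the concatenated webm and the metadata.
const fs = require("fs");
const archiver = require("archiver");

async function buildClassZip(className, webmPath, samples) {
    const output = fs.createWriteStream(`${className}.zip`);
    const archive = archiver("zip");
    archive.pipe(output);

    // The concatenated audio is only used for playback in the interface.
    archive.file(webmPath, { name: `${className}.webm` });
    // The per-sample metadata, including the frequencyFrames arrays.
    archive.append(JSON.stringify(samples, null, 2), { name: "samples.json" });

    await archive.finalize();
}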

Hi Vincent, sorry for the delay, but I got some clarification for you:

The zip file that is generated includes the audio files solely for playback purposes; the FFT data has already been extracted beforehand, and that is the missing “frequencyFrames”.

Here's a comment from the code that processes it:

/**
 * The number of frames of frequency data to represent one sample,
 * for speech-commands this is 43, it corresponds to the models input shape
 * speech-commands input shape is [null, 43, 232, 1]
 */

Each “sample” is 43 frames, each consisting of the first 232 numbers of the FFT array. Another important detail is that these numbers come from the Web Audio API's AnalyserNode, so using something else is likely to shift performance.
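To make that concrete, once you have a queue of FFT columns, grouping them into model-sized samples could look roughly like this (a sketch, not the exact Teachable Machine code):

// Sketch: group FFT columns into samples of 43 frames x 232 bins,
// matching the speech-commands input shape [null, 43, 232, 1].
const FRAMES_PER_SAMPLE = 43;
const BINS_PER_FRAME = 232;

function toFrequencyFrames(freqDataQueue) {
    const samples = [];
    for (let i = 0; i + FRAMES_PER_SAMPLE <= freqDataQueue.length; i += FRAMES_PER_SAMPLE) {
        const frames = freqDataQueue
            .slice(i, i + FRAMES_PER_SAMPLE)
            // keep only the first 232 frequency bins of each frame
            .map((column) => Array.from(column.slice(0, BINS_PER_FRAME)));
        samples.push(frames); // one [43][232] array of arrays, i.e. one frequencyFrames entry
    }
    return samples;
}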

My suggestion is, if you want to do this with your own audio, it might be simplest to:
1. change their code (it's open source) and enable a better input, or
2. try using one of the samples here:

Hi,
Thanks for taking the time to dig into the Teachable Machine source code.
I managed to do what I wanted with just a few lines of JS code.

// Decode the wav file into an AudioBuffer
// (`$input.value` holds the base name of the wav file to analyse)
let audioCtx = new AudioContext({
    sampleRate: 44100,
});
let audioBuffer = await fetch(`${$input.value}.wav`)
    .then((response) => response.arrayBuffer())
    .then((buffer) => audioCtx.decodeAudioData(buffer));

let freqDataQueue = [];
let columnTruncateLength = 232; // speech-commands keeps only the first 232 frequency bins
let sampleRate = 44100;

// Render offline so the whole file is analysed without real-time playback
let oac = new OfflineAudioContext({
    numberOfChannels: audioBuffer.numberOfChannels,
    length: audioBuffer.length,
    sampleRate: sampleRate,
});

const source = oac.createBufferSource();
const processor = oac.createScriptProcessor(1024, 1, 1);

const analyser = oac.createAnalyser();
analyser.fftSize = 2048;
analyser.smoothingTimeConstant = 0;

source.buffer = audioBuffer;

source.connect(analyser);
analyser.connect(processor);
processor.connect(oac.destination);

// One FFT column per ScriptProcessor callback (every 1024 samples);
// slice() copies the data, so the same buffer can be reused for each frame
const freqData = new Float32Array(analyser.frequencyBinCount);
processor.onaudioprocess = () => {
    analyser.getFloatFrequencyData(freqData);
    freqDataQueue.push(freqData.slice(0, columnTruncateLength));
};

oac.oncomplete = () => {
    // freqDataQueue now holds the frequency columns for this file
    console.log(freqDataQueue);
    source.disconnect(analyser);
    processor.disconnect(oac.destination);
};

source.start(0);
oac.startRendering();
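One detail if you write freqDataQueue straight into samples.json: the Float32Array columns have to be converted to plain arrays first, otherwise JSON.stringify turns them into objects with numeric keys. For example:

// Convert each Float32Array column to a plain array before serialising,
// so JSON.stringify produces the array-of-arrays shape seen in samples.json.
const frequencyFrames = freqDataQueue.map((column) => Array.from(column));
const json = JSON.stringify(frequencyFrames);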

That's great! I'm glad you could find a workaround!

Hi, I am Maxwell.
I am currently working on an ML audio project and got stuck on the same problem: Teachable Machine does not allow me to upload my own audio dataset. I have looked at your code and would like to confirm whether it will work.
Thank you