Audio classification: Getting train_function error when trying to fit model

The title describes the gist of the issue, details and code are here: https://stackoverflow.com/questions/67412221/getting-function-call-stack-train-function-train-function-error-when-train

Does anyone know why this error is happening? I cannot for the life of me figure it out.


Hi @Guy_Berreby, what type of music data are you working on? For instance, if it’s in a MIDI format, then I can see why you’re using the LSTM architecture. I also noticed you’re using tfio.audio.AudioIOTensor (tfio.audio.AudioIOTensor | TensorFlow I/O) - maybe your two datasets are waveform-based. Can you please share some more info and show how you’re loading the data (code)?

I’ve summarized your code and the task below with some formatting, based on the information in the StackOverflow post you shared. @Guy_Berreby, do let me know if the spacing and other details are correct - I had to make some minor adjustments:

Your ML task

  • Music genre classification, two different genres - bm and dm :guitar:

Code

  • RNN (LSTM) model:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
# Embedding layer expecting an input vocab of size maxLen and an
# output embedding dimension of size 2, with zero-masking enabled.
model.add(layers.Embedding(input_dim=maxLen, output_dim=2, mask_zero=True))
# Two stacked LSTM layers with 8 internal units each.
model.add(layers.LSTM(8, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(8))
model.add(layers.Dropout(0.2))
# Dense layer with 16 units, followed by a 2-way softmax output.
model.add(layers.Dense(16, activation="relu"))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(2, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam')
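A quick way to sanity-check this architecture is to push a dummy batch through it - a minimal sketch, assuming a hypothetical maxLen. Note that an Embedding layer expects non-negative integer indices below input_dim, so whatever is fed at training time must respect that:

import numpy as np

maxLen = 1000  # hypothetical value - the real one comes from the dataset

# Embedding expects integer ids in [0, input_dim); shape (batch, timesteps).
dummy_batch = np.random.randint(0, maxLen, size=(4, 50))
preds = model(dummy_batch)
print(preds.shape)  # (4, 2) - one softmax distribution per sample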
  • Generator:
import os
import random

import tensorflow as tf
import tensorflow_io as tfio

def modelTrainGen(maxLen):
  # One type of music - training set
  bmTrainDirectory = '/content/drive/.../...train/'
  # Another type of music - training set
  dmTrainDirectory = '/content/drive/.../...train/'

  dmTrainFileNames = os.listdir(dmTrainDirectory)
  bmTrainFileNames = os.listdir(bmTrainDirectory)
  maxAudioLen = maxLen
  # One-hot class labels, shape (2, 1).
  bmTensor = tf.convert_to_tensor([[1], [0]])
  dmTensor = tf.convert_to_tensor([[0], [1]])
  allFileNames = []

  # Interleave file names from both genres, each tagged with a class id.
  for bmFileName, dmFileName in zip(bmTrainFileNames, dmTrainFileNames):
    allFileNames.append((bmFileName, 1))
    allFileNames.append((dmFileName, 0))

  random.shuffle(allFileNames)
  for fileName, val in allFileNames:
    if val == 1:
      audio = tfio.audio.AudioIOTensor(bmTrainDirectory + fileName)
      # Collapse the channel dimension with a per-sample maximum.
      audio_slice = tf.reduce_max(tf.transpose(audio[0:]), 0)
      del audio
      print(audio_slice.shape)
      # Zero-pad (or truncate) every clip to the same length.
      padded_x = tf.keras.preprocessing.sequence.pad_sequences(
          [audio_slice], padding="post", dtype=float, maxlen=maxAudioLen)
      del audio_slice
      converted = tf.convert_to_tensor(padded_x[0])
      del padded_x
      print("A")
      print(converted.shape)
      yield (converted, bmTensor)
      print("B")
      del converted
    else:
      audio = tfio.audio.AudioIOTensor(dmTrainDirectory + fileName)
      audio_slice = tf.reduce_max(tf.transpose(audio[0:]), 0)
      del audio
      print(audio_slice.shape)
      padded_x = tf.keras.preprocessing.sequence.pad_sequences(
          [audio_slice], padding="post", dtype=float, maxlen=maxAudioLen)
      del audio_slice
      converted = tf.convert_to_tensor(padded_x[0])
      del padded_x
      print("C")
      print(converted.shape)
      yield (converted, dmTensor)
      print("D")
      del converted

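For context, a generator like this is typically wired into model.fit through tf.data.Dataset.from_generator - a minimal sketch, where the output_signature shapes and dtypes are assumptions read off the code above (pad_sequences with dtype=float yields float64, so the spec below reflects that). One thing worth checking: categorical_crossentropy with a Dense(2) output normally expects labels of shape (2,), while bmTensor/dmTensor have shape (2, 1).

train_ds = tf.data.Dataset.from_generator(
    lambda: modelTrainGen(maxLen),
    output_signature=(
        tf.TensorSpec(shape=(maxLen,), dtype=tf.float64),  # padded waveform
        tf.TensorSpec(shape=(2, 1), dtype=tf.int32),       # class label
    ),
).batch(1)

model.fit(train_ds, epochs=5)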
(The following TensorFlow docs are for waveform-based data - they could be useful in the future.)


Hi, thanks for the response! The data is mp3 files of songs, not MIDI files, which, as you noticed, I’m loading with tfio.audio.AudioIOTensor.
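For anyone loading mp3s the same way, a minimal sketch (the path is hypothetical):

import tensorflow as tf
import tensorflow_io as tfio

# Hypothetical path - AudioIOTensor reads the file lazily.
audio = tfio.audio.AudioIOTensor('/path/to/song.mp3')
print(audio.shape, audio.rate, audio.dtype)  # (samples, channels), sample rate
waveform = audio.to_tensor()  # materialize the whole clip as a dense tensor
# Collapse channels with a per-sample max, as in the generator above.
mono = tf.reduce_max(tf.transpose(waveform), axis=0)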


Hi @Guy_Berreby

Spotify recently open-sourced a TensorFlow 2 library which you may find helpful:

Realbook is a Python library for easier training of audio deep learning models with TensorFlow, made by Spotify’s Audio Intelligence Lab. Realbook provides callbacks (e.g., spectrogram visualization) and well-tested Keras layers (e.g., STFT, ISTFT, magnitude spectrogram) that we often use when training. These functions have helped standardize consistency across all of our models, and we hope Realbook will do the same for the open-source community.

Includes Keras layers:

  • FrozenGraphLayer - Allows you to use a TF V1 graph as a Keras layer.
  • CQT - Constant-Q transform layers ported from nnAudio.
  • Stft, Istft, MelSpectrogram, Spectrogram, Magnitude, Phase and MagnitudeToDecibel - Layers that perform common audio feature preprocessing. All checked for correctness against librosa.

Realbook contains a number of layers that convert audio data (i.e.: waveforms) into various spectral representations (i.e.: spectrograms). For convenience, the amount of memory required for the most commonly used layers is provided below.
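As a rough illustration of what such spectral layers compute, here is a plain-TensorFlow sketch of an STFT magnitude spectrogram (frame and hop sizes are illustrative, not Realbook defaults):

import tensorflow as tf

# One second of dummy audio at 22.05 kHz, shape (batch, samples).
waveform = tf.random.normal([1, 22050])
# Short-time Fourier transform followed by magnitude - what an Stft +
# Magnitude layer pair produces.
stft = tf.signal.stft(waveform, frame_length=2048, frame_step=512)
magnitude = tf.abs(stft)
print(magnitude.shape)  # (batch, frames, 2048 // 2 + 1 frequency bins)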