Audio classification: Getting train_function error when trying to fit model

The title describes the gist of the issue; details and code are in the linked StackOverflow post.

Does anyone know why this error is happening? I cannot for the life of me figure it out.


Hi @Guy_Berreby, what type of music data are you working on? For instance, if it's in a MIDI format, then I can see why you're using the LSTM architecture. I also noticed you're using TensorFlow I/O - maybe your two datasets are waveform-based. Can you please share some info on how you're loading the data (code)?

I’ve summarized your code and the task below with some formatting, based on the information in the StackOverflow post you shared. @Guy_Berreby do let me know if the spaces and other info are correct, I had to make some minor adjustments:

Your ML task

  • Music genre classification, two different genres - bm and dm :guitar:


  • RNN (LSTM) model:
model = keras.Sequential()
# Add an Embedding layer expecting an input vocab of size maxLen, with an
# output embedding dimension of size 2.
model.add(layers.Embedding(input_dim=maxLen, output_dim=2, mask_zero=True))
# model.add(layers.Input(shape=[1,None]) )
model.add(layers.Dropout(0.2))
# Add an LSTM layer with 128 internal units.
# (The LSTM layer itself was missing from the posted snippet; restored here
# from the comment above.)
model.add(layers.LSTM(128))
model.add(layers.Dropout(0.2))
# Add a Dense output layer. (The posted comment said 10 units, but with two
# genres and one-hot labels, 2 softmax units are assumed here.)
model.add(layers.Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
  • Generator:
def modelTrainGen(maxLen):
  # One type of music - training set
  bmTrainDirectory = '/content/drive/.../...train/'
  # Another type of music - training set
  dmTrainDirectory = '/content/drive/.../...train/'

  dmTrainFileNames = os.listdir(dmTrainDirectory)
  bmTrainFileNames = os.listdir(bmTrainDirectory)
  maxAudioLen = maxLen
  bmTensor = tf.convert_to_tensor([[1],[0]])
  dmTensor = tf.convert_to_tensor([[0],[1]])
  allFileNames = []

  # Pair each bm file with a dm file and collect them with a genre flag.
  # (The body of this loop was missing from the post; appending labeled
  # pairs is assumed from how allFileNames is consumed below.)
  for fileName in zip(bmTrainFileNames, dmTrainFileNames):
    bmFileName = fileName[0]
    dmFileName = fileName[1]
    allFileNames.append((bmFileName, 1))
    allFileNames.append((dmFileName, 0))

  for fileNameVal in allFileNames:
    fileName = fileNameVal[0]
    val = fileNameVal[1]

    if val == 1:
      bmFileName = fileName
      # The load call was garbled in the post; tfio.audio.AudioIOTensor is
      # assumed here from the TensorFlow I/O mention above.
      audio = tfio.audio.AudioIOTensor(bmTrainDirectory + bmFileName)
      audio_slice = tf.reduce_max(tf.transpose(audio[0:]), 0)
      del audio
      padded_x = tf.keras.preprocessing.sequence.pad_sequences(
          [audio_slice], padding="post", dtype=float, maxlen=maxAudioLen)
      del audio_slice
      converted = tf.convert_to_tensor(padded_x[0])
      del padded_x
      yield (converted, bmTensor)
      del converted
    else:
      dmFileName = fileName
      audio = tfio.audio.AudioIOTensor(dmTrainDirectory + dmFileName)
      audio_slice = tf.reduce_max(tf.transpose(audio[0:]), 0)
      del audio
      padded_x = tf.keras.preprocessing.sequence.pad_sequences(
          [audio_slice], padding="post", dtype=float, maxlen=maxAudioLen)
      del audio_slice
      converted = tf.convert_to_tensor(padded_x[0])
      del padded_x
      yield (converted, dmTensor)
      del converted
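One common cause of a `train_function` error with a Python generator like this is passing it to `model.fit` without telling Keras the element shapes and dtypes. A minimal sketch of wrapping such a generator in `tf.data.Dataset.from_generator` with an explicit `output_signature` is below; the `train_gen` stand-in, `maxLen` value, and `(2,)` label shape are assumptions for illustration, not the original code:

```python
import tensorflow as tf

maxLen = 16  # assumed padded audio length for this toy example

def train_gen():
    # Hypothetical stand-in for modelTrainGen: yields (padded waveform,
    # one-hot genre label) pairs, alternating between the two classes.
    for i in range(4):
        x = tf.zeros([maxLen], dtype=tf.float32)
        y = tf.constant([1.0, 0.0]) if i % 2 else tf.constant([0.0, 1.0])
        yield x, y

# Declaring output_signature up front lets Keras trace train_function
# against known shapes/dtypes instead of guessing from the first element.
dataset = tf.data.Dataset.from_generator(
    train_gen,
    output_signature=(
        tf.TensorSpec(shape=(maxLen,), dtype=tf.float32),
        tf.TensorSpec(shape=(2,), dtype=tf.float32),
    ),
).batch(2)

for x, y in dataset.take(1):
    print(x.shape, y.shape)  # (2, 16) (2, 2)
```

The resulting `dataset` can then be passed straight to `model.fit(dataset, epochs=...)`.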

(The following TensorFlow docs cover waveform-based data - they could be useful in the future.)

Hi, thanks for the response! The data is mp3 files of songs, not MIDI files, which, as you noticed, I am loading with TensorFlow I/O.


Hi @Guy_Berreby

Spotify recently open-sourced a TensorFlow 2 library which you may find helpful:

Realbook is a Python library for easier training of audio deep learning models with TensorFlow, made by Spotify's Audio Intelligence Lab. Realbook provides callbacks (e.g., spectrogram visualization) and well-tested Keras layers (e.g., STFT, ISTFT, magnitude spectrogram) that we often use when training. These functions have helped standardize consistency across all of our models, and we hope Realbook will do the same for the open source community.

Includes Keras layers:

  • FrozenGraphLayer - Allows you to use a TF V1 graph as a Keras layer.
  • CQT - Constant-Q transform layers ported from nnAudio.
  • Stft, Istft, MelSpectrogram, Spectrogram, Magnitude, Phase and MagnitudeToDecibel - Layers that perform common audio feature preprocessing. All checked for correctness against librosa.
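As an illustration of what preprocessing layers like `Stft` and `Magnitude` compute, here is a plain `tf.signal` sketch of a magnitude spectrogram. This is not Realbook's API, just the underlying TensorFlow ops, and the frame parameters are arbitrary example values:

```python
import tensorflow as tf

def magnitude_spectrogram(waveform, frame_length=1024, frame_step=256):
    # Short-time Fourier transform of a 1-D waveform, then the magnitude
    # of the complex STFT - the same representation a Magnitude-style
    # layer would produce.
    stft = tf.signal.stft(waveform,
                          frame_length=frame_length,
                          frame_step=frame_step)
    return tf.abs(stft)

wave = tf.sin(tf.linspace(0.0, 100.0, 4096))  # toy 1-D waveform
spec = magnitude_spectrogram(wave)
print(spec.shape)  # (num_frames, frame_length // 2 + 1) -> (13, 513)
```

Packaging this as a Keras layer (as Realbook does) lets the preprocessing run on-GPU as part of the model graph instead of in the input pipeline.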

Realbook contains a number of layers that convert audio data (i.e., waveforms) into various spectral representations (i.e., spectrograms). For convenience, the amount of memory required for the most commonly used layers is provided below.