I came across this TensorFlow tutorial, https://www.tensorflow.org/tutorials/keras/classification, which does a great job of explaining how to configure a neural network to classify the 10 labels/classes of the Fashion MNIST dataset. It inspired me to design a neural network for music classification.
With the code below I want to feed an algorithm two folders, each containing audio files of a different music genre, and create a spectrogram for each audio file. Those spectrogram images are then used to train the neural network, just like in the Keras classification example above. So instead of images of 10 different fashion articles, I am using spectrogram images from two genres. The only difference is that I want my neural network to be completely linear, so there is no additional ReLU-activated dense layer in the middle. To keep things simple I started with just two folders, so for now the task is to distinguish between two musical genres, but my goal is to add more genres later.
import numpy as np
import librosa
import librosa.display
import datetime
import math
import os
import tensorflow as tf
from pathlib import Path

# Spectrogram
def prepare_song(song_path):
    list_matrices = []
    y, sr = librosa.load(song_path, sr=22050, duration=10)
    D = np.abs(librosa.stft(y))**2
    S = librosa.feature.melspectrogram(S=D, sr=sr)
    list_matrices.append(S)
    return list_matrices

audio_tracks = []
genre = []

# Genre 1
path = '/Users/Laulito/Desktop/Samplepack der Genres/House'
pathlist = Path(path).glob('**/*.wav')
for path in pathlist:
    path_in_str = str(path)
    song_pieces = prepare_song(path_in_str)
    audio_tracks += song_pieces
    genre += ([0]*len(song_pieces))  # puts zeros into target / train-labels array

# Genre 2
path2 = '/Users/Laulito/Desktop/Samplepack der Genres/Drum & Bass'
pathlist2 = Path(path2).glob('**/*.wav')
for path2 in pathlist2:
    path_in_str2 = str(path2)
    song_pieces = prepare_song(path_in_str2)
    audio_tracks += song_pieces
    genre += ([1]*len(song_pieces))  # puts ones into target / train-labels array

# Initialise
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(np.array(audio_tracks), np.array(genre), test_size=0.2, train_size=0.8, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42)

# Linear Model
from keras import datasets, layers, models
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(128, 440)),  # 128x440 is the size of a spectrogram-image
    tf.keras.layers.Dense(2)  # Dense(2) because there are just two genres
])
model.summary()

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.2,
    decay_steps=15,
    decay_rate=0.9)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=[tf.keras.metrics.Accuracy()])

model.fit(x=X_train, y=y_train, epochs=5, validation_split=0.2)
model.evaluate(x=X_test, y=y_test)
That code failed at the line model.fit(), with the terminal reporting that shape (None, 1) and shape (None, 2) are incompatible. I guess it has something to do with the last dense layer, tf.keras.layers.Dense(2), which produces an output of shape (None, 2), while my label array has shape (None, 1). This surprised me, because the target in the Keras example above is also one-dimensional and its last dense layer has 10 units, so the shapes there would have been (None, 10) and (None, 1) …
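A possible explanation for that difference, sketched in plain NumPy rather than Keras: the tutorial compiles its model with SparseCategoricalCrossentropy(from_logits=True), a loss that takes integer class indices and looks up one logit per sample, so a 1-D label array fits a (None, 10) output. An element-wise comparison such as mean squared error or an exact-match accuracy needs targets shaped like the outputs instead. The two helper functions below are hypothetical stand-ins written for this illustration, and the strict shape check is an assumption of the sketch, not necessarily the place where Keras raises the error.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, logits):
    """Mean cross-entropy for integer labels of shape (N,) and logits of shape (N, C)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for every sample.
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def mean_squared_error(targets, outputs):
    """Element-wise MSE; this sketch insists on identical shapes (an assumption)."""
    if targets.shape != outputs.shape:
        raise ValueError(f"incompatible shapes {targets.shape} and {outputs.shape}")
    return float(((targets - outputs) ** 2).mean())

logits = np.array([[2.0, -1.0], [0.5, 1.5], [-0.3, 0.3]])  # stands in for a Dense(2) output
labels = np.array([0, 1, 1])                                # integer class indices

# Integer labels work directly with the sparse cross-entropy.
loss_sparse = sparse_categorical_crossentropy(labels, logits)

# For an element-wise loss the labels are one-hot encoded to shape (3, 2) first.
one_hot = np.eye(2)[labels]
loss_mse = mean_squared_error(one_hot, logits)
```

Under these assumptions, either switching to a sparse cross-entropy loss with integer labels or one-hot encoding the labels makes the shapes line up.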
Anyway I modified the code as follows:
a = 0
b = 1

# Genre 1
path = '/Users/Laulito/Desktop/Samplepack der Genres/House'
pathlist = Path(path).glob('**/*.wav')
for path in pathlist:
    path_in_str = str(path)
    song_pieces = prepare_song(path_in_str)
    audio_tracks += song_pieces
    array = [a, b]
    genre += ([array]*len(song_pieces))

# Genre 2
path2 = '/Users/Laulito/Desktop/Samplepack der Genres/Drum & Bass'
pathlist2 = Path(path2).glob('**/*.wav')
for path2 in pathlist2:
    path_in_str2 = str(path2)
    song_pieces = prepare_song(path_in_str2)
    audio_tracks += song_pieces
    array = [b, a]
    genre += ([array]*len(song_pieces))
With this change the code at least runs, because the shape of genre is now (None, 2) as well, but the resulting model has a loss of "nan" and an accuracy of 0 … I might have messed something up along the way. Maybe someone can help me figure out where I went wrong.
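Two guesses about these symptoms, illustrated with a small NumPy sketch that does not use the real data. First, power spectrogram values can be very large, and plain gradient descent on a squared-error loss with a learning rate of 0.2 can overflow on such inputs. Second, tf.keras.metrics.Accuracy counts how often predictions exactly equal the labels, which raw float outputs almost never do. The random features, the train_linear_mse helper and the log-plus-standardize scaling are all assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear_mse(x, y, learning_rate, steps):
    """Gradient descent on a bias-free linear model with MSE loss; returns the final loss."""
    weights = np.zeros((x.shape[1], y.shape[1]))
    with np.errstate(over="ignore", invalid="ignore"):  # silence overflow warnings
        for _ in range(steps):
            error = x @ weights - y
            # Gradient of the squared error averaged over all elements.
            weights -= learning_rate * 2.0 * x.T @ error / error.size
        return float(((x @ weights - y) ** 2).mean())

# Hypothetical stand-in for flattened power spectrograms: large positive values.
x_raw = rng.uniform(0.0, 1e4, size=(20, 8))
y = np.eye(2)[np.array([0, 1] * 10)]  # one-hot labels, shape (20, 2)

# Log-compress, then standardize each feature to zero mean and unit variance.
x_log = np.log1p(x_raw)
x_scaled = (x_log - x_log.mean(axis=0)) / x_log.std(axis=0)

loss_raw = train_linear_mse(x_raw, y, learning_rate=0.2, steps=200)        # blows up
loss_scaled = train_linear_mse(x_scaled, y, learning_rate=0.2, steps=200)  # stays finite

# Exact-match accuracy versus comparing the predicted class index.
outputs = np.array([[0.9, 0.1], [0.2, 0.7]])
targets = np.array([[1, 0], [0, 1]])
exact_match = float((outputs == targets).mean())  # 0.0
argmax_accuracy = float((outputs.argmax(axis=1) == targets.argmax(axis=1)).mean())  # 1.0
```

If the same thing happens in the Keras model, scaling the spectrograms (for example to a log or dB scale), lowering the learning rate, and using a metric that compares class indices would be things to try.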