Design of a neural network for signal processing - non-standard task, preparation of the training dataset

Hi there!

I need to develop a NN for the classification of signal points, and my question is how to prepare the training data given the following data and structures.

I have thousands of signals stored on a hard drive in CSV format, which I can use for training.
Each CSV is read into a separate dataframe for one signal (one input instance). Points can be classified only within their dataframe: to classify the points you need the whole signal. A single point can't be classified in isolation - only the full set of points carries the information about the possible point classes (e.g. there are probability distributions of class-1 points along the x-axis).

I have thousands of such separate dataframes, and all of their points are labeled.

Each file has about 10 000 rows, and one row holds the data for one point of a signal y(x):
i - index of the current point,
x - coordinate of the current point i,
y - value of the signal at x, i.e. y(x),
delta_y - error value for the current point x,
point_class - for simplicity a binary variable, 0/1 (the label to train on and predict): 0 for all points of class 0, and 1 for points of the other class.
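
For reference, this is roughly how I read the files now. A minimal sketch, assuming pandas, that the CSV headers match the column names above, and that the directory name `signals/` is a placeholder:

```python
# Load every per-signal CSV into its own dataframe.
# Assumptions: files live in "signals/" (placeholder) and have the
# columns i, x, y, delta_y, point_class described above.
import glob
import pandas as pd

frames = []
for path in sorted(glob.glob("signals/*.csv")):
    df = pd.read_csv(path)   # one dataframe per signal
    frames.append(df)

print(len(frames), frames[0].columns.tolist())
```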

So for the inputs it's possible to pass 30 000 values, and the network should classify each of the 10 000 points of the input signal, assigning a class to every point of the current signal - so we have 10 000 outputs.

The task for the neural network is to take the arrays x[], y[], delta_y[] as input and classify each point of the input signal (the point_class column).

In this case the output of the NN must have the same size as one input signal: 10 000 outputs characterizing the 10 000 input points of the signal. Each output can be interpreted as the probability that the point has class 1 - a value in 0…1.

So for training I have the inputs x[], y[], delta_y[] (10 000 elements each) -
(we feed y(x), delta_y(x), and x to the input of the neural network)
and it must calculate the output p[]: [0, 0, 0, 1, ... , 1, 0, 0] (10 000 elements), so it can be plotted as p(x).
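
To make the shapes concrete, here is a sketch of how the training tensors could be assembled, assuming every signal has exactly 10 000 points and the `frames` list from the loading sketch above:

```python
# Stack per-signal dataframes into fixed-shape training arrays.
# Assumption: every signal has the same length (10 000 points).
import numpy as np

X = np.stack([df[["x", "y", "delta_y"]].to_numpy() for df in frames])
Y = np.stack([df["point_class"].to_numpy() for df in frames])

print(X.shape)  # (n_signals, 10000, 3) -> 30 000 input values per signal
print(Y.shape)  # (n_signals, 10000)    -> one 0/1 label per point
```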

  1. How do I combine all the files into one dataset? Or is it possible to build the training set directly from the separate CSVs, using some as inputs and the corresponding ones as outputs for each training case (input1.csv → output1.csv)?

Maybe someone can point me to examples of similar tasks?

  2. Which layer types should I use for the outputs in this case? (See the sketch after this list.)

  3. Is it a good idea to have such a large network (30 000 inputs and 10 000 outputs)?
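
To make question 2 concrete, this is the kind of per-point output I mean. A hypothetical Keras sketch, not a design I have validated - Conv1D is only one possibility, and the layer sizes are placeholders:

```python
# Hypothetical per-point classifier: a 1D convolutional stack sees
# neighbouring points, and the final Conv1D(1, 1) with a sigmoid
# produces one probability p(class=1) for each of the 10 000 points.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10_000, 3)),            # x, y, delta_y per point
    tf.keras.layers.Conv1D(32, 9, padding="same", activation="relu"),
    tf.keras.layers.Conv1D(32, 9, padding="same", activation="relu"),
    tf.keras.layers.Conv1D(1, 1, activation="sigmoid"),  # p(class=1) per point
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```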

If any additional information is needed, I can provide whatever helps with understanding and processing.

Can anyone suggest an approach?