1D CNN data, model and predictions questions

Dan_Fiscus · June 30, 2021, 5:48pm

Hello,

I want to run a 1D CNN on some time series data and have questions at several steps in the process. I include code and info on the data below, and am seeking any assistance with understanding how to set the shape of the input training data and the test data.

I also seek info on how to run the model and predict classes using the test data with 10 examples of each of the 10 classes and look at the predicted classes.

The code below runs, but it does not seem to work as I would expect, and I cannot interpret the prediction results well enough to tell whether I have it configured properly.

I have questions inserted at several steps below.

Any suggestions greatly appreciated.

First, some info on my data:

dftrainin and dftestin are the train and test data that come from .CSV files with

column 1 = sampleID (text) to identify the examples / rows

columns 2 to 128 = time series data (floating point numbers, scaled 0 to 100)

column 129 = labels with column name ch_id

the 10 class labels (ch_id) are coded 0 to 9

each of the time series examples (rows) belongs to one of 10 classes (ch_id)

in the training data there are 200 rows which are 20 examples each of 10 classes

in the test data there are 100 rows which are 10 examples each of 10 classes

after importing many tf, keras, and other libraries / utils

work on copy of input data

dftrain = dftrainin.copy()
dftest = dftestin.copy()

pop off the labels

train_labels = dftrain.pop('ch_id')
test_labels = dftest.pop('ch_id')

questions about the shape of input training data and the test data?

how to ID and format the 20 example rows for each of the 10 classes?

how to ID and format the 10 example rows for each of the 10 classes?

try using 3rd dim as classes

dftrain_rs = tf.reshape(dftrain, [20, 127, 10])

dftest_rs = tf.reshape(dftest, [10, 127, 10])

one hot encode the labels

train_hot = np_utils.to_categorical(train_labels)
test_hot = np_utils.to_categorical(test_labels)

set up the model

this section runs without error

num_classes = 10

model = Sequential([
    layers.Conv1D(filters=64, kernel_size=8, activation='relu', input_shape=(127, 10)),
    layers.Conv1D(filters=64, kernel_size=8, activation='relu'),
    layers.Dropout(0.5),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(96, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])

compile the model

this section runs without error

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

train the model

use 20% for validation via argument passed to model.fit()

this section runs without error, but the accuracy goes to 1 after 2 epochs

which seems too fast to get to 100% accuracy

epochs=20

history = model.fit(
    dftrain_rs, train_hot,
    validation_split=0.2,
    epochs=epochs
)

Visualize training results

Create plots of loss and accuracy on the training and validation sets.

this section runs without errors, but the graphs don’t look like typical

training curves

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)

plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
plt.show()

run the model for the test data

how to test with 10 examples of each of 10 classes and get predicted classes?

test_predictions = model.predict(dftest_rs).flatten()

test_scores = tf.nn.softmax(test_predictions)

how to see the predicted classes?

print(test_scores)

print(np.argmax(test_scores))

Tanya · March 3, 2024, 10:27pm

@Dan_Fiscus
Welcome to the Tensorflow Forum!

Here’s how to set the shape of your input training and test data for a 1D CNN with time series data:

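Define the Feature Vector Length: A Conv1D layer expects input of shape (batch_size, time_steps, channels). Excluding the batch dimension, the first dimension is the number of time steps in each series, which is the length of your feature vector. Analyze your data to determine the appropriate length for capturing relevant patterns: it could be the entire sequence, a fixed window, or a window chosen dynamically based on specific features. In your data each row holds 127 values (columns 2 to 128), so using the whole series gives 127 time steps.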

Consider the Number of Channels: While many time series datasets have only one channel (e.g., temperature readings), some have multiple channels representing different sensor readings. The second dimension (after the time steps) is the number of channels. With one series per example, as in your data, it is 1; the 10 classes are not channels and belong in the labels only.
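To make the multi-channel case concrete, here is a small sketch using NumPy only. The two "sensors" are random placeholder arrays invented for illustration (they are not part of the data in the question); the point is just where the channel axis ends up.

```python
import numpy as np

# Hypothetical example: two sensors, each recorded as 200 samples x 127 time steps.
rng = np.random.default_rng(0)
sensor_a = rng.normal(size=(200, 127))
sensor_b = rng.normal(size=(200, 127))

# Stack along a new last axis so every time step carries both readings.
multi_channel = np.stack([sensor_a, sensor_b], axis=-1)
print(multi_channel.shape)  # (200, 127, 2)
```

The sample count stays in the first axis, so the array still lines up row for row with the labels.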

Model Building: Define your 1D CNN architecture using TensorFlow/Keras. Ensure the input layer shape matches your training data shape (e.g., (None, 100, 1), where None represents the batch size). Note that the Keras input_shape argument omits the batch dimension, so that example would be written input_shape=(100, 1).
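As a rough sketch of how the data from the question could be brought into that layout, assuming the 127 time-series columns have already been separated from sampleID and ch_id: the array `features` below is random placeholder data, not the real dataset, and `to_conv1d_input` is a hypothetical helper name, not a Keras function.

```python
import numpy as np

def to_conv1d_input(features):
    """Reshape a 2D (samples, time_steps) array to (samples, time_steps, 1)."""
    features = np.asarray(features, dtype="float32")
    if features.ndim != 2:
        raise ValueError("expected a 2D array of shape (samples, time_steps)")
    # Add a trailing channel axis of size 1; the sample axis is left untouched.
    return features[..., np.newaxis]

# Placeholder standing in for the 200 training rows x 127 time steps.
rng = np.random.default_rng(0)
features = rng.uniform(0, 100, size=(200, 127))
labels = np.repeat(np.arange(10), 20)  # 20 examples for each of 10 classes

x_train = to_conv1d_input(features)
print(x_train.shape)  # (200, 127, 1)

# One sample per label: the first axis must match the number of labels.
assert x_train.shape[0] == labels.shape[0]
```

With this layout the matching first-layer argument would be input_shape=(127, 1). By contrast, reshaping to [20, 127, 10] leaves only 20 samples for 200 labels and mixes values from different rows into the channel axis.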

Model Training: Train your model on the prepared training data.
Prediction: Use the trained model’s predict function on your test data.

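Interpreting Predictions: The predictions will be an array of shape (number_of_samples, number_of_classes). Each row holds the probabilities of the classes for one sample; because the last layer is already a softmax, there is no need to apply softmax again.
To get the predicted class for each sample, take the index of the highest probability in its row (for example with np.argmax along axis 1). A fixed threshold such as 0.5 suits binary or multi-label outputs rather than a 10-class softmax. Do not flatten the predictions first, or the per-sample rows are lost.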
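A minimal sketch of that last step, using made-up probabilities in place of real model output. The helper names `predicted_classes` and `accuracy` are hypothetical, introduced here only for illustration.

```python
import numpy as np

def predicted_classes(probabilities):
    """Return the index of the most probable class for each sample (row)."""
    probabilities = np.asarray(probabilities)
    # axis=1 picks the best class per row instead of one index for the whole array.
    return np.argmax(probabilities, axis=1)

def accuracy(predicted, true_labels):
    """Fraction of samples whose predicted class equals the true label."""
    return float(np.mean(np.asarray(predicted) == np.asarray(true_labels)))

# Made-up softmax outputs for 3 samples and 4 classes (each row sums to 1).
probs = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.40, 0.10],
    [0.90, 0.05, 0.03, 0.02],
])

classes = predicted_classes(probs)
print(classes)                       # [1 2 0]
print(accuracy(classes, [1, 2, 3]))  # 2 of 3 predictions match
```

In the question's code, calling .flatten() and then np.argmax without an axis returns a single position in the flattened array, which is why the output is hard to interpret. Applied to the (100, 10) output of model.predict, the per-row version would give 100 class indices that can be compared directly with test_labels.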

Let us know if this helps!