ValueError: `logits` and `labels` must have the same shape, received ((None, 1) vs (None, 200))

I tried to train a convolutional neural network to predict the labels (categorical data) given the criteria (text). This should have been a simple classification problem. There are 7 labels, hence my network has 7 output neurons with sigmoid activation functions.

I encoded training data using the following simple format, in a txt file, using text descriptors ('criteria') and categorical label variables ('label'):


Here’s a peek at one entry from the data file:

Headache location: Bilateral (intracranial). Facial pain: Nil. Pain quality: Pulsating. Thunderclap onset: Nil. Pain duration: 11. Pain episodes per month: 26. Chronic pain: No. Remission between episodes: Yes. Remission duration: 25. Pain intensity: Moderate (4-7). Aggravating/triggering factors: Innocuous facial stimuli, Bathing and/or showering, Chocolate, Exertion, Cold stimulus, Emotion, Valsalva maneuvers. Relieving factors: Nil. Headaches worse in the mornings and/or night: Nil. Associated symptoms: Nausea and/or vomiting. Reversible symptoms: Nil. Examination findings: Nil. Aura present: Yes. Reversible aura: Motor, Sensory, Brainstem, Visual. Duration of auras: 47. Aura in relation to headache: Aura proceeds headache. History of CNS disorders: Multiple Sclerosis, Angle-closure glaucoma. Past history: Nil. Temporal association: No. Disease worsening headache: Nil. Improved cause: Nil. Pain ipsilateral: Nil. Medication overuse: Nil. Establish drug overuse: Nil. Investigations: Nil.|Migraine with aura

Here’s a snippet of the code from the training algorithm:

dataset = pd.read_csv('Data/ICHD3_Database.txt', names=['criteria', 'label'], sep='|')
features = dataset['criteria'].values 
labels = dataset['label'].values

def BOW_Model(features):
    features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) 
    vectorizer = CountVectorizer() 
    features_train = vectorizer.fit_transform(features_train) 
    features_test = vectorizer.transform(features_test) 
    return features_train, features_test, labels_train, labels_test

def word_embeddings(features):
    maxlen = 200
    features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) 
    tokenizer = Tokenizer(num_words=5000)
    features_train = pad_sequences(tokenizer.texts_to_sequences(features_train), padding='post', maxlen=maxlen)
    features_test = pad_sequences(tokenizer.texts_to_sequences(features_test), padding='post', maxlen=maxlen) 
    vocab_size = len(tokenizer.word_index) + 1  # Adding 1 because of reserved 0 index
    labels_train = pad_sequences(tokenizer.texts_to_sequences(labels_train), padding='post', maxlen=maxlen)
    labels_test = pad_sequences(tokenizer.texts_to_sequences(labels_test), padding='post', maxlen=maxlen)
    vocab_size += len(tokenizer.word_index) + 1  # Adding 1 because of reserved 0 index
    return features_train, features_test, labels_train, labels_test, vocab_size, maxlen

features_train, features_test, labels_train, labels_test, vocab_size, maxlen = word_embeddings(features) # Pre-process text using word embeddings

def design_model(features, hidden_layers=2, number_neurons=128):
    model = Sequential(name = "My_Sequential_Model") 
    model.add(layers.Embedding(input_dim=vocab_size, output_dim=50, input_length=maxlen)) 
    model.add(layers.Conv1D(128, 5, activation='relu'))
    for i in range(hidden_layers): 
        model.add(Dense(number_neurons, activation='relu')) 
    model.add(Dense(7, activation='sigmoid')) 
    opt = Adam(learning_rate=0.01) 
    model.compile(loss='binary_crossentropy', metrics=['mae'], optimizer=opt)
    return model

model = design_model(features_train, hidden_layers=2, number_neurons=30) 
history = model.fit(features_train, labels_train, epochs=10, batch_size=16, verbose=0, validation_split=0.33, callbacks=[EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)]) 

But when I run the model, I get the following error:

Traceback (most recent call last):
  File "c:\Users\user\Desktop\Deep Learning\", line 112, in <module>
    history = model.fit(features_train, labels_train, epochs=10, batch_size=16, verbose=0, validation_split=0.33, callbacks=[EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)]) # 18. Fit model using optimized epochs & batch size. When the training performance reaches the plateau or starts degrading, the learning stops.
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\utils\", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\user\AppData\Local\Temp\", line 15, in tf__train_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
ValueError: in user code:

    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\", line 1401, in train_function  *
        return step_function(self, iterator)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\", line 1384, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\", line 1373, in run_step  **
        outputs = model.train_step(data)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\", line 1151, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\", line 1209, in compute_loss
        return self.compiled_loss(
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\", line 277, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\", line 143, in __call__
        losses = call_fn(y_true, y_pred)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\", line 270, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\", line 2532, in binary_crossentropy
        backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits),
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\", line 5822, in binary_crossentropy
        return tf.nn.sigmoid_cross_entropy_with_logits(

    ValueError: `logits` and `labels` must have the same shape, received ((None, 1) vs (None, 200)).

Where am I going wrong?

Print the shape of the labels going into the model and check that each one actually has the desired length of 7. The issue may well be there.
For single-label classification like this, use the ‘softmax’ activation in the last layer: it was built to pair with categorical cross-entropy, and because its outputs form a probability distribution over the 7 classes it will give you more reliable results than sigmoid.
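
A quick way to run that shape check, as a minimal sketch. The two arrays are hypothetical stand-ins, not the real data: one is shaped like labels that went through pad_sequences with maxlen=200, the other like one-hot labels for 7 classes. `labels_match_output` is an illustrative helper, not a Keras function.

```python
import numpy as np

NUM_CLASSES = 7  # number of output neurons in the question's model


def labels_match_output(labels, num_classes=NUM_CLASSES):
    """Return True if labels are shaped (num_samples, num_classes)."""
    labels = np.asarray(labels)
    return labels.ndim == 2 and labels.shape[1] == num_classes


# Hypothetical stand-in for labels passed through pad_sequences(maxlen=200)
padded_labels = np.zeros((10, 200), dtype=int)
# Hypothetical stand-in for one-hot labels (each row contains a single 1)
one_hot_labels = np.eye(NUM_CLASSES, dtype=int)[np.arange(10) % NUM_CLASSES]

print(padded_labels.shape, labels_match_output(padded_labels))
print(one_hot_labels.shape, labels_match_output(one_hot_labels))
```

In the real script, the same check would be applied to labels_train and labels_test right before fitting.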

Hi @The_Machine_Preacher ,

welcome to the forum :tada:.

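Please keep an eye on the selected loss: BinaryCrossentropy is meant for binary (0 or 1) targets, not for picking one of 7 classes.
Can you try tf.keras.losses.CategoricalCrossentropy instead? Use from_logits=False if the last layer already applies an activation such as softmax, or from_logits=True if it outputs raw scores.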

Please feel free to share the shape of your labels (difficult to read from the code) …
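
To illustrate what the from_logits switch changes, here is a minimal NumPy sketch. It is a hand-written illustration of the idea, not the Keras implementation, and the scores and targets are made up: with from_logits=True the loss applies softmax itself, so it expects raw scores; with from_logits=False it expects probabilities.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_crossentropy(y_true, y_pred, from_logits=False):
    """Mean categorical cross-entropy for one-hot targets (illustrative only)."""
    probs = softmax(y_pred) if from_logits else y_pred
    probs = np.clip(probs, 1e-7, 1.0)  # avoid log(0)
    return float(-(y_true * np.log(probs)).sum(axis=-1).mean())


# Two made-up samples, 7 classes: raw scores as a layer without activation emits
logits = np.array([[2.0, 0.5, 0.1, 0.0, -1.0, 0.3, 0.2],
                   [0.1, 0.2, 3.0, 0.0, 0.4, -0.5, 0.6]])
y_true = np.eye(7)[[0, 2]]  # one-hot targets for classes 0 and 2

loss_from_logits = categorical_crossentropy(y_true, logits, from_logits=True)
loss_from_probs = categorical_crossentropy(y_true, softmax(logits), from_logits=False)
print(loss_from_logits, loss_from_probs)  # the two values agree
```

Mixing the two up (softmax in the layer and from_logits=True in the loss) applies softmax twice, which is why the setting has to match the last layer.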

Looking forward,

The error you’re encountering (ValueError: logits and labels must have the same shape, received ((None, 1) vs (None, 200))) suggests a mismatch between the shape of the predictions your model is generating and the shape of your target labels.

Given that you’re working on a classification problem with 7 labels, there are a few potential issues to address:

  1. Label Encoding: For a multi-class classification problem with 7 categories, you should ensure your labels are one-hot encoded, resulting in a label array shape of [num_samples, 7], where num_samples is the number of examples in your dataset. Your code instead runs the labels through texts_to_sequences and pad_sequences with maxlen=200, which is where the (None, 200) in the error comes from. Padding is meant for input sequences and is unusual for categorical labels.
  2. Final Layer Activation: For multi-class classification, it’s standard to use a softmax activation function in the final layer, not sigmoid. Softmax will ensure that the output probabilities sum up to 1, making it suitable for multi-class classification. Change the activation function of your final Dense layer to softmax:

model.add(Dense(7, activation='softmax'))
  3. Loss Function: When using softmax activation in the final layer for multi-class classification, the appropriate loss function is categorical_crossentropy, not binary_crossentropy. Update the loss function in your model compilation step:

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer=opt)
  4. Review Data Preprocessing: Ensure that your label preprocessing correctly one-hot encodes the labels into a 2D array of shape [num_samples, 7]. The use of pad_sequences on labels is peculiar and might not be appropriate unless you’re dealing with a sequence prediction problem, which doesn’t seem to be the case here.
  5. Model Output: Ensure the model’s output layer has the correct number of units (7 for your case) and matches the shape of your one-hot encoded labels. The error message indicates a mismatch, possibly due to incorrect handling of label preprocessing.
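
To make the softmax point from the list concrete, here is a small NumPy sketch using seven made-up scores (not output from your model). It only illustrates the behaviour of the two functions: sigmoid treats each output independently, while softmax normalizes across the 7 classes.

```python
import numpy as np


def sigmoid(x):
    """Squash each score independently into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    """Normalize scores into a probability distribution over the classes."""
    exp = np.exp(x - x.max())  # subtract the max for numerical stability
    return exp / exp.sum()


scores = np.array([2.0, 1.0, 0.5, 0.0, -0.5, -1.0, -2.0])  # 7 made-up scores

sigmoid_out = sigmoid(scores)
softmax_out = softmax(scores)

print("sigmoid sum:", sigmoid_out.sum())  # not constrained to 1
print("softmax sum:", softmax_out.sum())  # sums to 1 (up to rounding)
```

Both functions rank the classes the same way; the difference is that only the softmax outputs can be read as one probability per class that together add up to 1.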

Correct these aspects, and your model should be able to train without encountering the shape mismatch error. Here’s a revised snippet for your label encoding and model compilation:

from tensorflow.keras.utils import to_categorical

# Assuming 'labels' is an array of integer class labels (0 to 6);
# string labels must be mapped to integers before this call
labels = to_categorical(labels, num_classes=7)

# Update your model compilation
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

Ensure your labels are correctly one-hot encoded and your model’s final layer and loss function are appropriately set up for a multi-class classification task.
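
Since the labels in the data file are strings such as "Migraine with aura", they need to be mapped to integer class ids before they can be one-hot encoded. A minimal NumPy sketch of that step follows. The label values other than "Migraine with aura" are made up for illustration, and this is one possible way to do it rather than the only one (scikit-learn's LabelEncoder followed by to_categorical would be an alternative).

```python
import numpy as np

# Made-up examples; only "Migraine with aura" appears in the question
labels = np.array([
    "Migraine with aura",
    "Tension-type headache",
    "Migraine with aura",
    "Cluster headache",
])

# classes: sorted unique label names; class_ids: integer id of each sample
classes, class_ids = np.unique(labels, return_inverse=True)

# One row per sample, one column per class, a single 1 in each row
one_hot = np.eye(len(classes), dtype="float32")[class_ids]

print(classes)
print(class_ids)
print(one_hot.shape)
```

With the full dataset, len(classes) should come out as 7, giving labels of shape [num_samples, 7] that line up with the 7-unit output layer.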