What is the problem with my model?

I'm doing image captioning using this model:

import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.layers import (Input, Dropout, Dense, Embedding,
                                     BatchNormalization, LSTM, Concatenate)
from tensorflow.keras.models import Model

def define_model(vocab_size, max_length):
    # Image-feature branch
    inputs1 = Input(shape=(1120,))
    fe1 = Dropout(0.3)(inputs1)
    fe2 = Dense(512, kernel_regularizer=regularizers.l2(1e-4), activation='relu')(fe1)

    # Caption (text) branch
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se2 = Dropout(0.3)(se1)
    se2 = BatchNormalization()(se2)
    se3 = LSTM(512)(se2)

    # Merge both branches and predict the next word
    decoder1 = Concatenate()([fe2, se3])
    decoder2 = Dense(512, activation='relu')(decoder1)
    outputs = Dense(vocab_size, activation='softmax')(decoder2)

    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    opt = tf.keras.optimizers.Adam(learning_rate=1e-4)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    model.summary()
    return model

I got these results:

epoch 1     loss: 5.3386 - accuracy: 0.1759 - val_loss: 4.9454 - val_accuracy: 0.2214
epoch 2     loss: 4.2776 - accuracy: 0.2832 - val_loss: 3.9343 - val_accuracy: 0.3105
epoch 3     loss: 3.5354 - accuracy: 0.3599 - val_loss: 3.8278 - val_accuracy: 0.3210
epoch 4     loss: 3.2257 - accuracy: 0.4039 - val_loss: 3.9480 - val_accuracy: 0.3101
epoch 5     loss: 3.0297 - accuracy: 0.4326 - val_loss: 4.1156 - val_accuracy: 0.3072
epoch 6     loss: 2.9005 - accuracy: 0.4505 - val_loss: 4.2219 - val_accuracy: 0.3053
epoch 7     loss: 2.8103 - accuracy: 0.4622 - val_loss: 4.2751 - val_accuracy: 0.3020
epoch 8     loss: nan - accuracy: 0.1060 - val_loss: nan - val_accuracy: 0.0000e+00
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
# Callbacks must be passed as a list; batch_size is ignored when fitting from a generator
history = model.fit(train_generator, epochs=50, steps_per_epoch=train_steps, verbose=1,
                    callbacks=[callback, checkpoint], validation_data=val_generator, validation_steps=val_steps)

Sorry, I don't have a definitive answer for that, but it looks like an exploding-gradients problem.

I have two suggestions for you:

1. Check that the inputs to the network (the image features and the padded sequences) don't contain NaN or infinite values.
2. Add gradient clipping to the optimizer, as in the sketch below.
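A minimal sketch of the clipping idea, reusing your Adam settings (the clipnorm value of 1.0 is just an illustrative default, and the TerminateOnNaN callback is an extra guard, not something from your code):

# Clip gradient norms so one bad batch can't blow up the weights
opt = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)  # or clipvalue=...
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

# Optional: stop training as soon as the loss becomes NaN
nan_guard = tf.keras.callbacks.TerminateOnNaN()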

Hope it helps!

Thanks a lot for replying, but how can I check whether the input of the network has NaN values? Can you help me with that, please?

Sorry for the vague answer, but I'd start by checking that all images have a size bigger than zero, and the same for the text associated with them. This is just a sanity check, not modeling-related.
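For the NaN check itself, a quick sanity-check sketch (image_features and captions are placeholder names for your own arrays, assuming NumPy inputs):

import numpy as np

# Fail fast if any model input is NaN/Inf or empty
assert not np.isnan(image_features).any(), 'NaN in image features'
assert not np.isinf(image_features).any(), 'Inf in image features'
assert image_features.shape[0] > 0, 'empty feature array'
assert all(len(c) > 0 for c in captions), 'found an empty caption'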

I have another question, please: I used zero padding because I had a problem with matrix sizes … could that be a cause of the NaN values? And as I understand it, I can handle this problem by adding a clipvalue parameter to the optimizer, is that right?

If you mean the matrix size of the input images, I'd first rescale all of them to the expected size (as it's done in the sample); it will make your life easier.
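A sketch of that rescaling step (224x224 is just an example target; use whatever size your feature extractor expects):

import tensorflow as tf

def load_and_resize(path, target_size=(224, 224)):
    # Read, decode and rescale one image to the expected input size
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, target_size)  # casts to float32
    return img / 255.0  # normalize pixel values to [0, 1]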