Dataset input vs numpy array to model.fit gives different val loss

I'm trying to train a neural network, and my validation loss differs depending on the type of input. If I pass NumPy arrays to the model, the validation loss is comparable to the training loss, but if I pass datasets, it is really small.

Both models are the same with the same loss function.

When I pass `tf.data` datasets:

    train_dset = tf.keras.preprocessing.image_dataset_from_directory(directory="./Data/09_TrainingSet_VAE1",
                                                                     labels=None,
                                                                     label_mode=None,
                                                                     image_size=(138, 138),
                                                                     color_mode="grayscale",
                                                                     batch_size=None,
                                                                     shuffle=True)

    val_dset = tf.data.Dataset.from_tensor_slices(val_slices)
    train_dset = (train_dset.map(preprocess_dataset).batch(batch_sz).shuffle(1))


    val_dset = (val_dset.map(preprocess_dataset).batch(batch_sz).shuffle(1))

    # reset model weights before training
    VAE.set_weights(initial_weights)

    # fit model
    fit_results = VAE.fit(train_dset,
                          epochs=10,
                          validation_data=val_dset,
                          callbacks=[early_stopping_kfold, tensorboard_callback],
                          verbose=2
                          )

This gives the output:

    Epoch 1/10
    2022-08-01 12:35:02.073645: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
    2022-08-01 12:35:02.789387: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
    2022-08-01 12:35:02.885751: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
    596/596 - 7s - loss: 531.4684 - val_loss: 21.4242 - 7s/epoch - 12ms/step
    Epoch 2/10
    596/596 - 6s - loss: 251.6288 - val_loss: 5.6016 - 6s/epoch - 10ms/step
    Epoch 3/10
    596/596 - 4s - loss: 199.1088 - val_loss: 5.8034 - 4s/epoch - 7ms/step
    Epoch 4/10
    596/596 - 4s - loss: 163.6304 - val_loss: 2.5418 - 4s/epoch - 6ms/step
    Epoch 5/10
    596/596 - 5s - loss: 134.6796 - val_loss: 1.2967 - 5s/epoch - 8ms/step
    Epoch 6/10
    596/596 - 4s - loss: 118.3922 - val_loss: 0.7600 - 4s/epoch - 7ms/step
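
When a validation loss is this much smaller than the training loss, one quick check is to compare the value range of the batches each pipeline actually feeds the model. The helper below is a minimal sketch; `batch_value_range` is a hypothetical name, and it only assumes an iterable of NumPy arrays (or tuples of equally shaped arrays), so it works on the array inputs directly and on a `tf.data` dataset via `as_numpy_iterator()`. The data in the example is synthetic.

```python
import numpy as np


def batch_value_range(batches, max_batches=10):
    """Return (min, max) over the first `max_batches` elements of an iterable of arrays."""
    lo, hi = np.inf, -np.inf
    for i, batch in enumerate(batches):
        if i >= max_batches:
            break
        # np.min / np.max also accept a tuple of equally shaped arrays, e.g. (x, x) pairs
        lo = min(lo, float(np.min(batch)))
        hi = max(hi, float(np.max(batch)))
    return lo, hi


# Synthetic stand-in data: the same raw pixels, scaled once and scaled twice.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(8, 138, 138, 1)).astype(np.float32)
scaled_once = [raw[i:i + 4] / 255.0 for i in range(0, 8, 4)]
scaled_twice = [batch / 255.0 for batch in scaled_once]

print("scaled once: ", batch_value_range(scaled_once))
print("scaled twice:", batch_value_range(scaled_twice))
```

With the pipelines above, `batch_value_range(train_dset.as_numpy_iterator())` and `batch_value_range(val_dset.as_numpy_iterator())` should report similar ranges; a maximum near 1/255 on one side points to the data being scaled twice.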

With NumPy arrays as input:

    train_slices = preprocess_data(CropTumor, file_array[train_dataset])
    val_slices = preprocess_data(CropTumor, file_array[val_dataset])



    # reset model weights before training
    VAE.set_weights(initial_weights)

    # fit model
    fit_results = VAE.fit(train_slices,train_slices,
                          epochs=1000,
                          validation_data=(val_slices,val_slices),
                          callbacks=[early_stopping_kfold, tensorboard_callback],
                          batch_size=batch_sz,
                          verbose=2
                          )

This gives the output:

    Epoch 1/1000
    2022-08-01 12:37:41.620706: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
    2022-08-01 12:37:42.338416: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
    2022-08-01 12:37:42.433343: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
    623/623 - 7s - loss: 539.5004 - val_loss: 283.0893 - 7s/epoch - 12ms/step
    Epoch 2/1000
    623/623 - 4s - loss: 261.8237 - val_loss: 204.0882 - 4s/epoch - 6ms/step
    Epoch 3/1000
    623/623 - 4s - loss: 201.8370 - val_loss: 174.8553 - 4s/epoch - 6ms/step
    Epoch 4/1000
    623/623 - 4s - loss: 159.5551 - val_loss: 146.8141 - 4s/epoch - 6ms/step
    Epoch 5/1000
    623/623 - 4s - loss: 132.1346 - val_loss: 129.0631 - 4s/epoch - 6ms/step
    Epoch 6/1000
    623/623 - 4s - loss: 116.8696 - val_loss: 121.5118 - 4s/epoch - 6ms/step
    Epoch 7/1000
    623/623 - 4s - loss: 106.5005 - val_loss: 111.9231 - 4s/epoch - 7ms/step
    Epoch 8/1000
    623/623 - 4s - loss: 98.8890 - val_loss: 108.3091 - 4s/epoch - 6ms/step
    Epoch 9/1000

The loss function, which is the same in both cases:

    import tensorflow as tf

    # reconstruction_weight and kl_weight are assumed to be defined elsewhere in the script
    def loss_func(encoder_mu, encoder_log_variance):
        def vae_reconstruction_loss(y_true, y_predict):
            # Per-sample sum of squared errors over height, width and channels
            reconstruction_loss = tf.math.reduce_sum(tf.math.square(y_true - y_predict), axis=[1, 2, 3])
            return reconstruction_loss

        def vae_kl_loss(encoder_mu, encoder_log_variance):
            # KL divergence to a standard normal, summed over the latent dimensions
            kl_loss = -0.5 * tf.math.reduce_sum(1.0 + encoder_log_variance - tf.math.square(encoder_mu) - tf.math.exp(encoder_log_variance),
                                                axis=1)
            return kl_loss

        def vae_loss(y_true, y_predict):
            reconstruction_loss = vae_reconstruction_loss(y_true, y_predict)
            # Note: this call passes y_true and y_predict, not the encoder outputs
            kl_loss = vae_kl_loss(y_true, y_predict)
            loss = reconstruction_weight*reconstruction_loss + kl_weight*kl_loss
            return loss

        return vae_loss

Model compilation:

    # Compile model
    VAE.compile(optimizer=tfk.optimizers.Adam(learning_rate=learning_rate),
                loss=loss_func(encoder_mu_layer, encoder_log_variance_layer))
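
To check the shapes involved, here is a NumPy sketch of the loss with each term fed the inputs it is meant to take: the reconstruction term compares images, and the KL term uses the encoder's mean and log-variance. The function names and the default weights of 1.0 are illustrative assumptions, not the values used in the question.

```python
import numpy as np


def reconstruction_loss(y_true, y_pred):
    # Per-sample sum of squared errors over height, width and channels.
    return np.sum(np.square(y_true - y_pred), axis=(1, 2, 3))


def kl_loss(mu, log_var):
    # KL divergence between N(mu, exp(log_var)) and N(0, I), summed per sample.
    return -0.5 * np.sum(1.0 + log_var - np.square(mu) - np.exp(log_var), axis=1)


def vae_loss(y_true, y_pred, mu, log_var, reconstruction_weight=1.0, kl_weight=1.0):
    # Both terms have shape (batch,), so they add element-wise.
    return (reconstruction_weight * reconstruction_loss(y_true, y_pred)
            + kl_weight * kl_loss(mu, log_var))


batch, latent_dim = 4, 2
y = np.zeros((batch, 138, 138, 1))
y_hat = np.full_like(y, 0.1)
mu = np.zeros((batch, latent_dim))
log_var = np.zeros((batch, latent_dim))
print(vae_loss(y, y_hat, mu, log_var).shape)
```

A Keras `loss(y_true, y_pred)` function only receives the target and the model output, which is why VAE implementations often add the KL term through `Model.add_loss` or a custom `train_step` rather than through `compile(loss=...)`.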

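Solved it. It was a bug in my code after all.
The `preprocess_dataset` function divided the pixel values by 255, and I used it for both the training dataset, which was read from a directory (values in the 0-255 range), and the validation dataset, which came from an array that had already been divided by 255. The validation images were therefore scaled twice, into the 0 to 1/255 range, which made the validation loss misleadingly small.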
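
One way to avoid this kind of bug is to scale the data in exactly one place and have the scaling step refuse input that already looks normalized. The sketch below assumes NumPy inputs; `scale_pixels` is a hypothetical helper, not the question's `preprocess_dataset`, and the `max() <= 1.0` test is only a heuristic (a raw image whose pixels are all 0 or 1 would also be rejected).

```python
import numpy as np


def scale_pixels(images, max_value=255.0):
    """Hypothetical preprocessing step: map raw pixel values in [0, max_value] to [0, 1].

    Raises instead of silently rescaling data that already looks normalized.
    """
    images = np.asarray(images, dtype=np.float32)
    if images.size and images.max() <= 1.0:
        raise ValueError("images already appear to be in [0, 1]; not rescaling again")
    return images / max_value


raw = np.array([[0, 128, 255]], dtype=np.uint8)
train_ready = scale_pixels(raw)  # raw 0-255 pixels, scaled once
print(train_ready.max())

try:
    scale_pixels(train_ready)  # a second scaling is rejected
except ValueError as err:
    print("rejected:", err)
```

Inside a `tf.data` pipeline the same guard would need a graph-compatible op, for example one of the `tf.debugging` assertions, because `Dataset.map` traces the function into a graph.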