Why does my validation loss increase, but validation accuracy perfectly matches training accuracy?

I am building a simple 1D convolutional neural network in Keras. Here is the model:

from tensorflow import keras
from tensorflow.keras import layers, models

def build_model():

    model = models.Sequential()
    # Input: sequences of 64 time steps with 20 features (channels) each
    model.add(layers.SeparableConv1D(64, kernel_size=2, activation="relu", input_shape=(64, 20)))
    model.add(layers.SeparableConv1D(64, kernel_size=2, activation="relu"))
    model.add(layers.MaxPooling1D(4))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dropout(0.1))
    model.add(layers.Dense(1, activation="sigmoid"))

    model.compile(
        optimizer='rmsprop',
        loss='binary_crossentropy',
        metrics=[
            keras.metrics.BinaryAccuracy(),
        ],
    )
    
    #model.summary()
    
    return model
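To gauge how much capacity this network has, the following plain-Python arithmetic works through the output shapes and trainable-parameter counts layer by layer, without needing TensorFlow. It is a sketch that assumes the Keras layer defaults listed in its comments (`padding="valid"`, `depth_multiplier=1`, `use_bias=True`, pooling stride equal to the pool size). The helper functions are hypothetical names for illustration only. If those defaults hold, `model.summary()` should report the same figures.

```python
# Shape and parameter arithmetic for build_model(), assuming Keras defaults:
# padding="valid", stride 1 for the convolutions, depth_multiplier=1,
# use_bias=True, and MaxPooling1D strides equal to pool_size.

def conv1d_valid_len(length, kernel_size):
    # "valid" padding with stride 1 drops kernel_size - 1 steps
    return length - kernel_size + 1

def pool1d_valid_len(length, pool_size):
    # "valid" padding with stride == pool_size; a partial last window is dropped
    return (length - pool_size) // pool_size + 1

def separable_conv1d_params(kernel_size, in_channels, filters):
    depthwise = kernel_size * in_channels   # one kernel per input channel
    pointwise = in_channels * filters       # 1x1 projection to `filters` channels
    return depthwise + pointwise + filters  # plus one bias per filter

def dense_params(n_in, n_out):
    return n_in * n_out + n_out             # weights plus biases

steps, channels = 64, 20                    # input_shape=(64, 20)
params = separable_conv1d_params(2, channels, 64)
steps, channels = conv1d_valid_len(steps, 2), 64
params += separable_conv1d_params(2, channels, 64)
steps = conv1d_valid_len(steps, 2)
steps = pool1d_valid_len(steps, 4)
flat = steps * channels                     # Flatten()
params += dense_params(flat, 128) + dense_params(128, 128) + dense_params(128, 1)

print(steps, flat, params)  # 15 960 145321
```

About 145k trainable parameters, most of them in the first `Dense` layer after `Flatten`, is a lot of capacity relative to the roughly 1,500 samples mentioned below. That alone makes overfitting plausible.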

When I train my model on roughly 1500 samples, the training and validation accuracy curves always overlap almost exactly, as shown in the graph below. This makes me suspect something is wrong with my code or with Keras/TensorFlow: the validation loss increases dramatically, and I would expect that to affect the accuracy at least somewhat. It looks like the model is massively overfitting, yet Keras seems to report only the training accuracy, or something along those lines. When I then evaluate on a separate test set, the accuracy is nowhere near the 85 to 90 percent shown on the graph; it is only about 70%.

Any help is greatly appreciated; I have been stuck on this for a long time. Below is the training code.

import numpy as np

# Number of folds: k = 5 gives an 80/20 train/validation split in each fold
k = 5
epochs = 100
num_val_samples = len(x_train) // k
scores_binacc = []
scores_precision = []
scores_recall = []
histories = []

# Train a fresh model on each of the k folds
for i in range(k):
    print('Processing fold #', i)
    val_data = x_train[i * num_val_samples : (i + 1) * num_val_samples]
    val_targets = y_train[i * num_val_samples : (i + 1) * num_val_samples]
    
    print('Validation partition =  ', i * num_val_samples, (i + 1) * num_val_samples)
    print('Training partition 1 = ', 0, i * num_val_samples)
    print('Training partition 2 = ', (i+1) * num_val_samples, len(x_train))
    
    partial_train_data = np.concatenate(
        [
            x_train[:i * num_val_samples],
            x_train[(i+1) * num_val_samples:]
        ], 
        axis=0
    )
    
    partial_train_targets = np.concatenate(
        [
            y_train[:i * num_val_samples],
            y_train[(i+1) * num_val_samples:]
        ],
        axis=0
    )
    
    model = build_model()
    h = model.fit(
        partial_train_data, 
        partial_train_targets, 
        validation_data=(val_data, val_targets),
        epochs=epochs, 
        verbose=1
    )
    
    val_loss, val_binacc = model.evaluate(val_data, val_targets, verbose=0)
    scores_binacc.append(val_binacc)
    #scores_precision.append(val_precision)
    #scores_recall.append(val_recall)
    histories.append(h)
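Because the loop keeps every `History` object, one way to check whether the rising validation loss is systematic rather than a quirk of one fold is to average the per-epoch curves across folds and see where the mean bottoms out. This is a minimal NumPy sketch. It assumes the history keys Keras derives from this `compile()` call (`"val_loss"`, and `"val_binary_accuracy"` for the `BinaryAccuracy` metric). `average_fold_curves` is a hypothetical helper, and the toy curves are made-up numbers.

```python
import numpy as np

def average_fold_curves(curves):
    """Average per-epoch curves across folds and return (mean_curve, best_epoch)."""
    arr = np.asarray(curves, dtype=float)        # shape (k, epochs); folds must share a length
    mean_curve = arr.mean(axis=0)                # mean value at each epoch across folds
    best_epoch = int(np.argmin(mean_curve)) + 1  # 1-based epoch with the lowest mean
    return mean_curve, best_epoch

# With the question's loop, this would be fed from the stored History objects:
# val_losses = [h.history["val_loss"] for h in histories]
# mean_val_loss, best_epoch = average_fold_curves(val_losses)

# Made-up curves for two folds, for illustration only:
toy_val_losses = [[0.60, 0.50, 0.55, 0.70],
                  [0.62, 0.48, 0.52, 0.66]]
mean_val_loss, best_epoch = average_fold_curves(toy_val_losses)
print(best_epoch)  # 2
```

If the mean validation loss reaches its minimum well before epoch 100, retraining with that many epochs (or using early stopping) is a common way to limit overfitting.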

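You may be overfitting, but if the underlying relationships are simple, the validation set can still reach decent accuracy while its loss rises. Accuracy only checks which side of the 0.5 threshold each prediction falls on, whereas binary cross-entropy also penalizes how confident each prediction is. A model that becomes very confident on the few samples it gets wrong can therefore see its validation loss climb while its accuracy barely moves.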
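Here is a toy NumPy illustration of that point. It uses the standard mean binary cross-entropy formula and a 0.5 threshold (the `BinaryAccuracy` default). The predictions are made-up numbers, not taken from the question's data, and the two helper functions are written here for illustration rather than taken from Keras.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    # Mean binary cross-entropy; clipping avoids log(0)
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def binary_accuracy(y_true, p, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the label
    return float(np.mean((p > threshold).astype(int) == y_true))

y_true = np.array([1, 1, 0, 0])
# "Early" epoch: the last sample is misclassified, but only mildly (0.6)
early = np.array([0.80, 0.70, 0.30, 0.60])
# "Late" epoch: same decisions, but the mistake is now made with 0.99 confidence
late = np.array([0.95, 0.90, 0.10, 0.99])

print(binary_accuracy(y_true, early), binary_accuracy(y_true, late))  # 0.75 0.75
print(round(binary_crossentropy(y_true, early), 2),
      round(binary_crossentropy(y_true, late), 2))                     # 0.46 1.22
```

Both epochs classify three of four samples correctly, yet the loss more than doubles. The single confident mistake outweighs the gains from the more confident correct predictions.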

I suspect the gap between your validation accuracy and your test accuracy could be caused by shuffling. Are you shuffling your data during training but not your test data? Does the order of the samples matter for your problem?
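Note that the fold loop in the question slices `x_train` into contiguous blocks. `model.fit` shuffles training batches by default (`shuffle=True`), but that happens after the split and does not change which samples end up in each validation fold. If `x_train` is sorted in some way (for example by class or by time), each fold can see a differently distributed validation block.

The following is a minimal sketch of a shuffled split. It assumes `x_train` and `y_train` are NumPy arrays of equal length. `shuffled_kfold_indices` is a hypothetical helper, and scikit-learn's `KFold(n_splits=5, shuffle=True, random_state=0)` does the same job.

```python
import numpy as np

def shuffled_kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) for k folds over one random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)   # one shuffle, applied to both x and y
    fold_size = n_samples // k          # same fold size as num_val_samples in the loop
    for i in range(k):
        val_idx = perm[i * fold_size:(i + 1) * fold_size]
        train_idx = np.concatenate([perm[:i * fold_size], perm[(i + 1) * fold_size:]])
        yield train_idx, val_idx

# Inside the question's loop this would replace the contiguous slicing, e.g.:
# for i, (train_idx, val_idx) in enumerate(shuffled_kfold_indices(len(x_train), k)):
#     partial_train_data, partial_train_targets = x_train[train_idx], y_train[train_idx]
#     val_data, val_targets = x_train[val_idx], y_train[val_idx]

folds = list(shuffled_kfold_indices(10, 5))
```

Indexing `x_train` and `y_train` with the same index arrays keeps each sample paired with its label. If order genuinely matters, as in a time series, a random shuffle can leak information across time, and a chronological split is the safer choice.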