Is it correct to train and validate a model with the F1 score as the metric?

I am running experiments on multiple data sets, some more imbalanced than others. To ensure fair reporting, we compute the F1 score on the test data. Most machine learning models are trained and validated using accuracy as the metric, but this time I decided to monitor training and validation with the F1 score instead. Technically this should cause no problems, in my opinion, but I am wondering whether it is the correct approach.
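As a minimal sketch of the idea, here is model selection driven by validation F1 rather than accuracy. The data set, classifier, and threshold search below are illustrative assumptions, not details from the original experiments:

```python
# Sketch: choosing a decision threshold by validation F1 instead of accuracy.
# Everything here (toy data, logistic regression, threshold grid) is an
# illustrative assumption, not the setup from the original post.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Imbalanced toy data: roughly 10% positives.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Pick the threshold that maximizes F1 on the validation set, instead of
# defaulting to 0.5 (which is what accuracy-based selection tends to keep).
val_probs = clf.predict_proba(X_val)[:, 1]
thresholds = [i / 100 for i in range(5, 96)]
best_t = max(thresholds, key=lambda t: f1_score(y_val, val_probs >= t))

# Report F1 once, on held-out test data, with the selected threshold.
test_probs = clf.predict_proba(X_test)[:, 1]
print("threshold:", best_t)
print("test F1:", f1_score(y_test, test_probs >= best_t))
```

The same pattern applies to neural networks: keep the usual loss for optimization and use validation F1 only for checkpointing and early stopping.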

Second, when I use this method (monitoring training and validation with the F1 score), I get a higher loss and a lower F1 score on the training data than on the validation data, and I'm not sure why.

What kind of loss are you using?

@Bhack Cross Entropy
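Note that cross-entropy and F1 can disagree, which is part of why monitoring F1 on imbalanced data is useful. A small illustrative example (the numbers below are made up, not from the thread): a model that assigns every sample the base-rate probability looks acceptable by accuracy and loss, yet scores zero F1.

```python
# Illustrative only: on imbalanced data, a predictor can have high accuracy
# and a modest cross-entropy loss while its F1 is zero.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, log_loss

y = np.array([0] * 90 + [1] * 10)   # 10% positive class

# "Base-rate" predictor: probability 0.1 for every sample,
# so the 0.5-threshold prediction is always the negative class.
p = np.full(100, 0.1)
preds = (p >= 0.5).astype(int)

print(accuracy_score(y, preds))           # 0.9
print(f1_score(y, preds, zero_division=0))  # 0.0 -- no positives predicted
print(log_loss(y, p))                     # moderate loss despite F1 of 0
```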

What do you mean by this?

During training, each batch's metric is computed with the weights as they stood before that batch's update; the update is then applied, and the next batch is measured with the newer weights. The reported training metric accumulates over the whole epoch, while the validation metric is computed once at the end of the epoch with the final weights. So as long as the model is improving within the epoch, the validation metric will look better than the training metric.
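This effect can be sketched with a toy simulation. The per-batch F1 values below are invented for illustration (and real stateful metrics accumulate counts rather than averaging scores), but the gist is the same: the training number blends early, weaker weights, while validation sees only the final ones.

```python
# Toy simulation: why the epoch's training metric trails the validation
# metric when the model improves during the epoch. The per-batch F1 values
# are invented for illustration.

# Suppose the model's F1 improves steadily across 10 batches of one epoch:
per_batch_f1 = [0.50, 0.55, 0.60, 0.64, 0.68, 0.71, 0.74, 0.76, 0.78, 0.80]

# Reported training F1: accumulated over the epoch's batches
# (approximated here as a simple running average).
train_f1 = sum(per_batch_f1) / len(per_batch_f1)

# Reported validation F1: evaluated once, with the end-of-epoch weights.
val_f1 = per_batch_f1[-1]

print(f"train F1 ~ {train_f1:.3f}, val F1 ~ {val_f1:.3f}")
# Whenever the model improves within an epoch, train F1 < val F1.
```

Evaluating the training set again after the epoch, with the final weights, would remove most of this gap.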