How can we compare loss and metrics evaluated on training and validation sets of different sizes?

Let N_train, N_val, and N_test be the numbers of examples in the training, validation, and test sets.

As I understand it, these are generally chosen so that N_train >> N_val ~= N_test.

As I understand it, the loss and metrics are evaluated (as averages) over the whole training set. In that case, how can we compare performance across sets of such different sizes?

Why isn't model performance instead evaluated on a subset of the training set whose size is comparable to that of the validation or test set?

One could argue that this would increase the computational cost, but we could at least randomly sample from the per-example losses already computed during the training step.
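
To make that concrete, here is a minimal sketch of what I have in mind. The loss values below are synthetic placeholders (in practice they would be the per-example losses collected during a training epoch), and the sizes are just illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example training losses; in practice these would come
# from the model during/after a training epoch.
N_train, N_val = 50_000, 5_000
train_losses = rng.exponential(scale=1.0, size=N_train)

# Average over the full training set (what is usually reported).
full_mean = train_losses.mean()

# My suggestion: average over a random subset of size N_val, so the
# training metric is computed on the same number of examples as the
# validation metric.
subset = rng.choice(train_losses, size=N_val, replace=False)
subset_mean = subset.mean()

print(f"mean over all {N_train} training examples: {full_mean:.4f}")
print(f"mean over a random subset of {N_val}:      {subset_mean:.4f}")
```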

Please let me know if there are any reasons behind this approach!