Multi-label Text Classifier and Model Evaluation Metrics

Hi, I am working on a project where I am trying to predict JEL codes from abstracts.

Recently, I’ve been adapting the code from this example provided by Sayak Paul and Soumik Rakshit, where they built a multi-label text classifier that predicts the subject areas of arXiv papers based on their abstract bodies.

However, as I’ve been tailoring their code to my needs, I’ve encountered an intriguing issue with the model’s performance metrics. In the training logs of both the original model and my adaptation, the validation loss increases with each epoch. Conventionally, this trend suggests overfitting. Yet the model performs admirably on the test set, in both the original example and my project (over 99% in my case).

I’ve attached screenshots of my model’s performance metrics for your reference. I am looking to understand this phenomenon. Is the increasing validation loss indeed indicative of overfitting, or am I potentially misinterpreting some aspect of the model’s performance?

I would greatly appreciate any insights or suggestions that the community can offer on this matter. Thank you for your time.


Hi @Alessio_Garau ,

From what I understand of your post, here are some things you can try to address the increasing validation loss:

  1. Reduce the complexity of the model: This can be done by reducing the number of layers or the number of neurons in each layer.
  2. Use regularization techniques: Regularization techniques help prevent overfitting by adding a penalty to the loss function.
  3. Increase the size of the training data: A larger training set can help the model learn the general patterns in the data rather than the specific details of the training data.
  4. Collect a more representative test set: The test set should be collected in the same way as the training data and should be large enough to be statistically significant.
  5. Check label distribution: A significant difference in label distribution between the training and validation sets can impact model performance.
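
Items 1 and 2 above could look something like the sketch below in Keras: a deliberately small model with dropout and an L2 weight penalty. This is illustrative only, and NUM_LABELS, VOCAB_SIZE, and the layer width are hypothetical placeholders, not values from your project:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

NUM_LABELS = 20     # hypothetical number of JEL codes
VOCAB_SIZE = 20000  # hypothetical vocabulary size

def make_model():
    # One small dense block instead of several large ones (item 1),
    # plus dropout and L2 regularization (item 2).
    return keras.Sequential([
        keras.Input(shape=(VOCAB_SIZE,)),  # e.g. multi-hot / TF-IDF text vectors
        layers.Dense(
            128,
            activation="relu",
            kernel_regularizer=keras.regularizers.l2(1e-4),  # L2 penalty on weights
        ),
        layers.Dropout(0.5),  # randomly drop activations during training
        layers.Dense(NUM_LABELS, activation="sigmoid"),  # sigmoid for multi-label output
    ])

model = make_model()
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[keras.metrics.BinaryAccuracy()],
)
```

You can tune the dropout rate and L2 factor against the validation loss; if the gap between training and validation loss shrinks, overfitting was at least part of the story.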

If you try all of these and still see increasing validation loss, it’s possible that the model is simply unable to learn the task from the data it has. In that case, you may need to collect more data or try a different model architecture.
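
For item 5, here is one quick way to compare per-label prevalence between two splits, using stdlib only. The JEL codes and sample lists are made up purely for illustration; you would pass in your own train/validation label lists:

```python
from collections import Counter

# Hypothetical multi-label samples: each entry is the list of JEL codes
# attached to one abstract.
train_labels = [["E52", "E58"], ["C23"], ["E52"], ["G21", "E52"]]
val_labels = [["C23"], ["G21"], ["E52", "C23"]]

def label_frequencies(label_lists):
    """Fraction of samples in which each label appears."""
    counts = Counter(code for labels in label_lists for code in labels)
    n = len(label_lists)
    return {code: count / n for code, count in counts.items()}

train_freq = label_frequencies(train_labels)
val_freq = label_frequencies(val_labels)

# Flag labels whose prevalence differs a lot between the two splits.
for code in sorted(set(train_freq) | set(val_freq)):
    gap = abs(train_freq.get(code, 0.0) - val_freq.get(code, 0.0))
    if gap > 0.2:  # arbitrary threshold for illustration
        print(f"{code}: train={train_freq.get(code, 0.0):.2f} "
              f"val={val_freq.get(code, 0.0):.2f}")
```

Large gaps flagged by a check like this would mean the validation loss is being measured on a different label mix than the model was trained on, which can inflate it even when the model is fine.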

I hope this helps!