Loss is not decreasing

Manoson · November 24, 2021, 10:46am

The loss is not decreasing; it stays at about 10.
Training is based on VOC2021 images (originally 20 classes and about 15,000 images). I added 1 new class with 40 new images.
I use:
ssd_inception_v2_coco model.
Python 3.6.13
tensorflow 1.15.5

I have to use TensorFlow 1.15 in order to use DirectML, because I have an AMD GPU.

I followed this tutorial:
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/tensorflow-1.14/

Tanya · January 2, 2024, 7:30pm

@Manoson Welcome to the TensorFlow Forum!

Here are potential causes and solutions for the stagnant loss in your object detection training scenario:

1. Insufficient Data for New Class:

40 images might be inadequate for a new class, especially with a complex model like ssd_inception_v2_coco.
Solutions:
- Gather more images for the new class (aim for a few hundred or more).
- Utilize data augmentation techniques (random cropping, flipping, color jittering) to artificially increase diversity.
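
To make the augmentation idea concrete, here is a minimal sketch of a random horizontal flip that keeps bounding boxes in sync with the image. It is plain Python for illustration only: the function names are hypothetical, the image is assumed to be a list of pixel rows, and boxes are assumed to be normalized [ymin, xmin, ymax, xmax]. In the Object Detection API itself, augmentation is normally enabled through data_augmentation_options in the pipeline config (for example random_horizontal_flip) rather than hand-written code.

```python
import random


def flip_horizontal(image, boxes):
    """Mirror an image and its boxes left-to-right.

    image: list of rows, each row a list of pixels (assumed layout).
    boxes: list of [ymin, xmin, ymax, xmax], normalized to [0, 1] (assumed).
    """
    flipped_image = [row[::-1] for row in image]
    # After mirroring, the old right edge becomes the new left edge.
    flipped_boxes = [
        [ymin, 1.0 - xmax, ymax, 1.0 - xmin]
        for ymin, xmin, ymax, xmax in boxes
    ]
    return flipped_image, flipped_boxes


def random_flip(image, boxes, prob=0.5, rng=random):
    """Apply the flip with probability `prob`, otherwise return the inputs."""
    if rng.random() < prob:
        return flip_horizontal(image, boxes)
    return image, boxes
```

Flipping the boxes together with the pixels matters: an augmented image with stale box coordinates teaches the model the wrong locations.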

2. Imbalanced Dataset:

The original 20 classes have significantly more data than the new class, potentially biasing the model towards them.
Solutions:
- Oversample the new class during training to give it more weight.
- Use class weighting techniques to balance the importance of different classes during loss calculation.
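
As a rough illustration of both suggestions, the sketch below computes inverse-frequency class weights and repeats examples of a rare class. It assumes one label per example, which is a simplification (detection images usually contain several objects), and the helper names are hypothetical rather than part of any TensorFlow API.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


def oversample(examples, labels, target_class, factor):
    """Repeat every example of `target_class` `factor` times."""
    result = []
    for example, label in zip(examples, labels):
        repeats = factor if label == target_class else 1
        result.extend([(example, label)] * repeats)
    return result
```

With 40 images against roughly 15,000, oversampling alone repeats the same few images many times, so it tends to work better combined with augmentation.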

3. Learning Rate Issues:

An inappropriate learning rate might hinder model convergence.
Solutions:
- Experiment with different learning rates (e.g., smaller values like 1e-4 or 1e-5).
- Implement learning rate scheduling to gradually decrease the learning rate during training.
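
For reference, a common schedule is exponential decay, where the rate is multiplied by a fixed factor every so many steps. The sketch below shows the arithmetic in plain Python; the numbers in the usage note are illustrative assumptions, not recommended settings for this model.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=True):
    """Return initial_lr * decay_rate ** (step / decay_steps).

    With staircase=True the exponent is floored, so the rate drops in
    discrete jumps every `decay_steps` steps instead of continuously.
    """
    if staircase:
        exponent = step // decay_steps
    else:
        exponent = step / decay_steps
    return initial_lr * decay_rate ** exponent
```

For example, with initial_lr=0.004, decay_steps=1000 and decay_rate=0.5, the rate stays at 0.004 for the first 1000 steps, then halves to 0.002, then to 0.001 after step 2000.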

4. Overfitting:

The model might be overfitting to the training data, preventing generalization to new examples.
Solutions:
- Employ early stopping to halt training when validation loss starts increasing.
- Use regularization techniques like L1/L2 weight decay or dropout to reduce overfitting.
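
Early stopping can be sketched as a small helper that tracks the best validation loss and signals a stop after a number of evaluations without improvement. The class name and parameters here are hypothetical, and the snippet assumes you evaluate on a validation set at regular intervals and feed it the resulting loss.

```python
class EarlyStopping:
    """Signal a stop after `patience` evaluations without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as progress
        self.best = float("inf")
        self.bad_evals = 0

    def update(self, val_loss):
        """Record a validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

Usage: create one instance before training, call update() after each evaluation, and stop (keeping the checkpoint with the best loss) once it returns True.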

5. Incorrect Configuration:

Double-check model configuration, data loading, and training setup for errors.
Solutions:
- Verify label mapping, data preprocessing, and loss function configuration.
- Ensure proper model architecture loading and training loop implementation.
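
One configuration check that is easy to automate: in the Object Detection API, label map ids start at 1 (0 is reserved for background), and num_classes in the pipeline config has to match the number of labels, which would be 21 here (20 original classes plus the new one). The sketch below is a hypothetical helper that assumes the label map has already been loaded into a name-to-id dictionary.

```python
def check_label_map(label_map, num_classes):
    """Return a list of problems found in a {name: id} label map."""
    problems = []
    ids = sorted(label_map.values())
    if len(ids) != num_classes:
        problems.append(
            "label map has %d entries but num_classes is %d"
            % (len(ids), num_classes)
        )
    if len(set(ids)) != len(ids):
        problems.append("duplicate ids in label map")
    if ids and ids[0] < 1:
        problems.append("ids must start at 1; 0 is reserved for background")
    if ids and ids[-1] > num_classes:
        problems.append("largest id %d exceeds num_classes" % ids[-1])
    return problems
```

An empty result means these basic checks passed; it does not prove the TFRecords were generated with the same mapping, so that is still worth verifying separately.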

Additional Troubleshooting Steps:

- Visualize Training: Plot the loss and an evaluation metric (such as mAP) for both training and validation sets to identify potential issues like overfitting or underfitting.
- Monitor Gradients: Check for vanishing or exploding gradients, which can impede learning.
- Experiment with Hyperparameters: Adjust batch size, optimizer, and other hyperparameters to find settings that work for your dataset.
- Explore Transfer Learning: Consider starting from a model pre-trained on a larger dataset like COCO and fine-tuning it for your specific task. This can often lead to better performance with limited data.
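
For the gradient check, a single number that is easy to log is the global norm across all gradients. The sketch below uses plain Python lists to show the computation; the thresholds in diagnose() are arbitrary assumptions for illustration, and in a real run the gradient values would come from the training framework.

```python
import math


def global_norm(gradients):
    """Square root of the sum of squares over all gradient values.

    gradients: list of flat lists of floats, one per variable (assumed layout).
    """
    return math.sqrt(sum(g * g for grad in gradients for g in grad))


def diagnose(norm, low=1e-7, high=1e3):
    """Classify a global norm; `low` and `high` are illustrative thresholds."""
    if math.isnan(norm) or math.isinf(norm):
        return "non-finite"
    if norm < low:
        return "vanishing"
    if norm > high:
        return "exploding"
    return "ok"
```

Logging this value every few hundred steps makes it easier to tell a stuck loss caused by near-zero gradients from one caused by unstable updates.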

Let us know if this helps!