Loss is not decreasing

The loss is not decreasing and stays at about 10.
Training is based on VOC2021 images (originally 20 classes and about 15,000 images); I added 1 new class with 40 new images.
I use:
ssd_inception_v2_coco model.
Python 3.6.13
TensorFlow 1.15.5

I have to use TensorFlow 1.15 in order to use DirectML, because I have an AMD GPU.

I followed this tutorial:
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/tensorflow-1.14/

@Manoson Welcome to the TensorFlow Forum!

Here are potential causes and solutions for the stagnant loss in your object detection training scenario:

1. Insufficient Data for New Class:

  • 40 images might be inadequate for a new class, especially with a complex model like ssd_inception_v2_coco.
  • Solutions:
    • Gather more images for the new class (aim for a few hundred or more).
    • Utilize data augmentation techniques (random cropping, flipping, color jittering) to artificially increase diversity.
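
As a rough illustration of the flipping idea, here is a minimal sketch in plain Python. It assumes boxes are stored as normalized (ymin, xmin, ymax, xmax) tuples and the image as a list of pixel rows; the function names are made up for this example. In the Object Detection API itself, augmentation is normally enabled through the data_augmentation_options block of pipeline.config (for example random_horizontal_flip) rather than hand-written code.

```python
import random


def flip_horizontal(image, boxes):
    """Mirror an image (list of pixel rows) and its normalized boxes left to right."""
    flipped_image = [row[::-1] for row in image]
    # A box's left edge becomes 1 - old right edge, and vice versa.
    flipped_boxes = [
        (ymin, 1.0 - xmax, ymax, 1.0 - xmin)
        for (ymin, xmin, ymax, xmax) in boxes
    ]
    return flipped_image, flipped_boxes


def maybe_flip(image, boxes, probability=0.5, rng=random):
    """Apply the flip with the given probability (hypothetical helper)."""
    if rng.random() < probability:
        return flip_horizontal(image, boxes)
    return image, boxes
```

The important detail is that the boxes are transformed together with the pixels; flipping only the image silently corrupts the annotations.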

2. Imbalanced Dataset:

  • The original 20 classes have significantly more data than the new class, potentially biasing the model towards them.
  • Solutions:
    • Oversample the new class during training to give it more weight.
    • Use class weighting techniques to balance the importance of different classes during loss calculation.
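
Both ideas can be sketched in a few lines of plain Python. This is a simplification that assumes one label per example (real detection images often contain several classes), and the helper names are invented for the illustration, not part of any library.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * count), so rare classes weigh more."""
    counts = Counter(labels)
    total = sum(counts.values())
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


def oversample(examples, labels, target_label, factor):
    """Return (example, label) pairs with target_label examples repeated `factor` times."""
    result = []
    for example, label in zip(examples, labels):
        repeats = factor if label == target_label else 1
        result.extend([(example, label)] * repeats)
    return result
```

With 40 images against roughly 15,000, plain repetition only goes so far; it works best combined with augmentation so the repeated images are not pixel-identical.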

3. Learning Rate Issues:

  • An inappropriate learning rate might hinder model convergence.
  • Solutions:
    • Experiment with different learning rates (e.g., smaller values like 1e-4 or 1e-5).
    • Implement learning rate scheduling to gradually decrease the learning rate during training.
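
To make the scheduling idea concrete, here is a minimal sketch of an exponential decay schedule in plain Python. The parameter names are my own choice for the example; in the Object Detection API the schedule is, as far as I know, set in the optimizer section of pipeline.config rather than in Python code.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=True):
    """Learning rate after `step` steps: initial_lr * decay_rate ** (step / decay_steps).

    With staircase=True the exponent is floored, so the rate drops in discrete jumps.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent
```

For example, exponential_decay(1e-4, 2500, 1000, 0.95) has completed two full decay periods, so it returns 1e-4 multiplied by 0.95 squared.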

4. Overfitting:

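  • The model might be overfitting to the training data, preventing generalization to new examples. Overfitting usually shows up as training loss falling while validation loss rises, so it is a less likely explanation for a training loss stuck near 10, but it is worth ruling out.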
  • Solutions:
    • Employ early stopping to halt training when validation loss starts increasing.
    • Use regularization techniques such as L1/L2 penalties (weight decay) or dropout to reduce overfitting.
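
Early stopping boils down to a small amount of bookkeeping. The class below is a minimal sketch in plain Python, with a hypothetical name and parameters; it assumes you evaluate on a validation set periodically and pass each validation loss to should_stop.

```python
class EarlyStopping:
    """Signal a stop after `patience` evaluations in a row without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_evals = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # New best validation loss: remember it and reset the counter.
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

A typical loop would call should_stop after every evaluation and keep the checkpoint that produced the best validation loss.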

5. Incorrect Configuration:

  • Double-check model configuration, data loading, and training setup for errors.
  • Solutions:
    • Verify label mapping, data preprocessing, and loss function configuration.
    • Ensure proper model architecture loading and training loop implementation.
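
Since a class was added, the label map is a good first thing to check: 20 original classes plus 1 new one means num_classes in the config should be 21, and label map ids should start at 1 because id 0 is reserved for the background class. The sketch below is a rough, regex-based check in plain Python; it assumes the usual item { id: ... name: '...' } layout of label_map.pbtxt, and both function names are invented for this example.

```python
import re


def parse_label_map(text):
    """Extract (id, name) pairs from label_map.pbtxt-style text."""
    items = []
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        id_match = re.search(r"\bid\s*:\s*(\d+)", block)
        name_match = re.search(r"\bname\s*:\s*['\"]([^'\"]+)['\"]", block)
        if id_match and name_match:
            items.append((int(id_match.group(1)), name_match.group(1)))
    return items


def check_label_map(items, num_classes):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    ids = sorted(item_id for item_id, _ in items)
    if ids != list(range(1, len(ids) + 1)):
        problems.append("ids should run 1..N with no gaps or duplicates")
    if len(ids) != num_classes:
        problems.append(
            "label map has %d classes, config says %d" % (len(ids), num_classes)
        )
    names = [name for _, name in items]
    if len(set(names)) != len(names):
        problems.append("duplicate class names")
    return problems
```

It is also worth confirming that the class names in the label map match the names used in the annotation files exactly, including case.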

Additional Troubleshooting Steps:

  • Visualize Training: Plot loss and evaluation metric curves (for detection, typically mAP) for both training and validation sets, for example in TensorBoard, to identify potential issues like overfitting or underfitting.
  • Monitor Gradients: Check for vanishing or exploding gradients, which can impede learning.
  • Experiment with Hyperparameters: Adjust batch size, optimizer, and other hyperparameters to find the optimal settings for your dataset.
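  • Explore Transfer Learning: ssd_inception_v2_coco is already pre-trained on COCO, so confirm that the configuration actually points to the pre-trained checkpoint (the fine_tune_checkpoint field in pipeline.config) and that training fine-tunes it rather than starting from scratch. Fine-tuning often leads to better performance with limited data.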
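
For the gradient check, one simple signal is the global norm of all gradients at a given step. The sketch below is plain Python and assumes the gradients have already been pulled out of the training loop as flat lists of floats; the thresholds are arbitrary placeholders for illustration, not recommended values.

```python
import math


def global_norm(gradients):
    """Square root of the sum of squares over every gradient value."""
    return math.sqrt(sum(g * g for grad in gradients for g in grad))


def diagnose(norm, low=1e-7, high=1e3):
    """Classify a gradient norm; `low` and `high` are placeholder thresholds."""
    if math.isnan(norm) or math.isinf(norm):
        return "non-finite"
    if norm < low:
        return "vanishing"
    if norm > high:
        return "exploding"
    return "ok"
```

Logging this value every few hundred steps is usually enough to see whether a flat loss coincides with near-zero or non-finite gradients.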

Let us know if this helps!