Strange behaviour of DeepLabV3 on my dataset

Hi there,
I started getting into semantic segmentation with TensorFlow a few months ago.
I trained a DeepLabV3 model for semantic image segmentation on a dataset in which I labelled certain tree species separately from the bulk of other trees and the soil.
When I use the trained model to predict tree species, the quality of the results varies a lot from image to image:

The red areas are “all other tree species”, purple is the soil/background. All other colours display various species.

As you can see, it works reasonably well for some test images. For others the predictions are useless, and for some the model doesn’t predict any species at all.

Has anyone experienced behaviour like this, or can you explain it? Is there any further information I should provide to help explain it?


Are these results from the training, test, or validation set?

The results are from the test set.

Are the metrics on the training set good?

Well, far from perfect. It reached a mean intersection over union of about 0.42 on the training set and 0.38 on the validation set…
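
For reference, a mean IoU like this averages over classes, so it can hide the fact that some species score close to zero while the background scores well. Below is a minimal sketch of how a per-class IoU could be computed from a confusion matrix. It assumes NumPy, integer label maps of the same shape, and that any label outside 0..num_classes-1 (for example an ignore value such as 255) should be skipped. The function names are only illustrative and are not part of the DeepLab code.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (true class, predicted class) pair.

    Assumption: pixels whose true label is outside [0, num_classes)
    are treated as "ignore" and left out of the counts.
    """
    y_true = np.asarray(y_true).ravel().astype(np.int64)
    y_pred = np.asarray(y_pred).ravel().astype(np.int64)
    valid = (y_true >= 0) & (y_true < num_classes)
    # Encode each (true, pred) pair as a single index and count them.
    idx = y_true[valid] * num_classes + y_pred[valid]
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(cm):
    """IoU per class; NaN for classes that never occur in truth or prediction."""
    tp = np.diag(cm).astype(float)
    # union = predicted pixels + true pixels - pixels counted in both
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    iou = np.full(cm.shape[0], np.nan)
    present = union > 0
    iou[present] = tp[present] / union[present]
    return iou


def mean_iou(cm):
    """Mean over the classes that actually occur (NaN entries are skipped)."""
    return float(np.nanmean(per_class_iou(cm)))
```

Accumulating one confusion matrix over the whole training set and printing `per_class_iou` would show which species the model has actually learned and which ones pull the mean down.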

So I think you could first try to reproduce the published performance on a well-known dataset from a domain similar to yours, then adapt the setup to your own dataset and aim for good performance on the training set before anything else.

Try to take a look at: