Are weighted output losses equivalent to different optimisers for model substructures?

This question is more about general understanding than about concrete code.
Let's say I have a model and the model outputs a vector. This vector is a concatenation of the results of different submodels within the model. Now I apply a custom weighting to the output loss, say with:

    import numpy as np
    import tensorflow as tf

    def mse(true, pred):  # weighted MSE for RNN models
        factor_array = np.array([[[1.0, 2.0, 20.0]]])  # per-feature weights
        return tf.math.reduce_mean(tf.square(true - pred) * factor_array, axis=-1)

Is the effect of this the same as using multiple optimisers with different learning rates for the substructures of the main model? And does that mean the weighting is subject to the same restrictions as the learning rate, so that too high a weighting would make the model diverge?

For example, would the maximum weighting be: max_weight < 1/learning_rate?

Thanks for thinking this through with me :slight_smile:

Hi @Samuel_K,

It's really important to experiment with different weightings and to monitor the model's performance to determine the best approach.

When you apply a custom weighting to the output loss, you are essentially giving more importance to certain parts of the output vector than to others, which can help optimize the model's performance on specific tasks. When you instead use multiple optimizers with different learning rates for the substructures of the main model, you allow each substructure to update at its own pace. This can help to avoid getting stuck in local minima and improve the model's generalization ability.
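For comparison, here is a minimal sketch of what the multiple-optimizer setup could look like in TensorFlow. The submodels, learning rates, and loss are illustrative assumptions, not your actual model:

    import tensorflow as tf

    # Hypothetical submodels standing in for the substructures of the main model.
    submodel_a = tf.keras.layers.Dense(3)
    submodel_b = tf.keras.layers.Dense(3)

    # One optimizer per substructure, each with its own learning rate.
    opt_a = tf.keras.optimizers.SGD(learning_rate=1e-3)
    opt_b = tf.keras.optimizers.SGD(learning_rate=2e-2)

    def train_step(x, y):
        with tf.GradientTape() as tape:
            # The model output is the concatenation of the submodel outputs.
            pred = tf.concat([submodel_a(x), submodel_b(x)], axis=-1)
            loss = tf.reduce_mean(tf.square(y - pred))
        grads_a, grads_b = tape.gradient(
            loss, [submodel_a.trainable_variables, submodel_b.trainable_variables])
        # Each optimizer only updates the variables of its own substructure.
        opt_a.apply_gradients(zip(grads_a, submodel_a.trainable_variables))
        opt_b.apply_gradients(zip(grads_b, submodel_b.trainable_variables))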

And yes, it is possible that a high weighting leads to the model diverging, just as a high learning rate can.
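You can see the connection directly for plain SGD: scaling the loss by a constant w scales the gradient by w, so the parameter update is identical to using the learning rate lr * w. (With adaptive optimizers such as Adam the equivalence is only approximate, since they normalise the gradient scale.) A quick numeric check with made-up values:

    import tensorflow as tf

    x = tf.Variable(3.0)
    w = 10.0    # loss weight (made-up value)
    lr = 0.01   # base learning rate (made-up value)

    with tf.GradientTape() as tape:
        weighted_loss = w * tf.square(x)   # weighted loss
    step_weighted = lr * tape.gradient(weighted_loss, x)

    with tf.GradientTape() as tape:
        plain_loss = tf.square(x)          # unweighted loss
    step_scaled_lr = (lr * w) * tape.gradient(plain_loss, x)

    # For SGD: lr * grad(w * L) == (lr * w) * grad(L), so the steps match.
    tf.debugging.assert_near(step_weighted, step_scaled_lr)

So the effective learning rate for a weighted part of the output becomes lr * w, and the usual divergence restrictions apply to that product rather than to the weight alone.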

The ideal weighting depends on factors such as the complexity of the model, the size of the dataset, and the specific task.

Thanks.


Yes, these factors are all known to me. I was just wondering whether I can skip the complicated task of implementing different optimizers for different model outputs, since weighting the different losses has basically the same effect.