Changing activation function during training

Hi, I’m currently trying to figure out a way to change my activation function(s) during training.
For example, consider a leaky ReLU max(x, x/a) with a = 1 at the start of training; as training progresses, I want to increase ‘a’ slowly.

What I’ve tried so far (via a callback) (minimal example):

import tensorflow as tf
from tensorflow import keras as ke


class change_act_func(ke.callbacks.Callback):

    def __init__(self, model):
        super().__init__()
        self.model = model

    def on_epoch_begin(self, epoch, logs=None):
        if epoch == 2:
            # define a new activation with a different 'a'
            def my_l_relu(x):
                a = 0.5
                return tf.math.maximum(x, x / a)

            # try to swap out the activation of the second layer
            self.model.layers[1].activation = my_l_relu

This doesn’t seem to work though. Any hints/ideas?
Many thanks in advance : )


What do you mean by increasing a? Do you want it to be more than 1?

Leaky ReLU with a = 1 becomes ReLU: When ‘a’ is set to 1, the equation max(x, x/a) becomes max(x, x) which is simply the ReLU function itself. Leaky ReLU’s benefit of allowing a small gradient for negative inputs disappears in this case.

Increasing a value is not preferred
Challenges in Implementation: While technically possible, gradually increasing ‘a’ during training would require modifying the activation function within the training loop. This can add complexity and might not be natively supported by all deep learning frameworks.

Here are some alternative approaches to achieve a similar effect as gradually increasing the slope in Leaky ReLU:

  • Fixed Leaky ReLU with a small slope: You can choose a small, fixed value for ‘a’ (e.g., 0.01) from the beginning. This provides a small gradient for negative inputs throughout training, addressing the dying ReLU problem to some extent. This is the most common approach for Leaky ReLU.
  • Parametric ReLU (PReLU): This activation function extends Leaky ReLU by making the slope a learnable parameter. The network itself learns the optimal slope for each neuron during training. This eliminates the need for manual selection or adjustment of ‘a’ (see the sketch after this list).
  • Swish Activation Function: This function is similar to ReLU but has a smoother curve near zero. It can address the dying ReLU problem to some extent while offering a more gradual transition than ReLU.
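For reference, a minimal sketch of what the PReLU option looks like in Keras, using the built-in tf.keras.layers.PReLU layer (the layer sizes here are purely illustrative):

import tensorflow as tf

# toy model in which the slope for negative inputs is a learnable parameter of the PReLU layer
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(10,)),
    tf.keras.layers.PReLU(),  # negative slope is learned during training
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')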

PS: I just saw that I made a small mistake in the code example. Obviously ‘a’ shouldn’t be 0.5 but rather something bigger than 1.

@Ajay_Krishna is this an AI-generated response?

Yes, I used ChatGPT to explain what I want to convey better, and I think it has done a good job in explaining the PReLU, which is the best way to adjust the ‘a’ value automatically.

Here is my problem with this answer:
Not only did it miss the point entirely, but it is also wrong mathematically.

I’ll explain:

" Leaky ReLU with a = 1 becomes ReLU: When ‘a’ is set to 1, the equation max(x, x/a) becomes max(x, x) which is simply the ReLU function itself."
max(x, x) is the identity, not a ReLU. A ReLU would be max(x, 0). The answer is just false.

“Leaky ReLU’s benefit of allowing a small gradient for negative inputs disappears in this case.”
This is grammatically dubious at the very least, but I get the point because I know what the benefits of a leaky ReLU are.

“Increasing a value is not preferred”
What?

“While technically possible, gradually increasing ‘a’ during training would require modifying the activation function within the training loop.”
Actually true, but also pointless to mention.

“This can add complexity and might not be natively supported by all deep learning frameworks.”
Obviously it adds complexity (duh). Also, I’m actually absolutely certain that it IS supported by all frameworks, because randomized leaky ReLUs exist, so this statement is also false or at least misleading.

“Here are some alternative approaches to achieve a similar effect as gradually increasing the slope in Leaky ReLU:”
First of all, what I described was about decreasing the slope, not increasing it, so this shows a lack of understanding.
Second, the suggested alternatives have completely different behavior. A fixed leaky ReLU and Swish are exactly NOT what I was asking for, and a PReLU has a completely different behavior as well.

After all of that, let me first say that I appreciate that you are trying to help me. I really do, thank you.
However, these AI-generated answers NEED to be thoroughly checked, as demonstrated above, and I fear that not doing so will be one of the biggest problems in the coming years.

I think I have a good enough solution. For anyone interested, here is a minimal example (TensorFlow 2.15.0):


#################################

import tensorflow as tf
from tensorflow import keras as ke
from tensorflow.keras.layers import Layer


class My_LeakyReLU(Layer):
    # leaky ReLU whose negative slope 'alpha' is a non-trainable weight,
    # so it can be changed from the outside (e.g. by a callback) during training

    def __init__(self, alpha=1.0, **kwargs):
        super().__init__(**kwargs)
        self._initial_alpha = alpha

    def build(self, input_shape):
        self.alpha = self.add_weight(
            initializer=ke.initializers.Constant(self._initial_alpha),
            trainable=False,
            name='alpha')

    def call(self, inputs):
        return tf.math.maximum(inputs, tf.math.multiply(inputs, self.alpha))

    def get_config(self):
        # store the current alpha so the layer can be re-created from its config
        config = {"alpha": float(self.alpha)}
        base_config = super().get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def compute_output_shape(self, input_shape):
        return input_shape

#################################

class change_activation(ke.callbacks.Callback):

    def __init__(self, model):
        super().__init__()
        self.model = model

    def on_epoch_begin(self, epoch, logs=None):
        # epochs are 0-indexed, so shift by one to avoid a division by zero
        # and to start with a slope of 1 in the first epoch
        neg_slope_act = 1.0 / (epoch + 1)
        self.model.layers[42].alpha.assign(neg_slope_act)
        # where 42 is the index of the "My_LeakyReLU"-Layer

#################################
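
To see it in action, here is a rough sketch of how the pieces could be wired together. The toy model, the random data, and the look-up-by-type variant of the callback are purely illustrative assumptions, not part of the solution above:

#################################

import numpy as np

# toy model; here the My_LeakyReLU layer happens to sit at index 1, not 42
model = ke.Sequential([
    ke.layers.Dense(16, input_shape=(8,)),
    My_LeakyReLU(),
    ke.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# same idea as change_activation above, but looking the layer up by type
# instead of hard-coding its index
class change_activation_by_type(ke.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        for layer in self.model.layers:
            if isinstance(layer, My_LeakyReLU):
                layer.alpha.assign(1.0 / (epoch + 1))

x = np.random.rand(256, 8).astype('float32')
y = np.random.rand(256, 1).astype('float32')
model.fit(x, y, epochs=5, callbacks=[change_activation_by_type()])

#################################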