What is the effect of use_bias?

Hi all,
While training a model, I'm wondering what the effect of use_bias is.

The model is

import tensorflow as tf
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import (Activation, Concatenate, Dense,
                                     GlobalAveragePooling2D)
from tensorflow.keras.models import Model

base_model = VGG19(input_shape=(config.img_size, config.img_size, 3),
                   include_top=False)

gap = GlobalAveragePooling2D()(base_model.output)

# class output
dense_b1_1 = Dense(256, use_bias=False)(gap)
relu_b1_2 = Activation(tf.nn.relu)(dense_b1_1)
dense_b1_3 = Dense(256, use_bias=False)(relu_b1_2)
relu_b1_4 = Activation(tf.nn.relu)(dense_b1_3)
cls_output = Dense(len(config.class_dict), activation='softmax')(relu_b1_4)

# regression output
dense_b2_1 = Dense(128, use_bias=False)(gap)
relu_b2_2 = Activation(tf.nn.relu)(dense_b2_1)
dense_b2_3 = Dense(128, use_bias=False)(relu_b2_2)
relu_b2_4 = Activation(tf.nn.relu)(dense_b2_3)
reg_output = Dense(4, activation='sigmoid')(relu_b2_4)

concat = Concatenate()([cls_output, reg_output])
model = Model(inputs=base_model.inputs, outputs=concat)

Using the same network, the result with use_bias=False is:

Epoch 1/200
160/160 [==============================] - 69s 384ms/step - loss: 0.5087 - val_loss: 0.4395
Epoch 2/200
160/160 [==============================] - 62s 386ms/step - loss: 0.3718 - val_loss: 0.4238
Epoch 3/200
160/160 [==============================] - 59s 372ms/step - loss: 0.3471 - val_loss: 0.3383
Epoch 4/200
160/160 [==============================] - 60s 377ms/step - loss: 0.3177 - val_loss: 0.3598
Epoch 5/200
160/160 [==============================] - 61s 371ms/step - loss: 0.3153 - val_loss: 0.3069
Epoch 6/200
160/160 [==============================] - 59s 372ms/step - loss: 0.3128 - val_loss: 0.3124
Epoch 7/200
160/160 [==============================] - 59s 372ms/step - loss: 0.2946 - val_loss: 0.2869
Epoch 8/200
160/160 [==============================] - 60s 376ms/step - loss: 0.2702 - val_loss: 0.3102
Epoch 9/200
160/160 [==============================] - 62s 376ms/step - loss: 0.2888 - val_loss: 0.2878
Epoch 10/200
160/160 [==============================] - 60s 376ms/step - loss: 0.2674 - val_loss: 0.3123

Epoch 00010: ReduceLROnPlateau reducing learning rate to 9.999999747378753e-11.
Epoch 11/200
160/160 [==============================] - 60s 377ms/step - loss: 0.3003 - val_loss: 0.3123
Epoch 12/200
160/160 [==============================] - 59s 370ms/step - loss: 0.2878 - val_loss: 0.3123
Restoring model weights from the end of the best epoch.
Epoch 00012: early stopping

but with use_bias=True, the result is:

160/160 [==============================] - 71s 396ms/step - loss: 1.1255 - val_loss: 1.1355
Epoch 2/200
160/160 [==============================] - 61s 380ms/step - loss: 1.1305 - val_loss: 1.1355
Epoch 3/200
160/160 [==============================] - 61s 383ms/step - loss: 1.1266 - val_loss: 1.1355
Epoch 4/200
160/160 [==============================] - 61s 383ms/step - loss: 1.1267 - val_loss: 1.1355

Epoch 00004: ReduceLROnPlateau reducing learning rate to 9.999999747378753e-11.
Epoch 5/200
160/160 [==============================] - 62s 380ms/step - loss: 1.1328 - val_loss: 1.1355
Epoch 6/200
160/160 [==============================] - 61s 383ms/step - loss: 1.1141 - val_loss: 1.1355
Epoch 7/200
160/160 [==============================] - 61s 385ms/step - loss: 1.1289 - val_loss: 1.1355

Epoch 00007: ReduceLROnPlateau reducing learning rate to 9.99999943962493e-16.
Epoch 8/200
160/160 [==============================] - 61s 379ms/step - loss: 1.1305 - val_loss: 1.1355
Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping

Both networks are built without batch normalization.
So why does the use_bias=False one reach a lower loss than the use_bias=True one?

A Dense layer tries to find weights such that

y = Wx + b

where W is the weight matrix and b is the bias. When x is a batch of values, W is a matrix and b is a vector.
The use_bias parameter tells the layer whether to add (and learn) this bias vector on top of its output.
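
As a minimal sketch of what that means in Keras (the layer sizes below are made up just for illustration), use_bias only adds or removes the bias vector from the layer's trainable weights:

import tensorflow as tf

# Hypothetical sizes: a 512-dim input into 256-unit Dense layers.
inputs = tf.keras.Input(shape=(512,))
with_bias = tf.keras.layers.Dense(256, use_bias=True)
without_bias = tf.keras.layers.Dense(256, use_bias=False)

_ = with_bias(inputs)      # builds kernel W of shape (512, 256) plus bias b of shape (256,)
_ = without_bias(inputs)   # builds kernel W of shape (512, 256) only

print(len(with_bias.weights))       # 2 -> kernel and bias
print(len(without_bias.weights))    # 1 -> kernel only
print(with_bias.count_params())     # 512*256 + 256 = 131328
print(without_bias.count_params())  # 512*256       = 131072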

My understanding is that the bias lets the model fit a function that is closer to the ground truth; without it, the layer's output is forced through the origin.
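
For intuition, here is a small toy sketch (the data and hyperparameters are made up, not taken from the question above): a single Dense unit without a bias can only represent y = w*x, so it can never fit data with a constant offset such as y = 2x + 5, while the same unit with a bias can.

import numpy as np
import tensorflow as tf

# Made-up toy data with a constant offset: y = 2x + 5
x = np.linspace(-1.0, 1.0, 200).astype("float32").reshape(-1, 1)
y = 2.0 * x + 5.0

def final_mse(use_bias):
    # A single Dense unit: y_hat = w*x + b if use_bias else y_hat = w*x
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1,)),
        tf.keras.layers.Dense(1, use_bias=use_bias),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss="mse")
    model.fit(x, y, epochs=200, verbose=0)
    return model.evaluate(x, y, verbose=0)

print("MSE with bias:   ", final_mse(True))   # close to 0
print("MSE without bias:", final_mse(False))  # stuck near 25, since w*x can never absorb the +5 offset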

How it affects your results depends on the data complexity.

To get an even better understanding, I'd suggest you take a look at this colab: Google Colab
