Hi all,
While training a model, I started wondering about the effect of use_bias.
The model is:

import tensorflow as tf
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import (Activation, Concatenate, Dense,
                                     GlobalAveragePooling2D)
from tensorflow.keras.models import Model

base_model = VGG19(input_shape=(config.img_size, config.img_size, 3),
                   include_top=False)
gap = GlobalAveragePooling2D()(base_model.output)

# class output
dense_b1_1 = Dense(256, use_bias=False)(gap)
relu_b1_2 = Activation(tf.nn.relu)(dense_b1_1)
dense_b1_3 = Dense(256, use_bias=False)(relu_b1_2)
relu_b1_4 = Activation(tf.nn.relu)(dense_b1_3)
cls_output = Dense(len(config.class_dict), activation='softmax')(relu_b1_4)

# regression output
dense_b2_1 = Dense(128, use_bias=False)(gap)
relu_b2_2 = Activation(tf.nn.relu)(dense_b2_1)
dense_b2_3 = Dense(128, use_bias=False)(relu_b2_2)
relu_b2_4 = Activation(tf.nn.relu)(dense_b2_3)
reg_output = Dense(4, activation='sigmoid')(relu_b2_4)

concat = Concatenate()([cls_output, reg_output])
model = Model(inputs=base_model.inputs, outputs=concat)
I trained the same network twice. The result with use_bias=False is:
Epoch 1/200
160/160 [==============================] - 69s 384ms/step - loss: 0.5087 - val_loss: 0.4395
Epoch 2/200
160/160 [==============================] - 62s 386ms/step - loss: 0.3718 - val_loss: 0.4238
Epoch 3/200
160/160 [==============================] - 59s 372ms/step - loss: 0.3471 - val_loss: 0.3383
Epoch 4/200
160/160 [==============================] - 60s 377ms/step - loss: 0.3177 - val_loss: 0.3598
Epoch 5/200
160/160 [==============================] - 61s 371ms/step - loss: 0.3153 - val_loss: 0.3069
Epoch 6/200
160/160 [==============================] - 59s 372ms/step - loss: 0.3128 - val_loss: 0.3124
Epoch 7/200
160/160 [==============================] - 59s 372ms/step - loss: 0.2946 - val_loss: 0.2869
Epoch 8/200
160/160 [==============================] - 60s 376ms/step - loss: 0.2702 - val_loss: 0.3102
Epoch 9/200
160/160 [==============================] - 62s 376ms/step - loss: 0.2888 - val_loss: 0.2878
Epoch 10/200
160/160 [==============================] - 60s 376ms/step - loss: 0.2674 - val_loss: 0.3123
Epoch 00010: ReduceLROnPlateau reducing learning rate to 9.999999747378753e-11.
Epoch 11/200
160/160 [==============================] - 60s 377ms/step - loss: 0.3003 - val_loss: 0.3123
Epoch 12/200
160/160 [==============================] - 59s 370ms/step - loss: 0.2878 - val_loss: 0.3123
Restoring model weights from the end of the best epoch.
Epoch 00012: early stopping
But with use_bias=True, the result is:
160/160 [==============================] - 71s 396ms/step - loss: 1.1255 - val_loss: 1.1355
Epoch 2/200
160/160 [==============================] - 61s 380ms/step - loss: 1.1305 - val_loss: 1.1355
Epoch 3/200
160/160 [==============================] - 61s 383ms/step - loss: 1.1266 - val_loss: 1.1355
Epoch 4/200
160/160 [==============================] - 61s 383ms/step - loss: 1.1267 - val_loss: 1.1355
Epoch 00004: ReduceLROnPlateau reducing learning rate to 9.999999747378753e-11.
Epoch 5/200
160/160 [==============================] - 62s 380ms/step - loss: 1.1328 - val_loss: 1.1355
Epoch 6/200
160/160 [==============================] - 61s 383ms/step - loss: 1.1141 - val_loss: 1.1355
Epoch 7/200
160/160 [==============================] - 61s 385ms/step - loss: 1.1289 - val_loss: 1.1355
Epoch 00007: ReduceLROnPlateau reducing learning rate to 9.99999943962493e-16.
Epoch 8/200
160/160 [==============================] - 61s 379ms/step - loss: 1.1305 - val_loss: 1.1355
Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping
Both networks are built without batch normalization. Why does the use_bias=False run reach a much lower loss than the use_bias=True run?