I would like to fine-tune the pre-trained VGG-Face network as described below:

min_{W,θ} ∑_{i=1}^{N} L(sigmoid(W η(a_i; θ)), y_i)

where η(a_i; θ) denotes the output of the last fully connected layer of the VGG-Face network for input a_i.

θ and W denote the parameters of the VGG-Face network and the weights of the sigmoid layer, respectively.

L is the cross-entropy loss.
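To make the objective concrete, here is a minimal sketch of the two pieces applied per sample, assuming binary labels y ∈ {0, 1} and a scalar score z = W·η(a_i; θ):

```python
import math

def sigmoid(z):
    # logistic function: maps the score z = W . eta(a; theta) into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(p, y):
    # binary cross-entropy between predicted probability p and label y in {0, 1}
    eps = 1e-12  # guard against log(0)
    return -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))
```

The objective sums cross_entropy(sigmoid(z_i), y_i) over all N training samples.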

Could someone please help me?

Thank you

The above equation is a fine-tuning objective for the VGG-Face network. In it, W and θ represent the weights of the sigmoid layer and the parameters of the VGG-Face network, respectively. N is the number of training samples, a_i is the i-th training input, η(a_i; θ) is the corresponding output of the last fully-connected layer of the VGG-Face network, and y_i is the label of that sample. L is the cross-entropy loss, which measures the discrepancy between the predicted label and the true label of a training sample.

The goal of fine-tuning the VGG-Face network is to find the values of W and θ that minimize this loss. This is done with a gradient-based optimizer, typically stochastic gradient descent (SGD): in each iteration, the gradient of the objective is backpropagated through both the sigmoid layer and the network backbone, and W and θ are updated in the direction that reduces the loss. The process repeats until the loss converges.
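In practice W and θ are updated jointly by backpropagation in a deep-learning framework, but the SGD update itself can be illustrated on W alone over fixed features. This is only a sketch: the 2-D vectors below are hypothetical stand-ins for the network features η(a_i; θ), and the key fact it uses is that the gradient of cross-entropy composed with sigmoid with respect to the score z is simply (p − y):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(W, x, y, lr):
    # forward pass: score z = W . x, predicted probability p = sigmoid(z)
    z = sum(w * xi for w, xi in zip(W, x))
    p = sigmoid(z)
    # gradient of cross_entropy(sigmoid(z), y) w.r.t. z is (p - y),
    # so dL/dW = (p - y) * x; take one SGD step against it
    return [w - lr * (p - y) * xi for w, xi in zip(W, x)]

# toy data: hypothetical 2-D features standing in for eta(a_i; theta)
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
W = [0.0, 0.0]
for _ in range(200):          # repeat until the loss is low
    for x, y in data:
        W = sgd_step(W, x, y, lr=0.5)
```

After training, the model assigns a high probability to the positive sample and a low one to the negative sample. Fine-tuning the full network works the same way, except the gradient is also propagated into θ.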