In object detection, we usually use a bounding box to describe the spatial location of an object.
The tutorial below fine-tunes a pre-trained RetinaNet model with a ResNet-50 backbone for object detection; it also saves and exports the trained model so it can be reused later.
You can follow a similar approach for your own use case.
You can also try any of the models in the official object detection section of the Model Garden library.
I don't know if you are up for a more extended discussion. I previously did this:
Load VGG16 without the output layers
Add a regression head
Train with 400 face images, using as label a vector with the 4 box coordinates
Predict on new face pictures and the test set
This approach just does not seem to work on any image that is not from the dataset, and even within the dataset it does not get the box tight enough around the face.
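For concreteness, the pipeline described above can be sketched roughly as follows (a hedged sketch, assuming tf.keras, 224x224 RGB inputs, and box coordinates normalized to [0, 1]; the head sizes are illustrative, not the exact ones used):

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load VGG16 without its classification (output) layers.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the backbone frozen at first

# Add a regression head predicting 4 box coordinates.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(4, activation="sigmoid"),  # coords assumed normalized to [0, 1]
])
model.compile(optimizer="adam", loss="mse")
```

Training would then call `model.fit` on the 400 images with their 4-coordinate label vectors.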
The link posted above covers almost all the steps one should perform to make the model learn.
I wanted to add a few more tips which have greatly helped me in such a scenario.
Increase the sample size: the more data, the better the learning. Also try to diversify the samples so that the model generalizes better.
Augment the dataset to increase the number of samples, and normalize the inputs. Data augmentation and normalization are two prevalent techniques used to improve generalizability.
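One common way to do this with tf.keras preprocessing layers (an illustrative sketch, not the only option; the batch shape is made up):

```python
import tensorflow as tf
from tensorflow.keras import layers

augment_and_normalize = tf.keras.Sequential([
    layers.Rescaling(1.0 / 255),      # normalization: scale pixels to [0, 1]
    layers.RandomFlip("horizontal"),  # augmentation: random mirroring
    layers.RandomRotation(0.05),      # augmentation: small random rotations
])

images = tf.random.uniform((8, 224, 224, 3), maxval=255.0)
batch = augment_and_normalize(images, training=True)
```

One caveat for box regression: geometric augmentations (flips, rotations) change where the face is, so the 4 label coordinates must be transformed accordingly; image-only augmentation pipelines like this one are safe only for photometric changes.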
Regularization and reducing the architecture complexity are two other methods commonly used to prevent overfitting.
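As a rough sketch of those two techniques in tf.keras (layer sizes and penalty strength are illustrative assumptions), an L2 weight penalty plus Dropout can be added to the regression head:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

head = tf.keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on weights
    layers.Dropout(0.5),  # randomly zeroes activations during training
    layers.Dense(4),      # 4 bounding-box coordinates
])
out = head(tf.zeros((1, 512)))  # assumed 512-dim feature vector
```

Reducing architecture complexity would mean shrinking or removing Dense layers rather than adding them.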
Try setting trainable=True; in my experience, I always got better performance (lower error in regression) with trainable=True.
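With a tf.keras backbone as in the question, that amounts to unfreezing the base model (a minimal sketch; the learning rate is an assumption, but a small one is generally advised when fine-tuning pretrained weights):

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = True  # unfreeze all backbone layers for fine-tuning

# A low learning rate helps avoid destroying the pretrained weights
# in the first few updates.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
```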
If none of the above tips works, we need to inspect the model architecture (which is fixed in our case).
Let us know if the above tips work for you.
I followed it but, to be honest, find it rather cumbersome in terms of notation; still, I will try it. At the moment I have just tried a PyTorch implementation of YOLOv4, and it seems rather easy to run and get results from.