Feature extraction at specific height

I will be using the EfficientNet-D7 model through TensorFlow to count the number of objects in an image taken from the sky. I want to extract only objects that appear at 3 ft or higher in an image (the object matches the ground, so the model keeps incorrectly selecting it). How would I write this into the model?

Do you mean something like this?

https://arxiv.org/abs/1802.10249

I do not think this is the route I want to take. I will be using orthomosaics and just want to make sure that I only count objects at a specific height, since they sometimes blend into the background.

Do you want to solve this with a single aerial image like in: