Gradient norm clipping having an unusually strong effect despite no clear evidence of exploding gradients

Bart · September 16, 2021, 8:19am

Hi,

I’m trying to implement FasterRCNN in Keras and having enormous difficulty doing so. I believe I am encountering some sort of numerical stability issue. When I try to train only the region proposal network (VGG16 plus a couple of convolution layers tacked on, producing two output maps) on a small subset of images, training progress, as measured by recall %, is very slow and oscillates quite a bit.

As I enable clipnorm, with values like 8, 4, and 1, behavior improves. I don’t see losses exploding or turning into NaNs in any case and am at a loss as to how to debug the issue.
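For reference, norm clipping rescales any gradient whose L2 norm exceeds the threshold, so it can change training even when nothing overflows: a gradient that is large but finite is simply shrunk, which acts like a smaller step size for that update. The pure-Python sketch below only illustrates the arithmetic; `clip_by_norm` is a hypothetical helper, not the Keras implementation. As far as I know, the Keras `clipnorm` argument applies this to each gradient tensor separately, while `global_clipnorm` uses the joint norm of all gradients.

```python
import math

def clip_by_norm(grad, clipnorm):
    """Rescale grad so its L2 norm is at most clipnorm; direction is unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clipnorm:
        return list(grad)          # already within the threshold
    scale = clipnorm / norm
    return [g * scale for g in grad]

grad = [3.0, 4.0]                  # L2 norm is 5.0: large-ish, but nowhere near overflow
for c in (8.0, 4.0, 1.0):
    clipped = clip_by_norm(grad, c)
    clipped_norm = math.sqrt(sum(g * g for g in clipped))
    print("clipnorm", c, "->", clipped, "norm", clipped_norm)
```

With a threshold of 8 the gradient passes through untouched; with 4 and 1 it is scaled down, so smaller thresholds behave like progressively smaller learning rates on the steps that get clipped.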

I suspect the issue must involve my loss functions but they are relatively simple (apart from a lot of reshaping of tensors). What would be a good way to debug this?

I’ve thought of printing out the gradient norm per training sample but am not sure how to obtain the gradient in the first place. Most of the examples I’ve seen online no longer work with TF 2.0.

Thank you,

Bart

Bhack · September 16, 2021, 11:45am

Have you already tried to fine-tune our reference implementation on your data?

Bart · September 16, 2021, 2:48pm

Is that based on FasterRCNN? I’m trying to replicate on Pascal VOC2007. I did test against a simple Pytorch version which converges very fast as expected. I don’t see a major difference in my code except that my ground truth values are stored in a large map and so my loss functions have to perform a bit of tensor slicing and reshaping to get at the y_true values.

Bhack · September 16, 2021, 2:58pm

Yes, we have many Faster R-CNN models:

Bart · September 16, 2021, 4:53pm

Thanks, I will try to dig in and see if I can run the RPN portion but my objective is really to replicate this on my own. I have an implementation that does learn but the mAP is low and convergence takes much longer than it should. I suspect numerical instability.

The chief differences in my implementation are:

VGG16 backbone (none of the TF ones seem to use this).
Single class output in the RPN per anchor (objectness score) with sigmoid rather than two (object score, background score) with softmax. Background score is unused by the model anyway.
My truth values come in a more complex form because I pass a single large map stuffed with class and regression ground truth data.
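
On the second point, a single sigmoid output and a two-way softmax should be mathematically interchangeable, since the softmax probability of the object class equals the sigmoid of the difference of the two logits. A small pure-Python check of that identity (the helper names and logit values are made up for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax_object_prob(obj_logit, bg_logit):
    """Probability of the 'object' class under a two-way softmax."""
    m = max(obj_logit, bg_logit)            # subtract the max for numerical stability
    e_obj = math.exp(obj_logit - m)
    e_bg = math.exp(bg_logit - m)
    return e_obj / (e_obj + e_bg)

# Arbitrary example logits, not taken from any real model.
for obj_logit, bg_logit in [(2.0, -1.0), (0.5, 0.5), (-3.0, 1.5)]:
    p_softmax = softmax_object_prob(obj_logit, bg_logit)
    p_sigmoid = sigmoid(obj_logit - bg_logit)
    print(obj_logit, bg_logit, p_softmax, p_sigmoid)
```

If the identity holds, the choice of head alone is unlikely to explain a large difference in convergence, which points back toward the loss computation or the ground-truth slicing.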

I think it would be helpful to be able to print out the norm of the gradient but is there an example of how to do this with the current Keras API?

Bhack · September 16, 2021, 5:32pm

You can get the gradients with something like:
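
A minimal sketch of one way to do this with `tf.GradientTape` in TF 2.x, not necessarily the snippet originally posted. The tiny model, the loss, and the random data are hypothetical stand-ins; substitute the real RPN, its loss functions, and a real batch.

```python
import tensorflow as tf

# Hypothetical stand-in model and data; replace with the real RPN and batch.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
x = tf.random.normal((16, 4))
y = tf.random.normal((16, 1))

def gradient_norms(model, loss_fn, x, y):
    """Return (loss, global gradient norm, list of per-variable gradient norms)."""
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        loss = loss_fn(y, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    # Variables the loss does not depend on yield None; skip them.
    grads = [g for g in grads if g is not None]
    per_variable = [tf.norm(g) for g in grads]
    return loss, tf.linalg.global_norm(grads), per_variable

loss, global_norm, per_var = gradient_norms(model, loss_fn, x, y)
print("loss:", float(loss), "global gradient norm:", float(global_norm))
print("per-variable norms:", [float(n) for n in per_var])
```

Calling this once per batch (or per sample, with a batch of one) and logging the values should show whether a few samples produce much larger gradients than the rest.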

Bart · September 16, 2021, 8:16pm

Thank you. I got it working. Hopefully monitoring how this evolves will help me deduce the problem.
I looked at the FasterRCNN meta-architecture definition in the TF code base but it looks quite complex to use in practice. I also notice the use of batch normalization in the RPN, which is not part of the original FasterRCNN model and is not needed in PyTorch implementations I’ve studied: models/faster_rcnn_meta_arch.py at bea8998b1974015a01e5aa0e2d80a1c6623798a7 · tensorflow/models · GitHub

If I can’t resolve this in the next day or two, I will try to write a minimalistic RPN implementation in both Keras and PyTorch to confirm whether the problem is Keras.

Bhack · September 16, 2021, 8:28pm

Ok we are refactoring the new Vision models to have a more linear approach at:

https://github.com/tensorflow/models/blob/master/official/vision/beta/MODEL_GARDEN.md

Faster R-CNN isn’t there, but you can still find something useful for object detection and segmentation tasks.