I’m still trying to understand why my Faster R-CNN implementation converges incorrectly relative to a nearly-equivalent PyTorch implementation, and I’ve narrowed it down to two possible culprits:
- A broken computational graph caused by invoking `train_on_batch` twice in my implementation.
- My RoI pooling implementation (I’ve never been able to find working sample code for this, so I had to implement it myself).
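For reference, here is a minimal NumPy sketch of the quantized RoI max-pooling operation I mean (not my actual implementation, and deliberately naive — real versions handle fractional bin boundaries, or use RoIAlign):

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(7, 7)):
    """Naive RoI max pooling.

    Crops `roi` = (x1, y1, x2, y2), given in feature-map coordinates,
    splits the crop into an output_size grid, and takes the max over
    each grid cell, channel-wise.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]          # (h, w, channels)
    h, w, c = region.shape
    out_h, out_w = output_size
    pooled = np.zeros((out_h, out_w, c), dtype=feature_map.dtype)
    # Integer bin edges; guarantee each cell is at least 1 pixel wide.
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1), :]
            pooled[i, j] = cell.max(axis=(0, 1))
    return pooled
```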
I build two models that share many layers (e.g., the VGG-16 input layers). The first model produces a series of bounding boxes and scores, which are then fed into the second model to produce the final bounding boxes and scores. The second model also consumes the output of the shared layers. The topology looks roughly like this:

```
               Shared Layers
                     |
      +--------------+--------------------+
      |                                   |
 rpn_model Layers                         |
      |                                   |
 rpn_model outputs                        |
      |                                   |
      +---------------> classifier_model Layers
                                  |
                      classifier_model outputs
```
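My understanding of how Keras shares layers between two functional `Model`s is roughly this (toy layer names and shapes, not my actual network):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Create the shared layer object ONCE; both models then refer to the
# same underlying weights.
image_input = layers.Input(shape=(None, None, 3))
shared_conv = layers.Conv2D(64, 3, padding="same", activation="relu",
                            name="shared_conv")
features = shared_conv(image_input)

# RPN head on top of the shared features.
rpn_scores = layers.Conv2D(9, 1, activation="sigmoid",
                           name="rpn_scores")(features)
rpn_model = Model(inputs=image_input, outputs=rpn_scores)

# Classifier head also reads the shared features (the real model takes
# the RPN boxes as a second input as well).
cls_scores = layers.Conv2D(4, 1, name="cls_scores")(features)
classifier_model = Model(inputs=image_input, outputs=cls_scores)

# Both models hold the SAME layer object, hence the same weights.
print(rpn_model.get_layer("shared_conv")
      is classifier_model.get_layer("shared_conv"))
```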
I train by invoking `train_on_batch` on each model, which should update the shared layers twice:

```python
rpn_predictions = rpn_model.predict_on_batch(x = image)
rpn_losses = rpn_model.train_on_batch(x = image, y = y_true)
# ... code to generate boxes from rpn_predictions ...
classifier_losses = classifier_model.train_on_batch(x = [ image, boxes ], y = [ y_true_classifier ])
```
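A toy version of this, just to illustrate what I expect to happen (hypothetical names, `Dense` layers standing in for the VGG-16 backbone):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# One shared layer, two heads -- a stand-in for the real topology.
inp = layers.Input(shape=(4,))
shared = layers.Dense(8, name="shared")(inp)
m1 = Model(inp, layers.Dense(1, name="head1")(shared))
m2 = Model(inp, layers.Dense(1, name="head2")(shared))
m1.compile(optimizer="sgd", loss="mse")
m2.compile(optimizer="sgd", loss="mse")

x = np.random.rand(2, 4).astype("float32")
y = np.random.rand(2, 1).astype("float32")

w0 = m1.get_layer("shared").get_weights()[0].copy()
m1.train_on_batch(x, y)     # first update of the shared kernel
w1 = m1.get_layer("shared").get_weights()[0].copy()
m2.train_on_batch(x, y)     # second update, through the other model
w2 = m2.get_layer("shared").get_weights()[0].copy()

# If sharing works as I assume, the shared kernel moves on both calls.
print(np.allclose(w0, w1), np.allclose(w1, w2))
```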
Is this doing what I assume it is doing? It should be possible to backpropagate from the two different models' outputs, updating all of the layers, including the shared ones. Am I missing a step, or is Keras potentially doing something non-obvious here?