TF2/keras: Custom pooling layer does not evaluate correctly during training


I’m trying to implement an RoI pooling layer in Keras. I have a reference implementation based on multiple nested tf.map_fn calls that works correctly but is incredibly slow (I can’t stop TF from re-tracing each time, apparently).
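If it helps, here is a toy sketch (my own names, not my actual layer) of the re-tracing behaviour I think I'm hitting: by default, tf.function traces a brand-new graph for every new input shape, while pinning an input_signature reuses a single trace.

```python
import tensorflow as tf

# Toy sketch of the re-tracing problem (illustrative names).
# By default, tf.function traces a new graph for every new input shape:
traces = []

@tf.function
def pool_step(x):
    traces.append(1)               # Python code runs only while tracing
    return tf.reduce_max(x)

pool_step(tf.zeros([2, 4]))
pool_step(tf.zeros([3, 4]))        # new shape -> traced again
print(len(traces))                 # 2

# Pinning an input_signature reuses one trace for all compatible shapes:
traces_fixed = []

@tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
def pool_step_fixed(x):
    traces_fixed.append(1)
    return tf.reduce_max(x)

pool_step_fixed(tf.zeros([2, 4]))
pool_step_fixed(tf.zeros([3, 4]))  # same trace is reused
print(len(traces_fixed))           # 1
```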

I tried a simpler, more naive approach using a for loop directly, but I don’t understand how Keras executes it. When I build a simple single-layer test model to unit test my layer, it works just fine:

input_map = Input(shape = (9,8,num_channels))               # input map size
input_rois = Input(shape = (num_rois,4), dtype = tf.int32)  # N RoIs, each of length 4 (y,x,h,w)
output_roi_pool = RoIPoolingLayer(pool_size = pool_size)([input_map, input_rois])
roi_model = Model([input_map, input_rois], output_roi_pool)

x = [ x_maps, x_rois ]
y = roi_model.predict(x = x)

During training, where I have a more complex model (FasterRCNN) and I have to call train_on_batch(), something is not right. I’ve tried a few different ways to print out the outputs of the layer but it’s not clear what is happening. Sometimes I get values, sometimes I get 0’s.

Here is my code:

class RoIPoolingLayer(Layer):
  """
  Input shape:
    Two tensors [x_maps, x_rois] each with shape:
      x_maps: (samples, height, width, channels), representing the feature maps for this batch, of type tf.float32
      x_rois: (samples, num_rois, 4), where RoIs have the ordering (y, x, height, width), all tf.int32
  Output shape:
    (samples, num_rois, pool_size, pool_size, channels)
  """
  def __init__(self, pool_size, **kwargs):
    self.pool_size = pool_size
    super(RoIPoolingLayer, self).__init__(**kwargs)

  def get_config(self):
    config = {
      "pool_size": self.pool_size,
    }
    base_config = super(RoIPoolingLayer, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  def compute_output_shape(self, input_shape):
    map_shape, rois_shape = input_shape
    assert len(map_shape) == 4 and len(rois_shape) == 3 and rois_shape[2] == 4
    assert map_shape[0] == rois_shape[0]  # same number of samples
    num_samples = map_shape[0]
    num_channels = map_shape[3]
    num_rois = rois_shape[1]
    return (num_samples, num_rois, self.pool_size, self.pool_size, num_channels)
  def call(self, inputs):
    return tf.map_fn(
      fn = lambda input_pair:
        RoIPoolingLayer._compute_pooled_rois(feature_map = input_pair[0], rois = input_pair[1], pool_size = self.pool_size),
      elems = inputs,
      fn_output_signature = tf.float32  # this is absolutely required else the fn type inference seems to fail spectacularly
    )

  @staticmethod
  def _compute_pooled_rois(feature_map, rois, pool_size):
    num_channels = feature_map.shape[2]
    num_rois = rois.shape[0]
    if num_rois is None:
      return tf.zeros(shape=(tf.shape(rois)[0],pool_size, pool_size, num_channels))
    pools = []
    for roi_idx in range(num_rois):
      region_y = rois[roi_idx, 0]
      region_x = rois[roi_idx, 1]
      region_height = rois[roi_idx, 2]
      region_width = rois[roi_idx, 3]
      region_of_interest = tf.slice(feature_map, [region_y, region_x, 0], [region_height, region_width, num_channels])
      x_step = tf.cast(region_width, dtype = tf.float32) / tf.cast(pool_size, dtype = tf.float32)
      y_step = tf.cast(region_height, dtype = tf.float32) / tf.cast(pool_size, dtype = tf.float32)
      for y in range(pool_size):
        for x in range(pool_size):
          pool_y_start = y
          pool_x_start = x
          pool_y_start_int = tf.cast(pool_y_start, dtype = tf.int32)
          pool_x_start_int = tf.cast(pool_x_start, dtype = tf.int32)
          y_start = tf.cast(pool_y_start * y_step, dtype = tf.int32)
          x_start = tf.cast(pool_x_start * x_step, dtype = tf.int32)
          y_end = tf.cond((pool_y_start_int + 1) < pool_size,
            lambda: tf.cast((pool_y_start + 1) * y_step, dtype = tf.int32),
            lambda: region_height
          )
          x_end = tf.cond((pool_x_start_int + 1) < pool_size,
            lambda: tf.cast((pool_x_start + 1) * x_step, dtype = tf.int32),
            lambda: region_width
          )
          y_size = tf.math.maximum(y_end - y_start, 1)
          x_size = tf.math.maximum(x_end - x_start, 1)
          pool_cell = tf.slice(region_of_interest, [y_start, x_start, 0], [y_size, x_size, num_channels])
          pooled = tf.math.reduce_max(pool_cell, axis=(1,0))  # keep channels independent
          print(pooled)
          pools.append(pooled)
    return tf.reshape(tf.stack(pools, axis = 0), shape = (num_rois, pool_size, pool_size, num_channels))

Note the print statement.

During execution, the print statement sometimes shows numeric values, but most of the time it prints unevaluated (symbolic) tensors:

Tensor("model_3/roi_pool/map/while/Max:0", shape=(512,), dtype=float32)
Tensor("model_3/roi_pool/map/while/Max_1:0", shape=(512,), dtype=float32)
Tensor("model_3/roi_pool/map/while/Max_2:0", shape=(512,), dtype=float32)
Tensor("model_3/roi_pool/map/while/Max_3:0", shape=(512,), dtype=float32)
Tensor("model_3/roi_pool/map/while/Max_4:0", shape=(512,), dtype=float32)
Tensor("model_3/roi_pool/map/while/Max_5:0", shape=(512,), dtype=float32)

I have no idea what is going on. I thought TF2 would use eager execution for this sort of code, but apparently not: it seems to build a graph sometimes (though not on every step), and then the graph does not appear to execute correctly.
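For what it's worth, here is a toy sketch (my own names) of why I think the prints look this way: inside a tf.function, a Python print runs only once, at trace time, and sees a symbolic tensor, exactly like the Tensor("...Max:0", ...) lines above, whereas tf.print is a graph op that runs on every call.

```python
import tensorflow as tf

seen = []

@tf.function
def step(x):
    seen.append(x)        # trace-time side effect: x is a symbolic tensor here
    tf.print("x =", x)    # tf.print is a graph op, so it runs on every call
    return x * 2

result = step(tf.constant(3.0))
print(result.numpy())               # 6.0
print("numpy=" in repr(seen[0]))    # False -- no concrete value at trace time
```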

I’m happy to share more code. I’m at a complete loss as to how to get this to work. I would expect that if no error is thrown, this should execute correctly, but instead it is doing something difficult for me to comprehend.

Inspecting the layer’s output at run time seems to disturb the result. As far as I can tell, inspecting it forces a function evaluation, which executes the graph and gives what appears to be the correct result. That is clearly not what happens during training, however.
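One thing I have been experimenting with for debugging (toy model below, illustrative, not my actual network): compiling with Keras's run_eagerly=True flag, which skips graph compilation so that Python prints inside a layer's call() show concrete values during training.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Toy model: with run_eagerly=True, train_on_batch executes step by step
# in eager mode instead of compiling the model into a graph.
inp = layers.Input(shape=(4,))
out = layers.Dense(2)(inp)
model = Model(inp, out)
model.compile(optimizer="sgd", loss="mse", run_eagerly=True)

x = tf.zeros((8, 4))
y = tf.zeros((8, 2))
loss = model.train_on_batch(x, y)   # runs eagerly; prints inside call() work
print(float(loss))                  # 0.0 for all-zero inputs and targets
```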

Any ideas how I can make this work? All the examples of RoI pooling in Keras online are either painfully slow (multiple layers of tf.map_fn) or don’t work (there are other old for-loop-based examples similar to this one).
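For reference, here is a vectorized sketch I am considering instead (my own names; it approximates RoI pooling with bilinear sampling via tf.image.crop_and_resize rather than a hard max over bins, as many Faster R-CNN ports do, and avoids both map_fn and Python loops):

```python
import tensorflow as tf

def roi_pool(feature_maps, rois, pool_size):
    # feature_maps: (batch, H, W, C) float32; rois: (batch, num_rois, 4) int32
    # RoIs follow the (y, x, height, width) convention from the question.
    batch = tf.shape(feature_maps)[0]
    num_rois = tf.shape(rois)[1]
    # crop_and_resize wants boxes normalized so that 1.0 maps to (dim - 1)
    h_max = tf.cast(tf.shape(feature_maps)[1] - 1, tf.float32)
    w_max = tf.cast(tf.shape(feature_maps)[2] - 1, tf.float32)
    r = tf.cast(tf.reshape(rois, [-1, 4]), tf.float32)
    y1 = r[:, 0] / h_max
    x1 = r[:, 1] / w_max
    y2 = (r[:, 0] + r[:, 2] - 1.0) / h_max
    x2 = (r[:, 1] + r[:, 3] - 1.0) / w_max
    boxes = tf.stack([y1, x1, y2, x2], axis=1)
    box_indices = tf.repeat(tf.range(batch), num_rois)  # image index per box
    crops = tf.image.crop_and_resize(
        feature_maps, boxes, box_indices, crop_size=[pool_size, pool_size])
    return tf.reshape(crops, [batch, num_rois, pool_size, pool_size, -1])

pooled = roi_pool(tf.ones([2, 9, 8, 3]),
                  tf.constant([[[0, 0, 4, 4], [1, 1, 4, 4]]] * 2, tf.int32), 7)
print(pooled.shape)   # (2, 2, 7, 7, 3)
```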