MobileNetV3: small change in architecture between TF2.5 and 2.6

For tf.keras.applications.MobileNetV3 (large or small), there’s been a slight change to the architecture from TF <=2.5 to TF 2.6. Specifically, the GlobalAveragePooling2D layer happens before “Conv_2” in TF2.6, but after “Conv_2” (and its non-linear activation) in TF2.5.

These operations don’t commute, so the two architectures are slightly different. Yet both versions load the same pre-trained weights, which suggests the architectures ought to be identical.
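As a quick sanity check of the non-commutation claim, here is a minimal NumPy sketch (not the actual Keras layers; a convolution is linear and would commute with average pooling, so it’s the non-linearity that breaks the equivalence):

```python
import numpy as np

def hard_swish(x):
    # Hard-swish non-linearity used by MobileNetV3: x * relu6(x + 3) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7, 8))  # a toy [H, W, C] feature map

# TF <= 2.5 ordering: non-linear activation first, then global average pooling
act_then_pool = hard_swish(x).mean(axis=(0, 1))
# TF 2.6 ordering: global average pooling first, then non-linear activation
pool_then_act = hard_swish(x.mean(axis=(0, 1)))

# The two orderings disagree, so the architectures really are different.
print(np.allclose(act_then_pool, pool_then_act))  # False
```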

I haven’t checked if this degrades the performance of the pretrained models.

My interest in this is mostly that it’s a breaking change to the API: MobileNetV3Large(include_top=False) will output a tensor of shape [?, 1, 1, 1280] starting with TF2.6 compared to a tensor of shape [?, 7, 7, 1280] with TF <=2.5 (assuming an input of shape [?, 224, 224, 3]).


This is a bug fix.
The two ops don’t commute exactly, but they commute well enough that both versions of the model perform well with the same weights.
You are correct about the change in the feature vector shape. The new version is the one that is “correct”.


For context, here’s the original change from GitHub: Fixed MobilenetV3 from keras/application by l-bat · Pull Request #48542 · tensorflow/tensorflow · GitHub

MobileNetV3Large(include_top=False) will output a tensor of shape [?, 1, 1, 1280] starting with TF2.6 compared to a tensor of shape [?, 7, 7, 1280] with TF <=2.5 (assuming an input of shape [?, 224, 224, 3]).

This is indeed a problem. I think we should move the if include_top: branch so that include_top=False returns the feature map before pooling.


For my use-case, I would like to preserve spatial information when setting include_top=False, but it’s also really not that big a deal: I can grab the layer immediately before pooling.
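The workaround mentioned above can be sketched as follows (assuming the TF 2.6 model still contains a single GlobalAveragePooling2D layer; layer names are not hard-coded, and weights=None is used only to skip the weight download for this shape check):

```python
import tensorflow as tf

# Build the TF 2.6 model, whose include_top=False output is already pooled.
base = tf.keras.applications.MobileNetV3Large(
    include_top=False, input_shape=(224, 224, 3), weights=None)

# Locate the global-average-pooling layer and take its *input* tensor,
# i.e. the last feature map that still has spatial dimensions.
pool_layer = next(
    l for l in base.layers
    if isinstance(l, tf.keras.layers.GlobalAveragePooling2D))
spatial = tf.keras.Model(base.input, pool_layer.input)

print(spatial.output_shape)  # spatial dims preserved, e.g. (None, 7, 7, C)
```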

There is also something a bit off with the TF2.6 version: it still accepts the pooling argument ('avg' or 'max'), but that argument no longer does anything, because average pooling has already happened inside the model.


You have a point there. That looks like an unintended change of behavior. Could you please file a bug with a quick repro in Colab, plus a reference to the PR that caused this? Or, even better, suggest a fix as a PR?

https://github.com/keras-team/keras
