For tf.keras.applications.MobileNetV3 (Large or Small), the architecture changed slightly between TF <=2.5 and TF 2.6. Specifically, the GlobalAveragePooling2D layer is applied before “Conv_2” in TF 2.6, but after “Conv_2” (and its non-linear activation) in TF 2.5.

Global average pooling doesn’t commute with a non-linear activation, so the two architectures are not equivalent. Both versions load the same pre-trained weights, so their architectures ought to be identical.
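To illustrate why the order matters, here is a minimal pure-Python sketch on a toy single-channel "feature map" with two spatial positions. It is not the Keras implementation: the scalar `conv_1x1` (with made-up weight and bias) stands in for “Conv_2”, and hard-swish is assumed as the non-linear activation.

```python
def hard_swish(x):
    # Hard-swish: x * relu6(x + 3) / 6 (assumed here as the activation).
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def mean(values):
    # Stand-in for global average pooling over spatial positions.
    return sum(values) / len(values)


def conv_1x1(x, weight=0.5, bias=1.0):
    # Toy 1x1 convolution on one channel: an affine map with hypothetical parameters.
    return weight * x + bias


feature_map = [-2.0, 4.0]  # two spatial positions, one channel

# Without the activation, pooling and the affine 1x1 conv commute.
linear_pool_first = conv_1x1(mean(feature_map))
linear_conv_first = mean([conv_1x1(v) for v in feature_map])

# With the activation, the order changes the result.
pool_then_act = hard_swish(conv_1x1(mean(feature_map)))              # pooling first
act_then_pool = mean([hard_swish(conv_1x1(v)) for v in feature_map])  # pooling last

print(linear_pool_first, linear_conv_first)
print(pool_then_act, act_then_pool)
```

The first pair of values agrees, while the second pair does not, so moving the pooling layer across an activated convolution changes what the network computes even when the weights are unchanged.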

I haven’t checked whether this degrades the performance of the pretrained models.

My main concern is that this is a breaking change to the API: MobileNetV3Large(include_top=False) outputs a tensor of shape [?, 1, 1, 1280] starting with TF 2.6, compared to a tensor of shape [?, 7, 7, 1280] with TF <=2.5 (assuming an input of shape [?, 224, 224, 3]).
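For anyone who wants to check which behaviour their installation has, a quick sketch along these lines should print the backbone's output shape. It passes weights=None only to avoid the weight download, since the shape does not depend on the weights. The extra GlobalAveragePooling2D at the end is one possible way to get a flat feature vector regardless of version; it only aligns the shapes and does not make the computed values identical across versions.

```python
import tensorflow as tf

# Build the backbone without the classification head.
backbone = tf.keras.applications.MobileNetV3Large(
    input_shape=(224, 224, 3),
    include_top=False,
    weights=None,  # skip the download; only the shapes matter here
)
print(tf.__version__, backbone.output_shape)

# Pooling is a no-op on an already-pooled 1x1 map and reduces a 7x7 map,
# so the result is a rank-2 tensor either way.
pooled = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
features = tf.keras.Model(backbone.input, pooled)
print(features.output_shape)
```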