I asked this question previously in the Keras Google group and they sent me here (the original post is at: https://groups.google.com/g/keras-users/c/RW22q3ywxhE )
I wanted to use the implementation of the Compact Convolutional Transformer, as mentioned in the Keras documentation (Compact Convolutional Transformers).
To test the implementation, I set up a small notebook in Google Colab and started playing with it. The problem is that if I create a completely new model and use the
load_weights() function to load the weights from the last checkpoint,
the accuracy on the test set is different each time I load a new model.
You can find the colab-notebook here:
In the cell under “Test a completely new created CCT with the pre-trained weights”
you can find a small for-loop which, 3 times,
creates a new CCT network and loads the pre-trained weights.
As you can see, the accuracy is slightly different each time.
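One check that might help narrow this down is to compare the weights of two freshly created models after both have loaded the same checkpoint. The following is only a minimal sketch and not part of the notebook: differing_weights is a hypothetical helper, and create_cct_model and checkpoint_path in the usage comment are placeholders for whatever the notebook actually uses. The helper itself only needs the lists of NumPy arrays returned by model.get_weights().

```python
import numpy as np


def differing_weights(weights_a, weights_b, names=None):
    """Return the labels of weight arrays that are not exactly equal.

    weights_a, weights_b: lists of NumPy arrays, e.g. from model.get_weights().
    names: optional labels, e.g. [w.name for w in model.weights];
           list indices are used when no names are given.
    """
    if len(weights_a) != len(weights_b):
        raise ValueError("the two lists hold a different number of weight arrays")
    labels = names if names is not None else list(range(len(weights_a)))
    # np.array_equal is False for mismatched shapes as well as mismatched values.
    return [
        label
        for label, a, b in zip(labels, weights_a, weights_b)
        if not np.array_equal(a, b)
    ]


# Hypothetical usage (placeholder names, adapt to the notebook):
#   model_a = create_cct_model(); model_a.load_weights(checkpoint_path)
#   model_b = create_cct_model(); model_b.load_weights(checkpoint_path)
#   print(differing_weights(model_a.get_weights(), model_b.get_weights()))
```

If the returned list is not empty, the listed weights were presumably left at their random initialization instead of being restored. An empty list does not rule everything out, though: weights of a layer that the model does not track do not appear in get_weights() at all.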
In the cell above it you can see how the original model (the one that was trained) performs exactly the same all 3 times.
Through the discussion in the Google group, the data_augmentation layer was already identified as one source of some randomness (but it is not clear to me why this only happens when a completely new CCT network is created).
Further, as you can see under “Get the activity of the 2D-Conv. subnetworks”, I created a subnetwork consisting only of the input layer and the 2D-Conv. subnetworks, built from a complete CCT network with the pre-trained weights. This subnetwork always produces the same output for the same input.
If the subnetwork is extended by the next layer of the CCT, the outputs differ for the same input.
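The same layer-by-layer comparison could be automated instead of extending the subnetwork by hand. This is again only a sketch under assumptions: first_divergence is a hypothetical helper that takes two lists of per-layer outputs (as NumPy arrays) for the same input and reports the first position where they differ. The usage comment assumes a functional Keras model in which every layer is called exactly once, so that layer.output is defined for each layer.

```python
import numpy as np


def first_divergence(outputs_a, outputs_b, names=None, atol=0.0):
    """Return the label of the first pair of outputs that differ, or None.

    outputs_a, outputs_b: per-layer outputs of two models for the same input,
                          in the same layer order.
    names: optional labels (e.g. layer names); indices are used otherwise.
    atol: absolute tolerance; 0.0 means the arrays must match exactly.
    """
    labels = names if names is not None else list(range(len(outputs_a)))
    for label, a, b in zip(labels, outputs_a, outputs_b):
        a, b = np.asarray(a), np.asarray(b)
        if a.shape != b.shape or not np.allclose(a, b, rtol=0.0, atol=atol):
            return label
    return None


# Hypothetical usage (model_a and model_b are two freshly created CCT models
# with the same checkpoint loaded, x is one fixed batch of test images):
#   probe_a = keras.Model(model_a.inputs, [l.output for l in model_a.layers])
#   probe_b = keras.Model(model_b.inputs, [l.output for l in model_b.layers])
#   layer_names = [l.name for l in model_a.layers]
#   print(first_divergence(probe_a.predict(x), probe_b.predict(x), layer_names))
```

The first layer reported this way would be the place to look for state that is randomly initialized and not restored by load_weights().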
As you can see, in the end the results are not that different, but I want to understand what can cause such behavior.
Thank you for your time and your help!