MLP-Mixer on CIFAR-10

Here’s my implementation of MLP-Mixer, the all-MLP architecture for computer vision that uses neither convolutions nor self-attention.
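To make the idea concrete, here is a minimal NumPy sketch of a single Mixer layer (this is an illustration of the architecture, not the code from this repo; the dimensions `S`, `C`, `Ds`, `Dc` are example values, and patch embedding is assumed to have already happened):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last axis (channels).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with a tanh-approximated GELU, applied along the last axis.
    h = x @ w1 + b1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2 + b2

def mixer_block(x, tok_params, ch_params):
    # x: (num_patches, channels)
    # Token mixing: transpose so the MLP acts across patch positions.
    y = x + mlp(layer_norm(x).T, *tok_params).T
    # Channel mixing: the MLP acts across channels at each patch.
    return y + mlp(layer_norm(y), *ch_params)

rng = np.random.default_rng(0)
S, C, Ds, Dc = 64, 128, 256, 512  # patches, channels, token-MLP dim, channel-MLP dim
x = rng.standard_normal((S, C))
tok_params = (rng.standard_normal((S, Ds)) * 0.02, np.zeros(Ds),
              rng.standard_normal((Ds, S)) * 0.02, np.zeros(S))
ch_params = (rng.standard_normal((C, Dc)) * 0.02, np.zeros(Dc),
             rng.standard_normal((Dc, C)) * 0.02, np.zeros(C))
out = mixer_block(x, tok_params, ch_params)
print(out.shape)  # (64, 128)
```

The two MLPs plus residual connections are the whole trick: one mixes information across spatial locations, the other across channels.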

What’s included:

  • Distributed training with mixed precision.
  • Visualization of the token-mixing MLP weights.
  • A TensorBoard callback to keep track of the learned linear projections of the image patches.
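For the token-mixing visualization, the useful observation is that each hidden unit of the first token-mixing dense layer has one incoming weight per patch position, so its weight vector can be reshaped to the patch grid and shown as an image. A small sketch with a random stand-in kernel (the real one would come from a trained model; the shapes assume 4×4 patches on 32×32 CIFAR-10 images, i.e. an 8×8 grid):

```python
import numpy as np

grid = 8                    # (32 // 4) ** 2 = 64 patches -> 8x8 grid
hidden_units = 16           # example width of the token-mixing MLP
rng = np.random.default_rng(1)
# Stand-in for the first token-mixing kernel: (num_patches, hidden_units).
w = rng.standard_normal((grid * grid, hidden_units))

# One spatial "filter" per hidden unit: reshape each column to the patch grid.
maps = [w[:, i].reshape(grid, grid) for i in range(hidden_units)]
print(len(maps), maps[0].shape)  # 16 (8, 8)
```

Each of these maps can then be passed to e.g. `matplotlib.pyplot.imshow` to produce the kind of weight plots the paper shows.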

Results are competitive, with room for improvement on the interpretability side.