Implementing Compositional Attention

Here is my TF/Keras implementation of the recent Compositional Attention paper by MILA, which disentangles the search and retrieval components of the attention mechanism. It can be used as a drop-in replacement for standard multi-head attention and outperforms it on some tasks.
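For anyone who wants a feel for what "disentangling search and retrieval" means, here is a rough NumPy sketch of the idea as I read the paper. It is not the TF/Keras code from the implementation above. Each of S searches gets its own query/key projection, each of R retrievals gets its own value projection, and a second softmax lets every search softly pick among the retrievals. All names here (`init_params`, `compositional_attention`, `Wrq`, `Wrk`, `d_sel`) are made up for illustration, and masking, biases, dropout, and batching are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, d_model, d_head, d_sel, n_search, n_retrieval):
    # Hypothetical parameter layout, chosen for this sketch only.
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)

    return {
        "Wq": w(n_search, d_model, d_head),     # search queries
        "Wk": w(n_search, d_model, d_head),     # search keys
        "Wv": w(n_retrieval, d_model, d_head),  # retrieval values
        "Wrq": w(n_search, d_model, d_sel),     # retrieval-selection queries
        "Wrk": w(d_head, d_sel),                # retrieval-selection keys (shared)
        "Wo": w(n_search * d_head, d_model),    # output projection
    }


def compositional_attention(x, p):
    """x: (T, d_model). Returns (output (T, d_model), weights (S, R, T))."""
    d_head = p["Wq"].shape[-1]
    d_sel = p["Wrk"].shape[-1]

    # Search: one attention pattern per search, shape (S, T, T).
    q = np.einsum("td,sdh->sth", x, p["Wq"])
    k = np.einsum("td,sdh->sth", x, p["Wk"])
    attn = softmax(np.einsum("sth,suh->stu", q, k) / np.sqrt(d_head), axis=-1)

    # Retrieval: one value projection per retrieval, shape (R, T, d_head).
    v = np.einsum("td,rdh->rth", x, p["Wv"])

    # Pair every search with every retrieval: (S, R, T, d_head).
    candidates = np.einsum("stu,ruh->srth", attn, v)

    # Soft selection: each search and position scores the R candidates.
    rq = np.einsum("td,sde->ste", x, p["Wrq"])   # (S, T, d_sel)
    rk = candidates @ p["Wrk"]                   # (S, R, T, d_sel)
    scores = np.einsum("ste,srte->srt", rq, rk) / np.sqrt(d_sel)
    weights = softmax(scores, axis=1)            # normalize over retrievals
    mixed = np.einsum("srt,srth->sth", weights, candidates)

    # Concatenate the searches and project back to d_model.
    concat = mixed.transpose(1, 0, 2).reshape(x.shape[0], -1)
    return concat @ p["Wo"], weights


rng = np.random.default_rng(0)
params = init_params(rng, d_model=8, d_head=4, d_sel=3, n_search=2, n_retrieval=3)
x = rng.normal(size=(5, 8))
out, weights = compositional_attention(x, params)
print(out.shape, weights.shape)  # (5, 8) (2, 3, 5)
```

In standard multi-head attention each head's attention pattern is hard-wired to its own value projection. In this sketch the pairing is learned per position through `weights`, which is where the extra flexibility comes from.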


Nice work! Congrats!