Implementing Fastformer: Additive Attention Can Be All You Need

I am glad to present my implementation of the “Fastformer: Additive Attention Can Be All You Need” paper.

This is a Transformer variant based on additive attention that can handle long sequences efficiently with linear complexity. Fastformer is much more efficient than many existing Transformer models while achieving comparable or even better long-text modeling performance.
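To give a sense of how the additive attention works, here is a minimal single-head sketch in TensorFlow/Keras: each position's query is summarized into a single global query vector via learned attention scores, mixed into the keys by element-wise product, summarized again into a global key, and mixed into the values, so the cost stays linear in sequence length. The class and variable names below are illustrative, not taken from my repository, and the real model uses multiple heads.

```python
import tensorflow as tf

class AdditiveAttention(tf.keras.layers.Layer):
    """Simplified single-head Fastformer-style additive attention (linear in sequence length)."""

    def __init__(self, dim, **kwargs):
        super().__init__(**kwargs)
        self.dim = dim
        self.to_query = tf.keras.layers.Dense(dim)
        self.to_key = tf.keras.layers.Dense(dim)
        self.to_value = tf.keras.layers.Dense(dim)
        # Learned scoring vectors used to pool positions into global summaries.
        self.query_score = tf.keras.layers.Dense(1, use_bias=False)
        self.key_score = tf.keras.layers.Dense(1, use_bias=False)
        self.out_proj = tf.keras.layers.Dense(dim)

    def call(self, x):
        # x: (batch, seq_len, dim)
        q = self.to_query(x)
        k = self.to_key(x)
        v = self.to_value(x)
        scale = tf.math.sqrt(tf.cast(self.dim, x.dtype))

        # Global query: attention-weighted sum over all query vectors.
        alpha = tf.nn.softmax(self.query_score(q) / scale, axis=1)   # (batch, seq_len, 1)
        global_q = tf.reduce_sum(alpha * q, axis=1, keepdims=True)   # (batch, 1, dim)

        # Mix the global query into each key element-wise, then pool into a global key.
        p = global_q * k                                             # (batch, seq_len, dim)
        beta = tf.nn.softmax(self.key_score(p) / scale, axis=1)
        global_k = tf.reduce_sum(beta * p, axis=1, keepdims=True)    # (batch, 1, dim)

        # Mix the global key into each value, project, and add the query residual.
        u = global_k * v
        return self.out_proj(u) + q


# Quick shape check
layer = AdditiveAttention(dim=64)
out = layer(tf.random.normal([2, 128, 64]))
print(out.shape)  # (2, 128, 64)
```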


Nice work!

If you have the trained model, maybe you could also publish it on TensorFlow Hub.


Thanks, @lgusm!
I haven't worked on training it yet, but this is a really good idea. Let me get started on it as soon as possible.
