How do I train a transformer to care about word order in language translation?

I want the model to focus on the order in which the output sequence is generated. For instance, if the model's output is a b a and the target is a a b, the model has predicted only the first element, a, in the right position. Is there a way to make transformers care about order as well?

The attention mechanism itself is order-agnostic: without positional information added to the token embeddings, a transformer cannot tell a b a from a a b, so order has to be injected through positional encodings (a minimal sketch is below). For short sentences, LSTMs are very good at inferring local relationships among the two or three words that immediately precede each token, because they process the sequence strictly in order. For longer sentences, consider the FLOw-bAsed TransformER (FLOATER), which learns position representations with a continuous dynamical model rather than relying on fixed encodings. For more details, please refer here. Thank you.
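
For reference, here is a minimal PyTorch sketch of the standard sinusoidal positional encoding from "Attention Is All You Need", which is the usual baseline that methods like FLOATER improve upon. The class name and the `d_model`/`max_len` parameters are illustrative choices for this sketch, not part of any fixed API:

```python
# Minimal sketch of sinusoidal positional encoding, the standard way a
# transformer is given access to token order. Names are illustrative.
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)          # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)           # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)           # odd dimensions
        self.register_buffer("pe", pe)                         # not a learned parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add position info to token embeddings
        return x + self.pe[: x.size(1)]

# Usage: the embeddings for "a b a" and "a a b" now differ by position,
# so the model can distinguish the two orderings.
emb = nn.Embedding(10, 16)
pos = SinusoidalPositionalEncoding(d_model=16)
tokens = torch.tensor([[1, 2, 1]])   # e.g. a b a
out = pos(emb(tokens))               # shape: (1, 3, 16)
print(out.shape)
```

FLOATER replaces this fixed, hand-designed pattern with position representations learned by a continuous dynamical model, which is why it tends to help on longer sequences.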