Making a spell-check model with an RNN

Hello, I have recently been working on an NLP seq2seq model built with an RNN to predict the correctly spelled sequence of letters from a misspelled sequence.
Basically, I'm tokenizing the words at the character level, and the input is a single word.
The model architecture is from the TensorFlow seq2seq tutorial.
The longest sequence length is around 17.
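To make the setup concrete, here is a minimal sketch of the character-level tokenization I mean, assuming reserved ids for padding and start/end tokens (the names `build_vocab` and `encode_word` are just for illustration, not from the tutorial):

```python
MAX_LEN = 17  # assumption: roughly the longest word in the dataset

def build_vocab(words):
    # Reserve 0 for padding, 1/2 for start/end markers.
    chars = sorted({c for w in words for c in w})
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
    vocab.update({c: i + 3 for i, c in enumerate(chars)})
    return vocab

def encode_word(word, vocab, max_len=MAX_LEN):
    # Wrap the word in start/end tokens, then pad to a fixed length.
    ids = [vocab["<start>"]] + [vocab[c] for c in word] + [vocab["<end>"]]
    return ids + [vocab["<pad>"]] * (max_len + 2 - len(ids))

vocab = build_vocab(["spell", "check"])
print(encode_word("spell", vocab))
```

During training, a misspelled word is encoded this way as the input and the correct word as the target.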

Here is the problem: when the vocabulary is around 2,000 words and I show the model each word more than 50 times during training, it works fine, as I expect. But when I train on more than 2,000 words, it fails badly and can't generate a single correct word, no matter how many times I show it each word. It just generates random tokens until the maximum length is reached.

I'm embedding with 256 dimensions, and both the encoder and decoder use a GRU with 1024 units.
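In case it helps, this is roughly what my encoder looks like, following the tutorial's pattern (a sketch with assumed values, e.g. `VOCAB_SIZE = 60` for letters plus special tokens; only the dimensions above are taken from my actual setup):

```python
import tensorflow as tf

VOCAB_SIZE = 60   # assumption: ~26 letters plus special tokens
EMB_DIM = 256     # embedding dimension I use
UNITS = 1024      # GRU units in both encoder and decoder

class Encoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM)
        self.gru = tf.keras.layers.GRU(
            UNITS, return_sequences=True, return_state=True)

    def call(self, x):
        # x: (batch, seq_len) int token ids
        x = self.embedding(x)
        output, state = self.gru(x)
        # output: (batch, seq_len, UNITS); state: (batch, UNITS)
        return output, state

enc = Encoder()
out, state = enc(tf.zeros((4, 17), dtype=tf.int32))
```

The decoder mirrors this, consuming the encoder state and predicting one character per step.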

So, could you please suggest a better architecture that can memorize more and more types of sequences while keeping the total model size significantly smaller and more efficient?