Language translation

So, I have been exploring NLP recently.
There’s one project that I have been planning to do for a while, but each time I end up putting it on the sidelines.
It’s translating English to Chinese and Chinese to English.
The biggest challenge I am encountering is preprocessing Chinese characters. I have been looking around, but I have been unable to find any information.
How do I preprocess and tokenize text in Chinese and similar languages?

We had a thread at:
