So, I have been exploring NLP lately.
There’s one project I have been planning to do for a while, but I keep putting it on the sidelines.
It’s translating English to Chinese and Chinese to English.
The biggest challenge I am encountering is preprocessing Chinese characters. I have been looking around but haven’t been able to find much information.
How do I preprocess and tokenize text in Chinese and languages with similar writing systems?
We had a thread at: