Text-to-speech dataset help

I am working on my FYP (final year project), building a text-to-speech system for the Urdu language. I am currently building my dataset, and I want to know which audio features are best for machine learning. I tried MFCCs, but when I reconstruct audio from them the result is very poor.
I know how to reconstruct audio from one feature, but I want to use more features in my dataset, and I do not know how to take the output of two or more features and convert it back into audio.

In other words, how are the output audio features reconstructed into an audio file? Is there a Python library for this?

