What is the best approach to searching for optimal hyperparameters when training a large model on a big dataset?
How can those hyperparameters be found in a reasonable amount of time?
Should we tune on a small chunk of the data? If so, is there an "ideal" percentage, and how should that subset be selected? And how confident can we be that the hyperparameters that performed best on the chunk will also perform well when the model is trained on the whole dataset?
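To make the "small chunk" idea concrete, here is a minimal sketch of what I have in mind, using scikit-learn (the dataset, model, and parameter ranges are placeholders, not my actual setup): draw a stratified subsample and run the hyperparameter search only on that subsample.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the full (large) dataset.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Stratified 10% subsample -- the "chunk" the question is about.
# stratify=y keeps the class proportions of the full dataset.
X_sub, _, y_sub, _ = train_test_split(
    X, y, train_size=0.1, stratify=y, random_state=0
)

# Randomized search over a small illustrative grid, on the subsample only.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 5, 10],
    },
    n_iter=5,
    cv=3,
    random_state=0,
    n_jobs=-1,
)
search.fit(X_sub, y_sub)
print(search.best_params_)
```

The open question is whether `search.best_params_` found this way transfers to training on all of `X`, and how the choice of `train_size` affects that.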