Feature selection and tree size in TF-DF

Question on feature selection:

I was looking at figure 6 of

It discusses selecting a feature subset that captures, say, 95% of the cumulative loss reduction.

This was also a common question from leads in launch reviews of Ranklab models, along the lines of: “Can a much smaller number of features get almost as much accuracy as the full model?”

Curious to hear your thoughts on this, and what support TF-DF might have for it.

If you feel there is an established industry approach for this that we could build on top of TF-DF, that would also be useful to know. A sketch of what I have in mind follows.
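
To make the question concrete, here is a minimal sketch of the kind of workflow I am imagining with TF-DF's inspector API. It is only an illustration under some assumptions: `train_df` is a hypothetical pandas DataFrame with a `label` column, and the trained model exposes the "SUM_SCORE" variable importance (the total split score, i.e. loss reduction, attributed to each feature), which depends on the learner:

```python
import tensorflow_decision_forests as tfdf
import pandas as pd

# Train a GBT model. `train_df` is a hypothetical DataFrame with a "label" column.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label")
model = tfdf.keras.GradientBoostedTreesModel()
model.fit(train_ds)

# "SUM_SCORE" aggregates the loss reduction contributed by each feature's
# splits; which importances are available depends on the learner.
importances = model.make_inspector().variable_importances()["SUM_SCORE"]

# Entries are (feature, score) tuples, sorted by decreasing importance.
total = sum(score for _, score in importances)
selected, cumulative = [], 0.0
for feature, score in importances:
    selected.append(feature.name)
    cumulative += score
    if cumulative >= 0.95 * total:
        break
print(f"{len(selected)} features reach 95% of cumulative score: {selected}")

# Retrain using only the selected subset.
small_model = tfdf.keras.GradientBoostedTreesModel(
    features=[tfdf.keras.FeatureUsage(name=n) for n in selected],
    exclude_non_specified_features=True,
)
small_model.fit(train_ds)
```

Evaluating `small_model` against the full model on held-out data would answer the leads' question directly; the open question is whether TF-DF has (or should have) first-class support for this kind of selection loop.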

There is another dimension to this question: the choice of the number of trees in the model. For instance, section 7 of the aforementioned paper claims:

“We have presented a tradeoff between the number of boosted decision trees and accuracy. It is advantageous to keep the number of trees small to keep computation and memory contained.”

Curious to hear your thoughts about this as well.
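
For example, something like the following sweep would show where test accuracy saturates as trees are added. Again a hedged sketch: `train_ds` and `test_ds` are the hypothetical datasets from above, and note that GBT's default early stopping may train fewer trees than `num_trees` requests:

```python
import tensorflow_decision_forests as tfdf

# Sweep the number of trees and watch where test accuracy saturates.
# `train_ds` and `test_ds` are hypothetical datasets built as in the sketch above.
for num_trees in [10, 50, 100, 300]:
    model = tfdf.keras.GradientBoostedTreesModel(num_trees=num_trees)
    model.fit(train_ds, verbose=0)
    model.compile(metrics=["accuracy"])  # attach the metric for evaluate()
    evaluation = model.evaluate(test_ds, return_dict=True, verbose=0)
    print(f"num_trees={num_trees}: accuracy={evaluation['accuracy']:.4f}")
```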