Balancing Dominant Feature Importances

Hi there,

Is there an equivalent to xgboost's colsample_by* parameters? The idea behind xgboost's colsample_bytree, colsample_bylevel, and colsample_bynode parameters is to specify the fraction of feature columns to be subsampled at the tree, level, and node, respectively.
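To make the idea concrete, here is a minimal sketch of what column subsampling does, independent of any library: for each tree (or level, or node), only a random fraction of the feature columns is made available as split candidates. The function name and feature names below are hypothetical, purely for illustration.

```python
import random

def subsample_columns(features, colsample):
    """Return a random fraction of the feature columns,
    as colsample_by* would do per tree / level / node."""
    k = max(1, int(len(features) * colsample))
    return random.sample(features, k)

features = [f"f{i}" for i in range(10)]
# With colsample = 0.5, each tree only sees 5 of the 10 columns,
# so no single feature can be chosen for every split.
per_tree_columns = subsample_columns(features, 0.5)
```

Because a dominant feature is absent from some of the trees, the ensemble is forced to spread its splits across other features, which is exactly the balancing effect being asked about.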

I find my TF-DF gradient boosted tree models become overly reliant on certain features, and was wondering if there is a way to balance out the importances. Although performance is good on test data, I want to reduce the risk that one of those features goes wrong in production and severely impacts my predictions.

Below is the way I am currently calculating importances. Perhaps I am doing something wrong here:

feature_importances = {}
for feature, imp_score in model.make_inspector().variable_importances()["SUM_SCORE"]:
    # feature is a column spec; feature[0] is the feature name.
    feature_importances[feature[0]] = imp_score
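One quick way to quantify how dominant a feature is with a dict like the one built above is to normalize the scores into shares of the total. The feature names and scores below are made up for illustration.

```python
# Hypothetical importance scores, shaped like the dict built above.
feature_importances = {"age": 120.0, "income": 60.0, "zip": 20.0}

# Normalize to fractions of total importance.
total = sum(feature_importances.values())
shares = {f: s / total for f, s in feature_importances.items()}

# The most dominant feature and its share of total importance.
dominant = max(shares, key=shares.get)
print(dominant, shares[dominant])  # → age 0.6
```

If one feature's share is very large, that is a sign the model would degrade badly if that feature broke in production.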

Thank you!
