Is there an equivalent in TF-DF to xgboost's colsample_by* parameters? The idea behind xgboost's colsample_bytree, colsample_bylevel, and colsample_bynode parameters is to specify the fraction of feature columns to be subsampled at the tree, level, and node granularity, respectively.
I find that my TF-DF gradient boosted tree models rely heavily on a few features, and I was wondering if there is a way to balance out the feature importances. Although performance is good on test data, I want to reduce the risk of a single feature going wrong in production and severely impacting my predictions.
Below is how I am currently calculating importances. Perhaps I am doing something wrong here:

```python
feature_importances = {}
# Each entry is a (feature, score) tuple; feature.name holds the column name.
for feature, imp_score in model.make_inspector().variable_importances()["SUM_SCORE"]:
    feature_importances[feature.name] = imp_score
```