Num_candidate_attributes

LinaTeck · November 9, 2021, 4:41am

Hi everyone!

I’m relatively new to TensorFlow. I’m trying to use tfdf.keras.RandomForestModel for regression. I just reproduced the regression example from here:

https://tensorflow.google.cn/decision_forests/tutorials/beginner_colab#training_a_regression_model

I was playing with num_candidate_attributes, and I was expecting the MSE and RMSE on the test dataset to be sensitive to that parameter, but they stay exactly the same no matter what value I give it. I saw some publications that used random forest in R where the choice of mtry (which I’m assuming is the equivalent of num_candidate_attributes?) had an impact on the error metric used to evaluate the test dataset. I’m wondering whether I need to change other parameters as well for the metrics to change, or whether I should expect this parameter to make no difference (and if so, why?).

Thanks a lot!

Mathieu · November 9, 2021, 9:33am

Long story short, you found a bug :).
Thanks for sharing the issue. The bug has been resolved, and the fix will be published in the next release of TF-DF.

In the meantime, when you specify “num_candidate_attributes”, make sure to also set “num_candidate_attributes_ratio=None”. For example:

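import tensorflow_decision_forests as tfdf

# Workaround: explicitly disable the ratio so that the count is used.
model = tfdf.keras.RandomForestModel(
    num_candidate_attributes=5,
    num_candidate_attributes_ratio=None,
)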

Yes, ‘num_candidate_attributes’ exactly corresponds to ‘mtry’ in R’s Random Forest libraries.
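
For readers unfamiliar with mtry, here is a minimal sketch of what the parameter controls: at each split, only a random subset of the attributes is considered, drawn without replacement. The helper name sample_candidate_attributes and the attribute names are hypothetical and only for illustration; this is not TF-DF's code.

```python
import random


def sample_candidate_attributes(attributes, num_candidates, rng):
    """Hypothetical helper: pick the attributes examined at one split.

    Draws `num_candidates` distinct attributes at random. If more are
    requested than exist, all attributes are returned.
    """
    k = min(num_candidates, len(attributes))
    return rng.sample(attributes, k)


# Made-up attribute names, for illustration only.
attributes = ["f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7"]
rng = random.Random(0)

# Each call mimics one split: a fresh random subset of 3 attributes.
for split in range(3):
    print(split, sample_candidate_attributes(attributes, 3, rng))
```

A smaller value makes the trees more different from one another; a value equal to the number of attributes means every split looks at every attribute.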

Details about the bug

In TF-DF, you can specify either “num_candidate_attributes” or “num_candidate_attributes_ratio”. The bug is that “num_candidate_attributes_ratio” overrides “num_candidate_attributes” unless “num_candidate_attributes_ratio=None”. The intended behavior is that the ratio is ignored when it is either None or -1. Unfortunately, the default is “num_candidate_attributes_ratio=-1”, so unless you set the ratio to None yourself, the value you pass for “num_candidate_attributes” has no effect.
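
To make the intended rule concrete, here is a minimal sketch of how the two parameters are meant to interact according to the description above. It is not TF-DF's actual implementation: the helper name effective_num_candidates, the rounding with math.ceil, and the clamping to the number of features are assumptions for illustration, and the sketch only handles a positive num_candidate_attributes.

```python
import math


def effective_num_candidates(num_features,
                             num_candidate_attributes,
                             num_candidate_attributes_ratio=-1.0):
    """Hypothetical helper, not part of TF-DF.

    Returns how many attributes would be tested per split under the
    intended rule: the ratio is ignored when it is None or -1, and
    otherwise takes precedence over the count.
    """
    ratio = num_candidate_attributes_ratio
    if ratio is None or ratio == -1:
        # Ratio is "unset": fall back to the explicit count.
        return min(num_candidate_attributes, num_features)
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1], -1, or None")
    # Assumption: round up so that at least one attribute is tested.
    return max(1, math.ceil(ratio * num_features))


# With 10 features, the default ratio (-1) and None both defer to the count.
print(effective_num_candidates(10, 2))
print(effective_num_candidates(10, 2, None))
# An explicit ratio takes precedence over the count.
print(effective_num_candidates(10, 2, 0.5))
```

In the buggy release, only the None case deferred to the count, which is why passing “num_candidate_attributes_ratio=None” works around the problem.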

LinaTeck · November 9, 2021, 9:21pm

Great, thanks a lot!