I am comparing the performance of a TF-DF `RandomForestModel` on a regression task with that of an sklearn `RandomForestRegressor`. The published hyperparameters for the sklearn model are:

- max_features=6
- n_estimators=50
- max_depth=None
- min_samples_split=2

I am not getting similar performance for the two models.

My constructors and fit calls are the following:

`sk_rf_model = RandomForestRegressor(max_features=6, n_estimators=50, max_depth=None, min_samples_split=2)`

`sk_rf_model.fit(X_npy, y_npy, sample_weight=train_data['sample_weight'])`

RMSE: 0.01954

MAE: 0.0059

`tfdf_rf_model = tfdf.keras.RandomForestModel(num_trees=50, verbose=2, num_candidate_attributes=6, min_examples=2, max_depth=None, task=tfdf.keras.Task.REGRESSION, num_threads=1)`

`tfdf_rf_model.model_1.fit(x=X_time_space_npy, y=y_npy, sample_weight=train_data['sample_weight'].to_numpy())`

RMSE: 0.02304

MAE: 0.0088
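The post does not say whether the RMSE and MAE above are weighted by `sample_weight`, or whether both models were scored on the same held-out rows. A minimal sketch of scoring both models identically, assuming a shared test set; `regression_metrics` is a hypothetical helper, not part of either library:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def regression_metrics(y_true, y_pred, sample_weight=None):
    """Hypothetical helper: return (RMSE, MAE), optionally weighted."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    # Flatten in case predictions come back as a column of shape (n, 1),
    # so sklearn and Keras-style outputs are scored the same way.
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    mse = mean_squared_error(y_true, y_pred, sample_weight=sample_weight)
    mae = mean_absolute_error(y_true, y_pred, sample_weight=sample_weight)
    return float(np.sqrt(mse)), float(mae)


rmse, mae = regression_metrics([1.0, 2.0, 3.0], [[1.0], [2.0], [5.0]])
print(rmse, mae)
```

Calling it once with the sklearn predictions and once with the TF-DF predictions on the same held-out split (with the same weights, or with none for both) keeps the two rows of numbers directly comparable.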

I set `num_threads` to 1 so that both models train single-threaded, but that does not remove the difference.
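Both forests are randomized (bootstrap samples and per-node feature sampling), so a single run of each model is one draw from a distribution. A minimal sketch for checking how much the sklearn RMSE moves with the seed alone; it uses synthetic `make_regression` data as a stand-in (an assumption: swap in the real arrays). Run on the real data, a spread that is small relative to the 0.01954 vs 0.02304 gap would suggest seed noise does not explain it. On the TF-DF side, the analogous knob is the `random_seed` constructor argument.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; replace with the real features and targets.
X, y = make_regression(n_samples=600, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)


def rmse_over_seeds(seeds):
    """Fit the published configuration once per seed and return test RMSEs."""
    scores = []
    for seed in seeds:
        model = RandomForestRegressor(
            n_estimators=50,
            max_features=6,
            max_depth=None,
            min_samples_split=2,
            n_jobs=1,  # single-threaded, matching num_threads=1 on the TF-DF side
            random_state=seed,  # only the seed changes between runs
        )
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        scores.append(float(np.sqrt(mean_squared_error(y_te, pred))))
    return scores


scores = rmse_over_seeds(range(5))
print(f"RMSE mean={np.mean(scores):.3f}  std={np.std(scores):.3f}")
```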

The previously published model I am comparing against uses the RF hyperparameters listed above.

Sklearn `RandomForestRegressor` documentation (hyperparameters):

**max_features** {“sqrt”, “log2”, None}, int or float, default=1.0

The number of features to consider when looking for the best split:

- If int, then consider `max_features` features at each split.
- If float, then `max_features` is a fraction and `max(1, int(max_features * n_features_in_))` features are considered at each split.
- If “sqrt”, then `max_features=sqrt(n_features)`.
- If “log2”, then `max_features=log2(n_features)`.
- If None or 1.0, then `max_features=n_features`.
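The rules above can be written out as a small function that turns a `max_features` setting into a per-split feature count. This is a sketch: the int truncation and the floor of 1 for `"sqrt"` and `"log2"` are my reading of sklearn's behavior, since the docs only say `sqrt(n_features)`, and `resolve_max_features` is a hypothetical name.

```python
import math


def resolve_max_features(max_features, n_features):
    """Hypothetical helper: features sklearn samples per split, per the docs."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        # Truncation to int and the floor of 1 are assumptions (see lead-in).
        if max_features == "sqrt":
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"unknown max_features: {max_features!r}")
    if isinstance(max_features, float):
        # Float: a fraction of the input features; 1.0 means all of them.
        return max(1, int(max_features * n_features))
    # Int: used as-is.
    return int(max_features)


print(resolve_max_features(6, 20), resolve_max_features("sqrt", 20))
```

With an int such as the published `max_features=6`, the count is 6 regardless of how many columns the input has, so it corresponds directly to a fixed number of candidate attributes per node.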

**min_samples_split** int or float, default=2

The minimum number of samples required to split an internal node:

- If int, then consider `min_samples_split` as the minimum number.
- If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split.
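The same documented int/float rule for `min_samples_split`, sketched as a function (`resolve_min_samples_split` is a hypothetical helper). This value is a threshold for *attempting* a split; the size of the resulting leaves is governed separately by sklearn's `min_samples_leaf`, which defaults to 1.

```python
import math


def resolve_min_samples_split(min_samples_split, n_samples):
    """Hypothetical helper: samples a node needs before sklearn tries to split it."""
    if isinstance(min_samples_split, float):
        # Float: a fraction of the training set, rounded up.
        return math.ceil(min_samples_split * n_samples)
    # Int: used as the absolute minimum.
    return int(min_samples_split)


print(resolve_min_samples_split(2, 10_000))  # published setting -> 2
print(resolve_min_samples_split(0.5, 7))     # ceil(3.5) -> 4
```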

TensorFlow Decision Forests documentation (hyperparameters):

**num_candidate_attributes** Number of unique valid attributes tested for each node. An attribute is valid if it has at least a valid split. If `num_candidate_attributes=0`, the value is set to the classical default value for Random Forest: `sqrt(number of input attributes)` in case of classification and `number_of_input_attributes / 3` in case of regression. If `num_candidate_attributes=-1`, all the attributes are tested. Default: 0.

**min_examples** Minimum number of examples in a node. Default: 5.
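Reading the two documentation excerpts side by side, one possible translation of the published sklearn settings into TF-DF keyword arguments is the following. It is my interpretation of the docs, not an official equivalence, and `tfdf_kwargs_from_sklearn` is a hypothetical helper. Two points stand out. First, sklearn's `max_depth=None` means unlimited depth, whereas TF-DF documents `max_depth=-1` for that (and `max_depth=1` as "all trees will be roots"); as far as I know, passing `None` leaves TF-DF at its own default depth (16 for Random Forest). Second, TF-DF's `min_examples` bounds the size of every node, including leaves, so it seems closer to sklearn's `min_samples_leaf` (default 1) than to `min_samples_split`.

```python
def tfdf_kwargs_from_sklearn(n_estimators, max_features, max_depth, min_samples_leaf=1):
    """Hypothetical helper: map a few sklearn RandomForestRegressor settings to
    TF-DF RandomForestModel keyword arguments (an interpretation, not official)."""
    if isinstance(max_features, bool) or not isinstance(max_features, int):
        raise ValueError("this sketch only handles an integer max_features")
    if max_depth is None:
        tfdf_depth = -1  # TF-DF's documented "depth not restricted" value
    else:
        # TF-DF counts a lone root as depth 1, while sklearn's max_depth=1 already
        # allows one split, so add one level (assumption based on the two docs).
        tfdf_depth = max_depth + 1
    return {
        "num_trees": n_estimators,
        "num_candidate_attributes": max_features,  # attributes tested per node
        "max_depth": tfdf_depth,
        "min_examples": min_samples_leaf,  # bounds every node, like min_samples_leaf
    }


kwargs = tfdf_kwargs_from_sklearn(n_estimators=50, max_features=6, max_depth=None)
print(kwargs)
# -> {'num_trees': 50, 'num_candidate_attributes': 6, 'max_depth': -1, 'min_examples': 1}
# With TF-DF installed, these could be passed as (not executed here):
# tfdf.keras.RandomForestModel(task=tfdf.keras.Task.REGRESSION, num_threads=1, **kwargs)
```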