While both sklearn and TF-DF implement the classical Random Forest algorithm, there are a few differences between the implementations. For this reason, the results (both the model structure and the model quality) are not expected to be exactly the same, but they should still be very close.
Following are some parameter values that should make sklearn’s RandomForestClassifier as close as possible to TF-DF’s Random Forest.
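As a sketch, here is how those parameters might look on the sklearn side. The values mirror TF-DF's documented Random Forest defaults (num_trees=300, max_depth=16, min_examples=5, and sqrt attribute sampling for classification); note the two libraries may still count tree depth slightly differently, so treat this as an approximation rather than an exact equivalence.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Approximate TF-DF RandomForestModel defaults in sklearn terms:
clf = RandomForestClassifier(
    n_estimators=300,     # TF-DF default: num_trees=300
    max_depth=16,         # TF-DF default: max_depth=16
    min_samples_leaf=5,   # TF-DF default: min_examples=5
    max_features="sqrt",  # TF-DF samples sqrt(num_features) candidate attributes per split
)

# Toy data just to show the configuration fits and predicts.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf.fit(X, y)
```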
PS: Random Forest and Gradient Boosted Trees are different algorithms.
I set everything up exactly as in the given code snippet. It’s intriguing, isn’t it?
The datasets I used for the two models were essentially the same: all categorical (text) features were removed, and the targets (ground truth) were mapped to positive integer indices [0, 1, 2]. In other words, the ingredients for sklearn and TF-DF were identical.
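A minimal sketch of that preprocessing, with a hypothetical DataFrame (the column names here are made up for illustration): drop the text columns and map the string labels to integer indices.

```python
import pandas as pd

# Hypothetical example data; column names are placeholders.
df = pd.DataFrame({
    "num_a": [1.0, 2.0, 3.0],
    "text_b": ["x", "y", "z"],     # categorical (text) -> removed
    "label": ["bird", "cat", "dog"],
})

# Keep only numeric feature columns so both libraries see identical inputs.
features = df.drop(columns=["label"]).select_dtypes(include="number")

# Map string targets to positive integer indices [0, 1, 2].
classes = sorted(df["label"].unique())
target = df["label"].map({c: i for i, c in enumerate(classes)})
```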
Notice that the dataset is very imbalanced, yet TF-DF did a very impressive job. This is very cool, but I don’t want to be fooled by the metrics; I just want to make sure the models work correctly. ^^
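One way to avoid being fooled on an imbalanced dataset is to look at class-aware metrics instead of plain accuracy. A sketch with made-up predictions (the labels below are hypothetical, not from the actual dataset):

```python
from sklearn.metrics import balanced_accuracy_score, classification_report

# Hypothetical ground truth and predictions on an imbalanced 3-class problem.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2]

# Balanced accuracy averages per-class recall, so the majority class
# cannot dominate the score the way plain accuracy can.
score = balanced_accuracy_score(y_true, y_pred)

# Per-class precision/recall/F1 shows where the minority classes suffer.
print(classification_report(y_true, y_pred, zero_division=0))
```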
Just to clarify, the performance of sklearn’s and TensorFlow’s random forests is the same. It was actually my fault in processing the data: I had removed the most important feature from the training data. In my case, sklearn’s side works a little better. Have a nice day!