Since you asked about it, here are a couple of nits that might be interesting to try:
Installing TF-DF (i.e. pip install tensorflow_decision_forests) prints a lot of output. You can hide some of it as follows:
!pip install tensorflow_decision_forests -U -q
By default, Colabs run on two small CPUs (try running !cat /proc/cpuinfo). However, by default, TF-DF trains on 6 threads (see the “num_threads” constructor argument). It would be interesting to see the training speed with only 2 threads.
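Here is a minimal sketch of how you could match the thread count to the machine. The helper names (pick_num_threads, build_model) are made up for illustration, and the choice of GradientBoostedTreesModel is an assumption on my side; use whichever TF-DF model your colab trains.

```python
import os


def pick_num_threads(default=6):
    """Return the number of CPUs visible to Python, or `default` if unknown."""
    return os.cpu_count() or default


def build_model(num_threads):
    """Hypothetical helper: build a TF-DF model limited to `num_threads` threads.

    Assumes tensorflow_decision_forests is installed. The model class is an
    assumption; any TF-DF Keras model accepting `num_threads` would do.
    """
    import tensorflow_decision_forests as tfdf

    return tfdf.keras.GradientBoostedTreesModel(num_threads=num_threads)


print("CPUs visible to Python:", pick_num_threads())
```

In the colab you would then call model = build_model(pick_num_threads()), or simply build_model(2), and time model.fit(...) against the default setting.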
The two approaches differ in two ways: different l1_regularization values, and the replacement of missing values by the mean in the second approach. Apart from this, both approaches are equivalent and are expected to give similar results within training noise (which might already be the case: 0.81345 ~= 0.81343).
To be fancy, you can compute confidence bounds or run a t-test :).
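As a rough sketch of the confidence-bound idea: if the two scores are accuracies (an assumption, since I am not sure which metric you reported), a normal-approximation interval only needs the score and the number of evaluation examples. The n_examples value below is a made-up placeholder, not your actual test-set size.

```python
import math


def accuracy_interval(p, n, z=1.96):
    """Normal-approximation 95% interval for a proportion p measured on n examples."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# Hypothetical test-set size; replace with the real number of evaluation examples.
n_examples = 10_000

lo_a, hi_a = accuracy_interval(0.81345, n_examples)
lo_b, hi_b = accuracy_interval(0.81343, n_examples)

# Two intervals overlap if each one starts before the other one ends.
overlap = lo_a <= hi_b and lo_b <= hi_a
print(f"A: [{lo_a:.4f}, {hi_a:.4f}]  B: [{lo_b:.4f}, {hi_b:.4f}]  overlap: {overlap}")
```

If the intervals overlap this much, the difference between the two approaches is well within noise. A paired test on per-example predictions would be more precise, but this is usually enough for a sanity check.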
For long training runs, it might be interesting to print the training logs (while training). This can be done as follows:
!pip install wurlitzer -U -q
from wurlitzer import sys_pipes
with sys_pipes():
    model.fit(train_ds)  # placeholder names: use your own model and training dataset
Note that at some point, this will be done automatically depending on the verbose parameter.
Thanks for sharing the colab. Since you have had hands-on practice with the library, do you mind if I ask about your experience? For example, did you run into any hard-to-debug errors, or was any of the library's behavior surprising?