TF-DF and NN model ensemble

Hi,
I’m trying to use an architecture similar to the one shown in the TF-DF tutorial: Composing Decision Forest and Neural Network models | TensorFlow Decision Forests

The training in this example is not clear to me. I see there are three different fit calls, one for each of:
ensemble_nn_only
model_3
model_4

I understand that this is useful for the breakdown in the evaluation; however, it’s not very straightforward.

1- Why wasn’t ensemble_nn_and_df trained?

2- How come I can evaluate ensemble_nn_and_df, which was not trained as a whole?

3- I trained only ensemble_nn_and_df instead of training all its components, and the accuracy dropped compared to training them separately as you showed. What is the reason behind that?

4- In this example you mentioned a fine-tuning step, but no code example was given. Can you please elaborate on how one can fine-tune?

I would appreciate any help or clarification!
Regards

Hi Ahad,

The reason TF-DF (TensorFlow Decision Forests) feels strange, and the core answer to your questions, is that decision forest training algorithms are fundamentally different from the training algorithm of neural networks.

The main difference is that DFs do not consume gradients as input during training, and they do not propagate gradients from their output to their inputs. This document gives more details. In practice, this is not entirely true, as there are research papers with possible solutions, but this is outside the scope of classical DF training.

Back to your questions.

  1. Why wasn’t ensemble_nn_and_df trained?

The four sub-components (model_1 to model_4) are trained. ensemble_nn_and_df is simply a combination of model_1 to model_4 (the mean of their predictions), so ensemble_nn_and_df does not have any “non-trained” parameters of its own.
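Schematically, the wiring looks like this (a minimal sketch, not the tutorial’s exact code; the input shape and layer sizes are made up):

```python
import tensorflow as tf
import tensorflow_decision_forests as tfdf

# Hypothetical numeric input; the tutorial uses its own feature columns.
features = tf.keras.Input(shape=(4,), name="features", dtype="float32")

# Two small neural networks standing in for model_1 / model_2.
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model_2 = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
m1_pred, m2_pred = model_1(features), model_2(features)

# Two decision forests standing in for model_3 / model_4.
model_3 = tfdf.keras.RandomForestModel()
model_4 = tfdf.keras.RandomForestModel()
m3_pred, m4_pred = model_3(features), model_4(features)

# ensemble_nn_and_df is only the mean of the sub-model predictions:
# it adds no trainable parameters of its own.
mean_all = tf.reduce_mean(
    tf.stack([m1_pred, m2_pred, m3_pred, m4_pred], axis=0), axis=0)
ensemble_nn_and_df = tf.keras.Model(features, mean_all)

# Same for the NN-only ensemble used to train model_1 and model_2.
mean_nn_only = tf.reduce_mean(tf.stack([m1_pred, m2_pred], axis=0), axis=0)
ensemble_nn_only = tf.keras.Model(features, mean_nn_only)
```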

  2. How come I can evaluate ensemble_nn_and_df, which was not trained as a whole?

Same reason as 1.

  3. I trained only ensemble_nn_and_df instead of training all its components, and the accuracy dropped compared to training them separately as you showed. What is the reason behind that?

TF-DF models can only be trained by calling the .fit method on the model itself. Calling ensemble_nn_and_df.fit does not call fit on the sub-models; therefore, only the neural networks are trained.

Before being trained, a neural network returns “garbage” values that depend on the random initialization and on the input features. For a TF-DF model, the situation is different: a non-trained TF-DF model always returns “0”. In other words, if you only call fit on ensemble_nn_and_df, the ensemble effectively only contains the neural networks.
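Concretely, reusing the names from the sketch above with a tiny synthetic dataset, the three fit calls from the tutorial would look like:

```python
import numpy as np
import tensorflow as tf

# Tiny synthetic dataset, purely for illustration.
x = np.random.uniform(size=(256, 4)).astype("float32")
y = (x.sum(axis=1) > 2.0).astype(int)
train_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

# Fit #1: trains model_1 and model_2 jointly through back-propagation.
ensemble_nn_only.compile(optimizer="adam", loss="binary_crossentropy")
ensemble_nn_only.fit(train_ds)

# Fits #2 and #3: each TF-DF model needs its own .fit call; there is
# no gradient path through which ensemble_nn_and_df.fit could reach them.
model_3.fit(train_ds)
model_4.fit(train_ds)

# All sub-models are now trained, so the combined model can be evaluated.
ensemble_nn_and_df.compile(loss="binary_crossentropy", metrics=["accuracy"])
ensemble_nn_and_df.evaluate(train_ds)
```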

  4. In this example you mentioned a fine-tuning step, but no code example was given. Can you please elaborate on how one can fine-tune?

Because of this back-propagation limitation, a DF cannot be used to fine-tune a NN located before it. See the paragraph “For this reasons, the classical RF algorithm cannot be used to train or fine-tune a neural network underneath…”.

However, if you have a NN and a DF in parallel (like in this tutorial), you can back-propagate through the NN to train the “learnable NN preprocessing”. See the paragraph “In practice, such a preprocessing layer could either be a pre-trained embedding to fine-tune, or a randomly initialized neural network.” This is exactly what is done in this tutorial (see the training of “preprocessor”).

If you were to replace model_1 and model_2 with pre-trained trainable neural networks, the training of “ensemble_nn_only” would be a fine-tuning step.
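A minimal sketch of such a fine-tuning, assuming a stand-in “pretrained” network and reusing train_ds from the earlier sketch:

```python
import tensorflow as tf

features_ft = tf.keras.Input(shape=(4,), dtype="float32")

# Stand-in for a pre-trained, trainable network (e.g. a pre-trained
# embedding). Leaving trainable=True makes the fit below a fine-tuning.
pretrained = tf.keras.Sequential([tf.keras.layers.Dense(8, activation="relu")])
pretrained.trainable = True

z = pretrained(features_ft)
head_1 = tf.keras.layers.Dense(1, activation="sigmoid")(z)
head_2 = tf.keras.layers.Dense(1, activation="sigmoid")(z)
mean_heads = tf.reduce_mean(tf.stack([head_1, head_2], axis=0), axis=0)
ensemble_nn_only_ft = tf.keras.Model(features_ft, mean_heads)

# Back-propagation updates the pre-trained weights; the decision
# forests take no part in this gradient step. They can afterwards be
# trained on top of the tuned network, e.g. with
# tfdf.keras.RandomForestModel(preprocessing=pretrained).
ensemble_nn_only_ft.compile(optimizer="adam", loss="binary_crossentropy")
ensemble_nn_only_ft.fit(train_ds)
```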

I hope this helps,
Mathieu


Hi Mathieu,
Thank you for the detailed explanation; it is much clearer now.
I implemented the ensemble for my data by training a TF-DF model and loading it as a layer into a class that extends Model.
It works fine. I wonder how I can get the leaves instead of the output from the trained TF-DF model (without using the predict API; I need this inside the tf.function call that I override).
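To make my setup concrete, here is a minimal sketch of what I mean (the class and names are made up for illustration):

```python
import tensorflow as tf

class EnsembleModel(tf.keras.Model):
    """Wraps an already-trained TF-DF model as a sub-layer."""

    def __init__(self, trained_df_model, **kwargs):
        super().__init__(**kwargs)
        # e.g. a trained tfdf.keras.RandomForestModel
        self.df_model = trained_df_model

    def call(self, inputs):
        # This is where I would like the active leaf indices
        # instead of the model's predictions.
        return self.df_model(inputs)
```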

Thanks for your help!