Are RandomForestModels deterministic across different TFDF versions?

I’ve trained a few RandomForest models with TFDF 1.0.1, 1.4 and 1.5. They are deployed using TensorFlow Serving + Vertex AI. I appear to be getting slightly different prediction results for models trained on the exact same data. According to the docs, random_seed defaults to 123456, so assuming that has always been the default, all models should be deterministic. Are there any reasons I might expect differing prediction values across TFDF versions?


Hi,

This is indeed unexpected. There is one training parameter that explicitly makes training non-deterministic, namely maximum_training_duration_seconds. Can you confirm that this parameter was not modified?


Hi, I just thought about this again and I think I’ve misread your question initially – sorry about that.

Between two TF-DF versions, trained models can differ the same way changing the random seed will change models. In other words:

  • If the model is fixed, the predictions are the same between versions.
  • If the model is retrained on the same data with a different TF-DF version, predictions might change slightly. However, the semantics of the hyper-parameters stay the same.
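The first bullet can be checked directly. The sketch below uses scikit-learn’s RandomForestRegressor as a stand-in for a TF-DF RandomForestModel (the data and variable names are made up for illustration): once a forest is fitted, repeated predictions are bit-for-bit identical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# Fit once with a fixed seed, mirroring TF-DF's random_seed=123456 default.
model = RandomForestRegressor(n_estimators=50, random_state=123456).fit(X, y)

# A fixed (already fitted) model is deterministic: calling predict() twice
# on the same inputs yields exactly the same values.
p1 = model.predict(X)
p2 = model.predict(X)
assert np.array_equal(p1, p2)
```

The same holds for a saved TF-DF model loaded into different serving environments: inference on the frozen trees does not involve any randomness.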

No worries! I meant to do some further testing but haven’t got round to it. Just so we’re clear, these were the parameters I used to train the RF:

        import tensorflow_decision_forests as tfdf

        # num_trees and feats are defined elsewhere in my pipeline.
        rf = tfdf.keras.RandomForestModel(
            exclude_non_specified_features=False,
            verbose=2,
            hyperparameter_template="better_default",
            task=tfdf.keras.Task.REGRESSION,
            num_trees=num_trees,
            compute_oob_variable_importances=True,
            features=feats,
            name="first_random_forest",
            tuner=None
        )

So if I used these params to train 2 different TFDF models, with everything else exactly the same except the TFDF version, does this fall under your second point, i.e. predictions might change slightly?

Yes, that might happen.

The change will only be small, because this is equivalent to changing the random seed between two runs.
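To get a feel for how small “small” is, here is a rough sketch of the seed analogy, again using scikit-learn’s RandomForestRegressor as a stand-in for TF-DF (synthetic data, illustrative names): two forests trained on identical data that differ only in their random seed produce predictions that disagree slightly, but far less than the scale of the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (illustration only). The two forests below differ only in
# their random seed, analogous to retraining the same TF-DF model under a
# different library version.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.1 * rng.normal(size=500)

rf_a = RandomForestRegressor(n_estimators=100, random_state=123456).fit(X, y)
rf_b = RandomForestRegressor(n_estimators=100, random_state=654321).fit(X, y)

# Predictions differ, but only slightly relative to the target's scale (~1).
diff = np.abs(rf_a.predict(X) - rf_b.predict(X))
print(f"max diff: {diff.max():.4f}, mean diff: {diff.mean():.4f}")
```

If the discrepancies you see in production are on this order of magnitude, they are consistent with retraining noise rather than a serving or deployment issue.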

Great, thanks for the clarification!