Are RandomForestModels deterministic across different TFDF versions?

I’ve trained a few RandomForest models with TFDF 1.0.1, 1.4 and 1.5. They are deployed using TensorFlow Serving + Vertex AI. I appear to be getting slightly different prediction results for models trained on the exact same data. According to the docs, random_seed defaults to 123456, so assuming that has always been the default, all models should be deterministic. Are there any reasons I might expect differing prediction values across TFDF versions?


Hi,

This is indeed unexpected. There is one training parameter that explicitly makes training non-deterministic, namely maximum_training_duration_seconds. Can you confirm that this parameter was not modified?


Hi, I just thought about this again and I think I’ve misread your question initially – sorry about that.

Between two TF-DF versions, trained models can differ the same way changing the random seed will change models. In other words:

  • If the model is fixed, the predictions are the same between versions.
  • If the model is retrained on the same data with a different TF-DF version, predictions might change slightly. However, the semantics of the hyper-parameters stay the same.
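The first bullet can be checked directly. The sketch below uses scikit-learn’s RandomForestRegressor as a stand-in for a TF-DF RandomForestModel (the data and variable names are made up for illustration): once a forest is fitted, repeated predictions are bit-for-bit identical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# Fit once with a fixed seed, mirroring TF-DF's random_seed=123456 default.
model = RandomForestRegressor(n_estimators=50, random_state=123456).fit(X, y)

# A fixed (already fitted) model is deterministic: calling predict() twice
# on the same inputs yields exactly the same values.
p1 = model.predict(X)
p2 = model.predict(X)
assert np.array_equal(p1, p2)
```

The same holds for a saved TF-DF model loaded into different serving environments: inference on the frozen trees does not involve any randomness.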

No worries! I meant to do some further testing but haven’t got round to it. Just so we’re clear, these were the parameters I used to train the RF:

        import tensorflow_decision_forests as tfdf

        # num_trees and feats are defined elsewhere in my pipeline.
        rf = tfdf.keras.RandomForestModel(
            exclude_non_specified_features=False,
            verbose=2,
            hyperparameter_template="better_default",
            task=tfdf.keras.Task.REGRESSION,
            num_trees=num_trees,
            compute_oob_variable_importances=True,
            features=feats,
            name="first_random_forest",
            tuner=None
        )

So if I used these params to train 2 different TFDF models, with everything else exactly the same except the TFDF version, does this fall under your second point, i.e. predictions might change slightly?

Yes, that might happen.

The change will only be small, because this is equivalent to changing the random seed between two runs.
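To get a feel for how small “small” is, here is a rough sketch of the seed analogy, again using scikit-learn’s RandomForestRegressor as a stand-in for TF-DF (synthetic data, illustrative names): two forests trained on identical data that differ only in their random seed produce predictions that disagree slightly, but far less than the scale of the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (illustration only). The two forests below differ only in
# their random seed, analogous to retraining the same TF-DF model under a
# different library version.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.1 * rng.normal(size=500)

rf_a = RandomForestRegressor(n_estimators=100, random_state=123456).fit(X, y)
rf_b = RandomForestRegressor(n_estimators=100, random_state=654321).fit(X, y)

# Predictions differ, but only slightly relative to the target's scale (~1).
diff = np.abs(rf_a.predict(X) - rf_b.predict(X))
print(f"max diff: {diff.max():.4f}, mean diff: {diff.mean():.4f}")
```

If the discrepancies you see in production are on this order of magnitude, they are consistent with retraining noise rather than a serving or deployment issue.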

Great, thanks for the clarification!