Access to variable importances other than NUM_AS_ROOT

I am evaluating TensorFlow Decision Forests code as a replacement for existing code that uses TensorFlow’s BoostedTreesClassifier.

I am running TFDF v0.1.7 on Ubuntu Linux 20.04.2 LTS.

Using the example code in this tutorial as a starting point:

I have been able to train, evaluate, obtain a summary of the model, and save the model successfully using my own data. I have been using tfdf.keras.RandomForestModel() as a starting point.

My problem is getting programmatic access to the different types of variable importances. My understanding is that I should be able to do this by obtaining an inspector from the trained model.

   inspector = tfdf_model.make_inspector()
   variable_importances = inspector.variable_importances()

The return value of variable_importances() is a dict as expected, but when I invoke keys(), the only key returned is NUM_AS_ROOT. Judging from what I have seen in the output from summary(), I would have expected NUM_NODES, SUM_SCORE, and MEAN_MIN_DEPTH to be present as well.
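To make the check concrete, here is a minimal sketch of how the returned dict can be inspected. It assumes each value is a list of (feature, score) pairs in which the feature object has a name attribute. The helper summarize_importances and the stand-in Feature tuple are hypothetical names used for illustration only; they are not part of TF-DF.

```python
from collections import namedtuple


def summarize_importances(variable_importances, top_k=3):
    """Map each importance type to its first top_k (feature name, score) pairs.

    The order of each list is kept as returned; no re-sorting is done here.
    """
    summary = {}
    for importance_type, ranked in variable_importances.items():
        summary[importance_type] = [
            (feature.name, float(score)) for feature, score in ranked[:top_k]
        ]
    return summary


# Stand-in data shaped like the assumed return value (hypothetical values).
Feature = namedtuple("Feature", ["name"])
example = {"NUM_AS_ROOT": [(Feature("age"), 12.0), (Feature("income"), 5.0)]}

summary = summarize_importances(example, top_k=1)
print(sorted(summary))  # the importance types that are available

# With a trained model, the call would look like:
# summarize_importances(tfdf_model.make_inspector().variable_importances())
```

Printing the sorted keys is what shows only NUM_AS_ROOT in my case.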

Is there something I need to specify to make those statistics accessible from variable_importances()?

Thanks!


P.J. Hinton

Hi P.J.

You are doing it exactly right.

Some metrics are more expensive to compute and need to be enabled by passing compute_oob_variable_importances=True. See the list of flags here.
Please let me know if this helps.
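As a rough sketch of where the flag goes, assuming the training data is a pandas DataFrame with a label column (train_with_oob_importances, train_df, and label are hypothetical names for illustration):

```python
def train_with_oob_importances(train_df, label):
    """Train a random forest with OOB importances enabled and return them."""
    # Imported lazily so that defining this helper does not require TF-DF.
    import tensorflow_decision_forests as tfdf

    # Convert the DataFrame to a TensorFlow dataset, marking the label column.
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label=label)

    # Enable the more expensive out-of-bag variable importances at training time.
    model = tfdf.keras.RandomForestModel(compute_oob_variable_importances=True)
    model.fit(train_ds)

    # Dict keyed by importance type, as in the question above.
    return model.make_inspector().variable_importances()
```

The flag has to be set before fit() is called, since the OOB importances are computed during training.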

Oddly, though, when I turn off compute_oob_variable_importances (I usually have it enabled), I do get the following keys: ['NUM_AS_ROOT', 'MEAN_MIN_DEPTH', 'NUM_NODES', 'SUM_SCORE'].

Are you doing regression or classification, by the way?

Thanks for the reply. I will give that a try!

I am doing classification.

Following up on the thread.

Background

  • OOB-type variable importances are computed if and only if compute_oob_variable_importances=True is set during training.
  • Prior to TF-DF 0.1.8, structural variable importances (e.g. NUM_AS_ROOT, MEAN_MIN_DEPTH, NUM_NODES, SUM_SCORE) were computed by the Python model inspector (i.e. when looking at the model).
  • Starting with TF-DF 0.1.8 (released on Jul 29, 2021), structural variable importances are computed in C++ at the end of model training.

I suspect you are using TF-DF <0.1.8. Make sure to update it and to re-train the model (as the feature importances are now part of the TF-DF model).
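A quick way to check which side of that version boundary an installation is on is sketched below. The helper names parse_version and has_trained_structural_importances are hypothetical, and the parsing assumes a simple dotted version string such as "0.1.7".

```python
import re


def parse_version(version):
    """Turn a dotted version string into a tuple of up to three integers."""
    numbers = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def has_trained_structural_importances(version):
    """True if structural importances are computed at the end of training."""
    return parse_version(version) >= (0, 1, 8)


print(has_trained_structural_importances("0.1.7"))
print(has_trained_structural_importances("0.1.8"))

# With TF-DF installed, the check would look like:
# import tensorflow_decision_forests as tfdf
# has_trained_structural_importances(tfdf.__version__)
```

If the check is false, upgrading alone is not enough: a model saved with the older version still needs to be re-trained.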