Training TFDF with TFX

Hi all,

I am trying to set up a GB classifier on TFX for the first time. It runs smoothly with the interactive context. However, when I try to run it on Kubeflow on GCP, I have an issue with the Transform component. Here are a few questions whose answers would hopefully solve the problems I am encountering:

  1. What is the recommended environment and/or VM to set up TFDF with TFX?
  2. When creating the pipeline on Kubeflow (with the CLI: tfx pipeline create […]), wheels are created for the Transform and Trainer components in a /tmp folder (such as tfx_user_code_Transform-[…]). What purpose do those wheels serve, and how can I indicate where to store and retrieve them? I don't know exactly why, and my setup might be wrong, but the Transform component, when run on Kubeflow, looks for those wheels in the wrong location (gs://[…]/[…]/_wheels/[…]). More context is given here
  3. Is there any code snippet or reference architecture that we could follow to set up a TFDF model on TFX? Or an estimator.GradientBoostedClassifier, which I can't seem to set up properly?

Any help would be greatly appreciated :slight_smile:
Armand

We have an example of TFDF here:

Hi @Robert_Crowe, thanks for posting this. Lines 34-36 are particularly interesting:

flags.DEFINE_enum('model_framework', 'keras',
                  ['keras', 'flax_experimental', 'tfdf_experimental'],
                  'The modeling framework.')

I’m not familiar with how flags influence TFX. Does specifying tfdf_experimental signal something special to either Google AI Platform or Vertex AI when the pipeline is executed there?

Nope, it’s just command line flags. If you search through the code you can see how it influences the name of the pipeline and which module file is selected.
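To illustrate the point above, here is a minimal sketch of how a command-line flag could select the pipeline name and module file. It uses argparse from the standard library as a stand-in for absl flags, and the naming patterns (penguin_local_<framework> and penguin_utils_<framework>.py) are assumptions made for illustration, not copied from the example's source:

```python
import argparse

# The three choices come from the flag definition quoted above.
FRAMEWORKS = ["keras", "flax_experimental", "tfdf_experimental"]


def parse_framework(argv):
    """Parse --model_framework from a list of command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_framework",
        choices=FRAMEWORKS,
        default="keras",
        help="The modeling framework.",
    )
    return parser.parse_args(argv).model_framework


def resolve_pipeline(model_framework):
    """Map the flag value to a pipeline name and a module file.

    Both naming patterns are hypothetical; check the real pipeline
    script for the patterns it actually uses.
    """
    pipeline_name = f"penguin_local_{model_framework}"
    module_file = f"penguin_utils_{model_framework}.py"
    return pipeline_name, module_file


# The flag only changes ordinary Python values; nothing here talks to
# any platform.
name, module = resolve_pipeline(
    parse_framework(["--model_framework=tfdf_experimental"])
)
print(name, module)
```

In other words, the flag value is plain data that the pipeline script uses to pick which user code to load; the platform that runs the pipeline never sees the flag itself.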

Hi @Robert_Crowe, thanks for your answer.

A couple of follow up questions:

  1. Are you working on a Linux machine to be able to locally develop tfdf models on tfx?
  2. Why is this called tfdf_experimental? I saw that it was difficult to integrate such a model with TFS. Is that still the case?

Thanks again!

@Robert_Crowe, just bumping this in case you have some information. Is there any example of using TFDF in a Kubeflow pipeline? Thanks a lot!

@armandsauzay - I haven’t tried running the TFDF model that was added to the Penguin example in TFX 1.6.0 (tfx/tfx/examples/penguin at master · tensorflow/tfx · GitHub) on Kubeflow, but it worked locally for me:

python penguin_pipeline_local.py --model_framework=tfdf_experimental

Presumably, switching to penguin_pipeline_kubeflow.py should run the TFDF model on Kubeflow.