Clarifications on parsing arguments in a TFX pipeline

I have been building my MLOps muscle memory, and progress has been good so far, thanks to Coursera’s Specialization, the ML Design Patterns book, and Vertex AI’s neat examples.

I wanted to build a simple Vertex AI pipeline that trains a custom model and deploys it. TFX pipelines seemed like a much easier choice for this than KFP pipelines.

I am now referring to this stock example:

I see a lot of argument parsing here and there, especially in the model-building utilities. For reference, here’s the snippet that creates ExampleGen and Trainer in the initial TFX pipeline:

# Brings data into the pipeline.
example_gen = tfx.components.CsvExampleGen(input_base=data_root)

# Uses user-provided Python function that trains a model.
trainer = tfx.components.Trainer(
    module_file=module_file,
    examples=example_gen.outputs['examples'],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=5))

The run_fn only takes fn_args as its argument. I am wondering how the arguments are passed and mapped inside penguin_trainer.py.
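
For reference again, here is roughly what the run_fn in penguin_trainer.py looks like (paraphrased from memory, using the tutorial's helpers _input_fn and _build_keras_model and its batch-size constants, so it may not match the module file line for line):

def run_fn(fn_args: tfx.components.FnArgs):
  # The tutorial builds a schema from a hand-written feature spec here.
  schema = schema_utils.schema_from_feature_spec(_FEATURE_SPEC)

  # File patterns for the train/eval splits arrive via fn_args.
  train_dataset = _input_fn(fn_args.train_files, fn_args.data_accessor,
                            schema, batch_size=_TRAIN_BATCH_SIZE)
  eval_dataset = _input_fn(fn_args.eval_files, fn_args.data_accessor,
                           schema, batch_size=_EVAL_BATCH_SIZE)

  model = _build_keras_model()
  model.fit(
      train_dataset,
      steps_per_epoch=fn_args.train_steps,   # from TrainArgs(num_steps=100)
      validation_data=eval_dataset,
      validation_steps=fn_args.eval_steps)   # from EvalArgs(num_steps=5)

  # Trainer expects the trained SavedModel to be written here.
  model.save(fn_args.serving_model_dir, save_format='tf')

So everything run_fn needs seems to arrive pre-packed on the single fn_args object, which is what prompted my question about where that mapping happens.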

I will be grateful for a detailed answer.


I guess @Robert_Crowe might be able to help here

Take a look at these lines in the Trainer component for how the run_fn is invoked. The private function _GetFnArgs gathers and creates the fn_args, which get passed through the Trainer to your run_fn.
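
In rough terms, it maps the component's inputs and parameters onto the fields of a single FnArgs object. A simplified, hypothetical illustration of that mapping (not the actual TFX source; field and directory names are only indicative):

from types import SimpleNamespace

def build_fn_args(examples_uri, train_args, eval_args, serving_model_dir):
  # Hypothetical stand-in for what the executor assembles; the real FnArgs
  # carries more fields (schema path, transform graph, base model,
  # hyperparameters, custom_config, and so on).
  return SimpleNamespace(
      # File patterns for the splits materialized by CsvExampleGen
      # (the split directory naming shown here is illustrative).
      train_files=[f'{examples_uri}/Split-train/*'],
      eval_files=[f'{examples_uri}/Split-eval/*'],
      # TrainArgs(num_steps=100) surfaces as fn_args.train_steps == 100.
      train_steps=train_args.num_steps,
      # EvalArgs(num_steps=5) surfaces as fn_args.eval_steps == 5.
      eval_steps=eval_args.num_steps,
      # Where run_fn is expected to save the trained SavedModel.
      serving_model_dir=serving_model_dir,
  )

So the TrainArgs and EvalArgs protos you set on the component end up as plain attributes on fn_args; run_fn never parses anything itself, it just reads them off.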


Thanks, Robert! Appreciate it. Maybe a brief note about this in the tutorial would be helpful for curious readers.