What is a good way to access post-transform statistics in Trainer (TFX)?

jeongukjae · January 25, 2022, 10:33am

Hi, all.
I have a question about TFX’s Trainer component.

How can I access the post-transform statistics (an output of the Transform component) in the Trainer’s run_fn?

I confirmed that Transform emits the post-transform statistics, but Trainer does not accept a standard_artifacts.ExampleStatistics input:

tfx/standard_component_specs.py at v1.5.0 · tensorflow/tfx · GitHub (TransformSpec)
tfx/standard_component_specs.py at v1.5.0 · tensorflow/tfx · GitHub (TrainerSpec)

I know I could parse the version out of fn_args.transform_graph_path (an argument of the Trainer’s run_fn) and join path segments to reach the post-transform statistics, but I don’t think that is a good option.

Thanks
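
For readers wondering what the path-joining workaround mentioned above would look like, here is a minimal sketch. It assumes, without confirmation from this thread, that Transform's outputs live side by side under a layout like `<component_dir>/<output_key>/<execution_id>`; the function name `sibling_output_dir` and the example path are hypothetical. It relies on an internal directory convention rather than a public API, which is the reason the poster considers it a poor option.

```python
def sibling_output_dir(transform_graph_path: str,
                       output_key: str = "post_transform_stats") -> str:
    """Derive a sibling output directory from fn_args.transform_graph_path.

    ASSUMPTION: the path ends in '<output_key>/<execution_id>', so the last
    segment is the version and the one before it names the output.
    """
    # Plain string splitting keeps URI schemes such as 'gs://' intact.
    component_dir, _graph_key, execution_id = (
        transform_graph_path.rstrip("/").rsplit("/", 2)
    )
    return f"{component_dir}/{output_key}/{execution_id}"


# Hypothetical path, for illustration only.
example = sibling_output_dir("/pipeline_root/Transform/transform_graph/7")
print(example)
```

If the layout differs between TFX versions or orchestrators, this silently points at the wrong directory, so it is best treated as a last resort.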

Robert_Crowe · January 25, 2022, 10:25pm

Have you tried using a custom_config argument for Trainer? Something like:

trainer = tfx.components.Trainer(
    ...
    custom_config={
        'tft_stats': transform.outputs['post_transform_stats']
    }
)

jeongukjae · January 27, 2022, 5:57am

I passed post_transform_stats via custom_config and tried to read the artifacts in run_fn like this:

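import logging

from tfx.components.trainer.fn_args_utils import FnArgs

def run_fn(fn_args: FnArgs):
    """Callback function for Trainer component"""
    # Log the Channel object itself, then the artifacts it holds.
    logging.info(fn_args.custom_config['tft_post_stats'])
    logging.info(fn_args.custom_config['tft_post_stats'].get())
    ...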

But I couldn’t get any artifacts:

[2022-01-27 05:43:40,613][root][INFO] - Channel(
    type_name: ExampleStatistics
    artifacts: []
    additional_properties: {}
    additional_custom_properties: {}
)
[2022-01-27 05:43:40,613][root][INFO] - []

Maybe that’s because I tried to pass a Channel via an execution property.

I think I should try another way (MLMD or something else).

But thanks for your reply! @Robert_Crowe

Robert_Crowe · January 31, 2022, 8:04pm

Try using the artifact explorer after your Transform component to explore what the post-transform statistics look like.

Jean-Christophe_Carl · March 25, 2022, 2:50am

Hey @jeongukjae, did you manage to do what you wanted? If so, how did you choose to do it?

I am trying to do something similar: I want to load a past tuner output with a resolver and pass the resolver output as the custom_config of my tuner component, so I can access it in the tuner_fn.

When I run the components in an interactive environment, I manage to make it work. The resolver loads the old tuner output and I can access it in my tuner_fn.

But when I run the same pipeline twice (same metadata DB and same pipeline_root) with the LocalDagRunner or the BeamDagRunner, the Channel is empty.

I don’t know how the interactive context behaves or how it differs from the DAG runners.

Thank you in advance.

jeongukjae · March 25, 2022, 3:54am

Hi @Jean-Christophe_Carl!

As far as I understand, we cannot pass channel objects via custom_config.
custom_config is an execution parameter, and a resolver output is a channel. (tuner component spec)

A Channel object contains artifacts, and those artifacts are resolved from MLMD in the Driver before the executor class runs. (Implementation link)
Because execution parameters are serialized when the pipeline is constructed, a resolver output (a Channel object) placed in custom_config (an execution parameter) is serialized before any artifacts have been resolved, so it arrives empty. It works in an interactive context because the previous components have already been executed, so the artifacts are already assigned to the channel’s attribute.
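
The ordering issue described above can be illustrated with a toy model. This is not TFX code: `ToyChannel` and `serialize_exec_properties` are hypothetical stand-ins that only mimic the idea of a config being snapshotted before upstream artifacts exist, and the artifact path is made up.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyChannel:
    """Hypothetical stand-in for a channel: a typed slot for artifact URIs."""
    type_name: str
    artifacts: list = field(default_factory=list)


def serialize_exec_properties(custom_config: dict) -> str:
    """Snapshot the config as JSON, copying whatever artifacts exist right now."""
    def encode(obj):
        if isinstance(obj, ToyChannel):
            return {"type_name": obj.type_name, "artifacts": list(obj.artifacts)}
        raise TypeError(f"cannot serialize {type(obj).__name__}")
    return json.dumps(custom_config, default=encode)


stats = ToyChannel("ExampleStatistics")

# DAG-runner order: the config is serialized at pipeline construction,
# and the upstream component produces its artifact only afterwards.
dag_snapshot = serialize_exec_properties({"tft_post_stats": stats})
stats.artifacts.append("/hypothetical/post_transform_stats/7")
dag_view = json.loads(dag_snapshot)["tft_post_stats"]["artifacts"]

# Interactive order: the upstream component has already run, so the
# channel is populated by the time the config is serialized.
interactive_snapshot = serialize_exec_properties({"tft_post_stats": stats})
interactive_view = json.loads(interactive_snapshot)["tft_post_stats"]["artifacts"]

print(dag_view)
print(interactive_view)
```

In this toy model the first snapshot carries an empty artifact list and the second carries the path, which mirrors the empty Channel seen with the DAG runners versus the populated one in the interactive context.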

I solved this problem by customizing the Trainer component.

I’m not a TFX developer, so the information above may be wrong.

Thanks

Jean-Christophe_Carl · March 25, 2022, 9:15am

Thank you for your reply. I do see the difference in the driver’s behavior in the interactive context; I hadn’t gotten that far.

We ended up with a custom trainer executor that patches the model artifact with an additional property.