What is a good way to access post-transform-statistics in Trainer (TFX)?

Hi, all.
I have a question about TFX’s Trainer component.

How can I access post-transform-statistics (an output of the Transform component) in the Trainer’s run_fn?

I confirmed that Transform emits post-transform-statistics, but a standard_artifacts.ExampleStatistics artifact cannot be passed to Trainer.

I know I can parse the version out of fn_args.transform_graph_path (an argument of Trainer’s run_fn) and join paths to locate the post-transform-statistics, but I don’t think that is a good option.
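For reference, the path-based workaround described above might look roughly like the sketch below. It assumes the common layout `<pipeline_root>/Transform/<output_key>/<execution_id>` and that the statistics output key is `post_transform_stats`; the helper name `sibling_output_dir` is hypothetical and not part of TFX. It is fragile for exactly the reason given above: it depends on how the orchestrator lays out directories.

```python
import posixpath


def sibling_output_dir(transform_graph_path: str,
                       output_key: str = "post_transform_stats") -> str:
    """Guess the directory of another Transform output.

    Assumes (not guaranteed by TFX) that the transform graph lives at
    <component_dir>/transform_graph/<execution_id> and that sibling outputs
    use the same <execution_id> under <component_dir>/<output_key>/.
    """
    path = transform_graph_path.rstrip("/")
    parent, execution_id = posixpath.split(path)
    component_dir, graph_key = posixpath.split(parent)
    if graph_key != "transform_graph":
        # Refuse to guess when the layout is not the one we assumed.
        raise ValueError(f"Unexpected transform graph path: {transform_graph_path}")
    return posixpath.join(component_dir, output_key, execution_id)
```

Inside run_fn this would be called as `sibling_output_dir(fn_args.transform_graph_path)`.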

Thanks :slight_smile:

Have you tried using a custom_config argument for Trainer? Something like:

trainer = tfx.components.Trainer(
    ...
    custom_config={
        'tft_stats':transform.outputs['post_transform_stats']
        }
)

I passed post_transform_stats via custom_config and tried to get the artifacts in run_fn as shown below:

def run_fn(fn_args: FnArgs):
    """Callback function for Trainer component"""
    logging.info(fn_args.custom_config['tft_post_stats'])
    logging.info(fn_args.custom_config['tft_post_stats'].get())
    ...

and I couldn’t get any artifacts.

[2022-01-27 05:43:40,613][root][INFO] - Channel(
    type_name: ExampleStatistics
    artifacts: []
    additional_properties: {}
    additional_custom_properties: {}
)
[2022-01-27 05:43:40,613][root][INFO] - []

Maybe that’s because I tried to pass a Channel via an execution property.

I think I should try another way (MLMD or something else).

But thanks for your reply! @Robert_Crowe :slight_smile:

Try using the artifact explorer after your Transform component to explore what the post transform statistics look like:



Hey @jeongukjae, did you manage to do what you wanted? If so, how did you choose to do it?

I am trying to do something similar: I want to load a past tuner output with a resolver and pass the resolver output in the custom_config of my tuner component, so that I can access it in the tuner_fn.

When I run components in an interactive environment, I manage to make it work. The resolver loads the old tuner output and I can access it in my tuner_fn.

But when I run the same pipeline twice (same metadata db and same pipeline_root) with the LocalDagRunner or the BeamDagRunner, the Channel is empty.

I don’t know how the interactive context behaves or how it differs from the DAG runners.

Thank you in advance.

Hi @Jean-Christophe_Carl!

As far as I understand, we cannot pass channel objects via custom_config.
custom_config is an execution parameter, whereas resolver output is a channel. (tuner component spec)

A Channel object contains artifacts, and those artifacts are resolved from MLMD in the Driver before the executor runs. (Implementation link)
Execution parameters are serialized when the pipeline is constructed, so a resolver output (a channel object) passed in custom_config (an execution parameter) is serialized before its artifacts exist. It works in an interactive context because the previous components have already been executed, so the artifacts are already assigned to the channel’s attribute.
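To illustrate the timing issue described above, here is a toy model in plain Python. It is not TFX code: `ToyChannel`, `build_component` and `run_executor` are hypothetical stand-ins, and the only point is that whatever is serialized at construction time is what the executor later sees.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyChannel:
    """Hypothetical stand-in for a Channel: a typed slot filled in later."""
    type_name: str
    artifacts: List[str] = field(default_factory=list)


def build_component(custom_config: dict) -> str:
    """Mimics pipeline construction: execution parameters are serialized now."""
    return json.dumps({
        key: {"type_name": ch.type_name, "artifacts": list(ch.artifacts)}
        for key, ch in custom_config.items()
    })


def run_executor(serialized_config: str) -> dict:
    """Mimics the executor: it only sees the serialized snapshot."""
    return json.loads(serialized_config)


# DAG-runner style: the whole pipeline is built before anything runs.
stats = ToyChannel("ExampleStatistics")
frozen = build_component({"tft_post_stats": stats})
stats.artifacts.append("post_transform_stats/7")  # upstream finishes afterwards
dag_result = run_executor(frozen)["tft_post_stats"]["artifacts"]

# Interactive style: upstream has already run when the component is built.
done = ToyChannel("ExampleStatistics", ["post_transform_stats/7"])
interactive_result = run_executor(
    build_component({"tft_post_stats": done}))["tft_post_stats"]["artifacts"]

print(dag_result, interactive_result)
```

In this toy model the DAG-runner case ends up with an empty artifact list, which matches the empty `artifacts: []` in the log earlier in the thread, while the interactive case sees the artifact.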

I solved this problem by customizing the trainer component.

I’m not a TFX developer, so the information above may be wrong.

Thanks :slight_smile:

Thank you for your reply. I do see the difference in the driver behavior in the interactive context now; I hadn’t dug that far.

We ended up with a custom trainer executor that patches the model artifact with an additional property.