TFX custom config argument in trainer not working

This question is based on the TFX recommender tutorial. Please note that the code is being orchestrated by LocalDagRunner rather than run interactively in a notebook.

In the Trainer, we pass in a custom_config with the transformed ratings and movies:

trainer = tfx.components.Trainer(
    module_file=os.path.abspath(_trainer_module_file),
    examples=ratings_transform.outputs['transformed_examples'],
    transform_graph=ratings_transform.outputs['transform_graph'],
    schema=ratings_transform.outputs['post_transform_schema'],
    train_args=tfx.proto.TrainArgs(num_steps=500),
    eval_args=tfx.proto.EvalArgs(num_steps=10),
    custom_config={
        'epochs': 5,
        'movies': movies_transform.outputs['transformed_examples'],
        'movie_schema': movies_transform.outputs['post_transform_schema'],
        'ratings': ratings_transform.outputs['transformed_examples'],
        'ratings_schema': ratings_transform.outputs['post_transform_schema'],
    })

The problem is that all of the outputs passed into custom_config seem to be empty. This results in errors. For example, the following code:

class MovielensModel(tfrs.Model):

  def __init__(self, user_model, movie_model, tf_transform_output, movies_uri):
    super().__init__()
    self.movie_model: tf.keras.Model = movie_model
    self.user_model: tf.keras.Model = user_model

    movies_artifact = movies_uri.get()[0]

complains that movies_uri.get() is empty. The same is true for ratings. However, the ratings passed in through the examples parameter are not empty (the artefact URI is available), so it seems as though custom_config is ‘breaking things’.

I have tried debugging it but to no avail. I did notice that arguments in custom_config are serialised and deserialised, but this didn’t seem to be the cause of the problem. Does anyone know why this happens and how to resolve this?

That seems really odd. I’ve reached out to see if anyone has any ideas, but in the meantime there are a couple of things you can try. You can add print statements (logging.info, etc.) to the trainer module to inspect the outputs that were passed in through custom_config, or set a breakpoint (for example with pdb) and inspect them there. If they’re empty, that suggests the Trainer isn’t waiting for the upstream components to finish and there’s a race condition. It would also help to add print statements to those components to verify when they run.
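
For the logging suggestion, here is a minimal sketch of the kind of helper I have in mind. It is plain Python and does not depend on TFX. The name summarize_custom_config is made up for illustration, and the only things it assumes are what your snippet already shows: that the channel-like values have a zero-argument get() returning a list of artifacts, and that each artifact may have a uri attribute.

```python
import logging


def summarize_custom_config(custom_config):
    """Log one summary line per custom_config entry and return the lines.

    Hypothetical debugging helper, not part of TFX.
    """
    lines = []
    for key in sorted(custom_config):
        value = custom_config[key]
        getter = getattr(value, "get", None)
        # Channel-like objects expose a zero-argument get(); dicts also have
        # get(), but it needs a key, so they are reported as plain values.
        if callable(getter) and not isinstance(value, dict):
            artifacts = list(getter())
            uris = [getattr(artifact, "uri", None) for artifact in artifacts]
            line = (
                f"{key}: {type(value).__name__} with "
                f"{len(artifacts)} artifact(s), uris={uris}"
            )
        else:
            line = f"{key}: {type(value).__name__} = {value!r}"
        logging.info(line)
        lines.append(line)
    return lines
```

You could call it at the top of run_fn as summarize_custom_config(fn_args.custom_config), before the model is built. If every channel-like entry reports 0 artifact(s) while the examples passed through the regular parameter resolve fine, that at least narrows the problem down to how custom_config values reach the trainer.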