TFX Transform census notebook question

Jerome_MASSOT · October 31, 2023, 11:02pm

Hi everyone,

I have a question about the comment on the categorical columns in the following code.

# For all categorical columns except the label column, we generate a
# vocabulary but do not modify the feature.  This vocabulary is instead
# used in the trainer, by means of a feature column, to convert the feature
# from a string to an integer id.
for key in CATEGORICAL_FEATURE_KEYS:
    outputs[key] = tft.compute_and_apply_vocabulary(
        tf.strings.strip(inputs[key]),
        num_oov_buckets=NUM_OOV_BUCKETS,
        vocab_filename=key
    )

I do not understand why the comment says that the feature is not modified. The vocabulary is created, fine, but it seems to me that outputs[key] is assigned the result of tft.compute_and_apply_vocabulary, so the feature itself is modified (converted from a string to an integer id).
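
To make the question concrete, here is a plain-Python sketch of the two behaviors as I understand them. It does not use TFT at all: compute_vocabulary and apply_vocabulary are hypothetical helpers I wrote for illustration, the sample strings are made up, NUM_OOV_BUCKETS = 1 is an assumed value, and the tie-breaking and out-of-vocabulary bucket choice are simplifications rather than what TFT actually does.

```python
from collections import Counter

NUM_OOV_BUCKETS = 1  # assumed value; the notebook defines its own constant


def compute_vocabulary(values):
    """Hypothetical helper: tokens ordered by descending frequency.

    Ties are broken alphabetically here, which is a simplification.
    """
    counts = Counter(v.strip() for v in values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ordered]


def apply_vocabulary(values, vocab, num_oov_buckets):
    """Hypothetical helper: map each string to its index in vocab.

    Strings not in vocab go to one of num_oov_buckets ids placed after
    the vocabulary; the bucket choice below is a stand-in for a real hash.
    """
    index = {token: i for i, token in enumerate(vocab)}
    ids = []
    for value in values:
        token = value.strip()
        if token in index:
            ids.append(index[token])
        else:
            ids.append(len(vocab) + (sum(token.encode()) % num_oov_buckets))
    return ids


raw = [" Private", "Private ", "State-gov", "Private", "Self-emp"]
vocab = compute_vocabulary(raw)

# What the comment describes: vocabulary computed, feature left as strings.
vocab_only_output = list(raw)

# What the code appears to do: feature replaced by integer ids.
applied_output = apply_vocabulary(raw, vocab, NUM_OOV_BUCKETS)

# A value never seen while building the vocabulary lands in the OOV bucket.
unseen_output = apply_vocabulary(["Never-worked"], vocab, NUM_OOV_BUCKETS)
```

In this sketch vocab_only_output still holds the original strings, while applied_output holds integers, which is the difference I am asking about.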

Am I missing something here?

Thanks

Best regards

Jerome