tf.keras.layers.experimental.preprocessing.HashedCrossing behaves differently in the TFX Transform component than in a direct Beam pipeline


I’m using tf.keras.layers.experimental.preprocessing.HashedCrossing in the context of TFX. When I build a TFX pipeline whose Transform component includes this layer in its preprocessing_fn, IPython crashes with an out-of-memory error. When I run the same preprocessing_fn by calling Beam directly, without the Transform component, I see correct behavior. The crash occurs when I use a thousand buckets. When I reduce this to a hundred buckets, both methods behave as expected. I have a few questions:

  1. Has anyone seen this before?
  2. Why does Transform execute differently through LocalDagRunner than through AnalyzeAndTransformDataset in a Beam pipeline?

Any guidance is appreciated. Thank you!