Different behavior of tf.keras.layers.experimental.preprocessing.HashedCrossing

pritamdodeja · January 23, 2023, 12:06am

Hello,

I’m using the layer above in the context of TFX. When I build a TFX pipeline with a Transform component whose preprocessing_fn includes this layer, IPython crashes with an out-of-memory error. When I run the same preprocessing_fn without the Transform component, calling Beam directly, I see the correct behavior. The crash occurs when I use a thousand buckets; when I reduce this to a hundred buckets, both methods behave as expected. I have a few questions:

Has anyone seen this before?
Why does Transform execute differently when run through LocalDagRunner than when AnalyzeAndTransformDataset is called directly in a Beam pipeline?
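
For context on what "buckets" means here: a hashed cross conceptually maps each pair of feature values to one of a fixed number of bins. Below is a rough pure-Python sketch of that idea only. It is not the Keras implementation; the function name `hashed_cross`, the `_X_` separator, the MD5 hash, and the example feature values are all illustrative assumptions.

```python
import hashlib


def hashed_cross(a, b, num_bins):
    """Map a pair of feature values to a bucket index in [0, num_bins).

    Illustrative only: the real layer uses its own hashing scheme.
    """
    # Join the two values into one key (the separator is an arbitrary choice).
    key = f"{a}_X_{b}".encode("utf-8")
    # Use a stable hash so the same pair always lands in the same bucket.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


# Hypothetical feature pairs, bucketed with a hundred and a thousand bins.
pairs = [("red", "small"), ("red", "large"), ("blue", "small")]
for num_bins in (100, 1000):
    print(num_bins, [hashed_cross(a, b, num_bins) for a, b in pairs])
```

The only thing that changes between my two runs is the equivalent of `num_bins` (100 vs. 1000).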

Any guidance is appreciated. Thank you!

-Pritam