Hello to the TFX community,
I am looking for fellow TFX users who run their pipelines with the FlinkRunner to execute Beam components.
I would like to share experience. My main question today is: “Should I use Flink in stream or batch mode to execute TFX Beam components such as StatisticsGen?”
I understand that a TFX pipeline is by nature better suited to batch. On the other hand, running Flink in “stream mode” unlocks capabilities such as task restart and checkpointing, and Flink generally seems to behave better when running streaming jobs.
Beam supports generating both batch and streaming Flink jobs.
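For context, here is a minimal sketch of how I understand the mode would be selected from the TFX side, assuming the choice is driven by Beam's `--streaming` pipeline option passed through `beam_pipeline_args`. The helper name `flink_beam_args` and the `localhost:8081` address are placeholders of mine, not something from TFX or Beam.

```python
from typing import List


def flink_beam_args(flink_master: str, streaming: bool) -> List[str]:
    """Build the Beam args a TFX pipeline would receive as beam_pipeline_args.

    Hypothetical helper: it only assembles command-line style flags.
    """
    args = [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",  # placeholder address, adjust to your cluster
    ]
    if streaming:
        # Assumption: this flag asks Beam to translate the pipeline
        # into a streaming Flink job instead of a batch one.
        args.append("--streaming")
    return args


batch_args = flink_beam_args("localhost:8081", streaming=False)
stream_args = flink_beam_args("localhost:8081", streaming=True)
```

Either list would then be passed as `beam_pipeline_args` when building the TFX pipeline, so the two modes differ only by one flag on my side.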
Does anyone have arguments for one way or the other?
Thank you!