Here are some links I have found to be helpful in this regard:
Getting started notebook
TensorFlow Transform (TFT) can actually even take in Keras preprocessing layers, with certain caveats. It uses Apache Beam to scale the pipeline and does a lot to help with pipeline reproducibility.
As for Keras, here are some useful links:
Good starting point
The scikit-learn comparison is especially interesting: the design choices of an all-in-memory approach versus a streaming approach become quite apparent. The two do have a lot in common, such as the goal of applying the same preprocessing pipeline at training time as at prediction time.

The term "pipeline" itself is quite overloaded in the TensorFlow ecosystem; how a TFX pipeline and a TFT pipeline differ, and how they relate to each other, is an interesting point. For example, if I remember correctly, column selection in scikit-learn can be integrated directly into the scikit-learn pipeline, whereas in TFX, TensorFlow Data Validation handles inferring the schema, and TFT consumes and enriches that schema, along with other artifacts, for use downstream. As such, TFX feels much more decoupled, and correspondingly more complex and powerful, with a steeper learning curve.
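For contrast, here is a small sketch of the scikit-learn side of that comparison: column selection declared inside the pipeline itself via `ColumnTransformer`, with the exact same fitted pipeline object used for both training and prediction. The toy data and column indices are assumptions for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data: column 0 is numeric, column 1 is categorical (assumed layout).
X = np.array([[1.0, "a"], [2.0, "b"], [3.0, "a"], [4.0, "b"]], dtype=object)
y = np.array([0, 0, 1, 1])

# Column selection lives inside the pipeline, not in a separate schema step.
pre = ColumnTransformer([
    ("num", StandardScaler(), [0]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [1]),
])
clf = Pipeline([("pre", pre), ("model", LogisticRegression())])

# One in-memory fit; the same object transforms and predicts later.
clf.fit(X, y)
preds = clf.predict(X)
```

Everything here happens in memory on the driver process, which is exactly the design difference the streaming, schema-driven TFX/TFT approach trades away in exchange for scale and decoupling.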
Hopefully this is enough to get you started; let me know if you need any further information.