Hi,

I want to use `tf.data.Dataset` as the main building block in my data pipeline for training a neural network with TensorFlow on time series, ideally without resorting to custom data loader classes.

**Question:** How do you perform preprocessing that requires more than what TensorFlow can express, i.e. operations that require, for example, NumPy input and therefore cannot be integrated into the TensorFlow graph?

**Example:** Given time series data from different sources, I would like to resample them onto a common grid so that they can be combined into a single training dataset. How can that be achieved?
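For concreteness, here is a sketch of the kind of pipeline I have in mind, wrapping a SciPy-based resampling step with `tf.numpy_function` so it can run inside `Dataset.map`. The function names and the 1-unit target grid are just placeholders, not actual requirements:

```python
import numpy as np
import tensorflow as tf
from scipy.interpolate import interp1d

def resample_np(t, x):
    # t, x arrive as plain NumPy arrays because this runs via tf.numpy_function.
    # Resample onto a regular grid with step 1.0 (placeholder choice).
    t_new = np.arange(t[0], t[-1], 1.0, dtype=np.float64)
    x_new = interp1d(t, x)(t_new)
    return t_new, x_new.astype(np.float64)

def resample_tf(t, x):
    t_new, x_new = tf.numpy_function(
        resample_np, [t, x], [tf.float64, tf.float64])
    # Static shape information is lost inside numpy_function; restore the rank.
    t_new.set_shape([None])
    x_new.set_shape([None])
    return t_new, x_new

# One irregularly sampled toy series.
t = np.array([0.0, 0.7, 1.5, 3.1, 4.0])
x = np.array([1.0, 2.0, 0.5, 3.0, 2.5])
ds = tf.data.Dataset.from_tensors((t, x)).map(resample_tf)
```

This works, but as I understand it the mapped function then runs in the Python interpreter (no graph compilation, limited parallelism), which is part of why I am asking whether this is the recommended workflow.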

The reasoning behind pipeline-integrated transformations is that they only take around 10 minutes on the whole dataset I use. Hence, I am happy to perform them each time before training instead of deriving a dedicated, preprocessed dataset once.

I am aware of similar questions here (like this). I am also aware of TensorFlow Transform and Keras preprocessing layers, but none of those options allow, for example, interpolation. There exists a TF implementation of interpolation, but unfortunately it only works on an equidistant grid. An interesting implementation of interpolation in TensorFlow is this one; however, I would much prefer to use the existing implementations in SciPy or NumPy.

What is your workflow for implementing preprocessing steps that are easy with NumPy and the like, when one-time performance is not crucial? Maybe a custom data loader is in fact easier than relying on `tf.data.Dataset` for those preprocessing steps?
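The closest I have come to a "custom data loader" without leaving `tf.data` is keeping the SciPy work entirely in plain Python and feeding it in via `Dataset.from_generator`. A minimal sketch, where the two toy series and the common target grid are only illustrative:

```python
import numpy as np
import tensorflow as tf
from scipy.interpolate import interp1d

def gen():
    # Hypothetical sources: two series sampled on different time grids.
    sources = [
        (np.array([0.0, 0.5, 2.0]), np.array([1.0, 3.0, 2.0])),
        (np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 4.0])),
    ]
    grid = np.linspace(0.0, 2.0, 5)  # common target grid (placeholder)
    for t, x in sources:
        # Arbitrary NumPy/SciPy code is fine here; it runs in plain Python.
        yield interp1d(t, x)(grid).astype(np.float32)

ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(5,), dtype=tf.float32))
```

Is a generator like this considered acceptable practice, or does it defeat the purpose of `tf.data` (prefetching, parallel maps) badly enough that a standalone preprocessing step is preferable?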

Thanks and best wishes!