I have the following data reading utility:
def read_files(image_path, text_path):
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, IMG_SZ, antialias=True)
    text = tf.io.read_file(text_path)
    text = tf.compat.as_str_any(text)
    return image, text
This is how I am constructing the dataset:
dataset = tf.data.Dataset.zip((image_ds, text_ds))
dataset = dataset.map(read_files, num_parallel_calls=AUTO).cache()
Is there a way to interleave the read_files() function?
Do you mean: tf.data.Dataset.zip((image_ds.map(read_img), text_ds.map(read_text)))?
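For reference, here is a minimal sketch of what that "map before zip" variant could look like. The read_img and read_text helpers are not defined anywhere in this thread, so their bodies are an assumption based on read_files above; the IMG_SZ value and the temporary files are made up purely so the snippet runs on its own.

```python
import os
import tempfile

import tensorflow as tf

IMG_SZ = (32, 32)  # assumed target size; the thread does not give a value


def read_img(image_path):
    # Hypothetical helper: read and decode one JPEG, then resize it.
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    return tf.image.resize(image, IMG_SZ, antialias=True)


def read_text(text_path):
    # Hypothetical helper: tf.io.read_file already returns a scalar
    # tf.string tensor, so no extra conversion is done here.
    return tf.io.read_file(text_path)


# Write two tiny (image, text) pairs to a temp directory for illustration.
tmp_dir = tempfile.mkdtemp()
image_paths, text_paths = [], []
for i in range(2):
    img_path = os.path.join(tmp_dir, f"{i}.jpg")
    txt_path = os.path.join(tmp_dir, f"{i}.txt")
    pixels = tf.zeros([8, 8, 3], dtype=tf.uint8)
    tf.io.write_file(img_path, tf.io.encode_jpeg(pixels))
    tf.io.write_file(txt_path, f"caption {i}")
    image_paths.append(img_path)
    text_paths.append(txt_path)

image_ds = tf.data.Dataset.from_tensor_slices(image_paths)
text_ds = tf.data.Dataset.from_tensor_slices(text_paths)

# Map each branch separately, then zip the results into (image, text) pairs.
dataset = tf.data.Dataset.zip((
    image_ds.map(read_img, num_parallel_calls=tf.data.AUTOTUNE),
    text_ds.map(read_text, num_parallel_calls=tf.data.AUTOTUNE),
))

pairs = [
    (image.shape.as_list(), text.numpy().decode("utf-8"))
    for image, text in dataset
]
```

Each element of the zipped dataset is still an (image, text) tuple, so downstream code that consumed the output of read_files should not need to change.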
No, I meant using the actual interleave() method. I guess it’s probably not doable in this case, since the map_func passed to interleave() is supposed to return a Dataset?
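To illustrate that constraint, here is a small sketch of interleave() on toy data (the numbers and the lambda are invented for the example, not taken from the pipeline above). The map_func turns every input element into its own Dataset, which is why a function like read_files, which returns plain tensors, cannot be passed in directly.

```python
import tensorflow as tf

# Each source element x becomes a small dataset [x, x]. interleave() then
# cycles over cycle_length of those datasets, taking block_length items
# from each one in turn.
source = tf.data.Dataset.range(3)
interleaved = source.interleave(
    lambda x: tf.data.Dataset.from_tensors(x).repeat(2),
    cycle_length=2,
    block_length=1,
)

values = [int(v.numpy()) for v in interleaved]
print(values)
```

With the default deterministic ordering, elements from the first two inner datasets alternate before the third one is consumed.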
Why? What problem are you trying to solve?
Did you know that you can pass datasets through datasets?
datasets = [a, b, c,]
meta = tf.data.Dataset.from_tensor_slices(datasets)
merged = meta.interleave(lambda x:x)
If you did that with your image_ds and text_ds, then you’d have a dataset that alternates between image and text paths… but why? A dataset needs to have a single spec, so you can’t load the files in each dataset and then interleave those.
I can think of a few solutions; they just seem worse than the map before zip above:
- .batch(2).map(read_files)
- have both read_img and read_text return (image, text) pairs, where the first axis of text has length 0 for read_img and the first axis of image has length 0 for read_text.
I wanted to know if it’s possible to interleave the reading of files. I am probably conceptually mistaken. I hope that’s not blasphemy.
So in your opinion, tf.data.Dataset.zip((image_ds.map(read_img), text_ds.map(read_text))) would be more efficient than what I am already doing?
"interleave the reading of files"
You mean run the two reads in parallel?
I think even in your initial implementation TensorFlow is supposed to notice that the image and text branches are independent, and execute them in parallel.
I see. If that’s the case, then all’s well I guess.