Define output signature for TFRecordDataset without using Dataset.from_generator

Hi,
Is it possible to define an output signature for a TFRecordDataset? Currently, I’m using the snippet below:

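```python
import tensorflow as tf

# tf_files, read_tfrecord and sign are defined elsewhere in my code.
raw_dataset = tf.data.TFRecordDataset(
    tf_files, compression_type="GZIP", num_parallel_reads=tf.data.AUTOTUNE
)

def get_data():
    # Parse each record with read_tfrecord and yield it back through Python.
    for element in raw_dataset.map(read_tfrecord):
        yield element

dataset = tf.data.Dataset.from_generator(get_data, output_signature=sign)
```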

I would like to know if there’s a better way that leads to better performance.

Hi @Rick_Bruins,

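The approach you are currently using, defining the output signature through a generator, is valid, especially when the records need complex processing. However, there are alternative methods. `Dataset.from_generator` runs your generator in the Python interpreter, so it is subject to the Python GIL, which limits throughput. Mapping your parsing function directly over the `TFRecordDataset` keeps the whole pipeline inside the tf.data runtime, and the resulting dataset's `element_spec` (its output signature) is inferred from what the parsing function returns, so no explicit `output_signature` is needed.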
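A minimal sketch of that alternative is below. It assumes a hypothetical record layout (a float32 vector `x` of length 4 and an int64 label `y`) and a hypothetical `SIGNATURE` standing in for your `sign`; the `read_tfrecord` shown is an illustrative parser, not your actual function. It writes a tiny GZIP file so it runs on its own, then uses `tf.ensure_shape` and `tf.cast` inside `map` so the inferred `element_spec` matches the signature you would otherwise pass to `from_generator`.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical record layout, for illustration only.
FEATURE_DESCRIPTION = {
    "x": tf.io.FixedLenFeature([4], tf.float32),
    "y": tf.io.FixedLenFeature([], tf.int64),
}

# Hypothetical stand-in for the signature you pass to from_generator.
SIGNATURE = (
    tf.TensorSpec(shape=(4,), dtype=tf.float32),
    tf.TensorSpec(shape=(), dtype=tf.int32),
)


def write_example_file(path, num_records=8):
    """Write a small GZIP-compressed TFRecord file so the sketch runs on its own."""
    options = tf.io.TFRecordOptions(compression_type="GZIP")
    with tf.io.TFRecordWriter(path, options=options) as writer:
        for i in range(num_records):
            features = {
                "x": tf.train.Feature(float_list=tf.train.FloatList(value=[float(i)] * 4)),
                "y": tf.train.Feature(int64_list=tf.train.Int64List(value=[i % 2])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=features))
            writer.write(example.SerializeToString())


def read_tfrecord(serialized):
    """Parse one serialized Example and coerce it to match SIGNATURE."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_DESCRIPTION)
    x = tf.ensure_shape(parsed["x"], SIGNATURE[0].shape)  # fails fast on a shape mismatch
    y = tf.cast(parsed["y"], SIGNATURE[1].dtype)          # int64 -> int32
    return x, y


def make_dataset(tf_files):
    raw_dataset = tf.data.TFRecordDataset(
        tf_files, compression_type="GZIP", num_parallel_reads=tf.data.AUTOTUNE
    )
    # Parsing runs inside tf.data (no Python generator), so it can run in parallel.
    return raw_dataset.map(read_tfrecord, num_parallel_calls=tf.data.AUTOTUNE).prefetch(
        tf.data.AUTOTUNE
    )


path = os.path.join(tempfile.mkdtemp(), "data.tfrecord.gz")
write_example_file(path)
dataset = make_dataset([path])
print(dataset.element_spec)
```

Comparing `dataset.element_spec` against your existing signature is a quick way to confirm that this pipeline produces the same structure as the generator-based one.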

You can refer to the following tutorials in the TensorFlow documentation.
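One performance technique from the tf.data guidance that fits this case is vectorized parsing: batch the serialized records first, then parse each batch with `tf.io.parse_example` instead of parsing records one at a time. The sketch below assumes a hypothetical record layout (a float vector `x` and an integer label `y`), and the batch size of 4 is arbitrary.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical record layout, for illustration only.
FEATURE_DESCRIPTION = {
    "x": tf.io.FixedLenFeature([4], tf.float32),
    "y": tf.io.FixedLenFeature([], tf.int64),
}
BATCH_SIZE = 4  # arbitrary; tune for your workload


def write_example_file(path, num_records=10):
    """Write a small GZIP-compressed TFRecord file so the sketch runs on its own."""
    options = tf.io.TFRecordOptions(compression_type="GZIP")
    with tf.io.TFRecordWriter(path, options=options) as writer:
        for i in range(num_records):
            features = {
                "x": tf.train.Feature(float_list=tf.train.FloatList(value=[float(i)] * 4)),
                "y": tf.train.Feature(int64_list=tf.train.Int64List(value=[i % 2])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=features))
            writer.write(example.SerializeToString())


def parse_batch(serialized_batch):
    """Parse a whole batch of serialized Examples with a single op."""
    parsed = tf.io.parse_example(serialized_batch, FEATURE_DESCRIPTION)
    return parsed["x"], tf.cast(parsed["y"], tf.int32)


path = os.path.join(tempfile.mkdtemp(), "data.tfrecord.gz")
write_example_file(path)

dataset = (
    tf.data.TFRecordDataset([path], compression_type="GZIP")
    # Batch the raw strings first so parsing is vectorized across the batch.
    .batch(BATCH_SIZE, drop_remainder=True)
    .map(parse_batch, num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)
```

Note that `drop_remainder=True` discards the final partial batch (two records here); it keeps the batch dimension static in `element_spec`, which matters if your signature fixes the batch size.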

I hope it helps!

Thanks.