Unable to read TFRecord using tf.data.TFRecordDataset

I am trying to read a TFRecord file like this:

dataset = tf.data.TFRecordDataset("./tfrecords/train.record").map(_extract_fn).batch(3)

However, when I run

features, labels = iter(dataset).next()

I get this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [4], [batch]: [2] [Op:IteratorGetNext]

This is the function that parses the TFRecord file:

    def _extract_fn(tfrecord):
        # Schema of one serialized tf.train.Example in the record file.
        features = {
            'image/height': tf.io.FixedLenFeature([], tf.int64),
            'image/width': tf.io.FixedLenFeature([], tf.int64),
            'image/filename': tf.io.VarLenFeature(tf.string),
            'image/id': tf.io.FixedLenFeature([], tf.string),
            'image/encoded': tf.io.FixedLenFeature([], tf.string),
            'image/format': tf.io.FixedLenFeature([], tf.string),
            'image/object/class/text': tf.io.VarLenFeature(tf.string),
            'image/object/class/label': tf.io.VarLenFeature(tf.int64),
            'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
            'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
            'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
            'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
        }
        sample = tf.io.parse_single_example(tfrecord, features)
        data = {}

        data["image/encoded"] = tf.image.decode_jpeg(sample["image/encoded"], channels=3)
        # VarLenFeature yields a SparseTensor; .values is a 1-D tensor with one entry per object.
        label = sample['image/object/class/label'].values

        return data, label

If I return only data from the function and call features = iter(dataset).next() instead, it works fine.
What is the issue here?

Thanks for any help!

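The label has a variable length. It is parsed with tf.io.VarLenFeature, so each example can hold a different number of values.

So when you try to .batch the dataset, it can't stack the differently sized tensors together. That is what the shapes [4] and [2] in the error message refer to.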

You want to use either .padded_batch or .apply(tf.data.experimental.dense_to_ragged_batch(...))
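
For illustration, here is a minimal sketch of the difference, using a toy dataset of variable-length labels in place of the real TFRecord file. The label values and the names toy_labels and toy_dataset are made up for this example; only batch and padded_batch are the actual tf.data methods. It assumes TF 2.x with eager execution, where padded_batch can infer the padded shapes on its own.

```python
import tensorflow as tf

# Hypothetical stand-in for the parsed labels: one 1-D tensor per example,
# each with a different number of entries (like `.values` of a VarLenFeature).
toy_labels = [[1, 2, 3, 4], [5, 6], [7]]
toy_dataset = tf.data.Dataset.from_generator(
    lambda: iter(toy_labels),
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.int64),
)

# Plain .batch() has to stack equally shaped tensors, so it fails here.
batch_failed = False
try:
    next(iter(toy_dataset.batch(3)))
except tf.errors.InvalidArgumentError:
    batch_failed = True

# .padded_batch() pads each element to the longest one in the batch
# (with zeros by default) before stacking.
batch = next(iter(toy_dataset.padded_batch(3)))
print(batch.numpy())
```

If zero is a valid class id in your data, you may prefer to pass a different padding_values argument to padded_batch so that padding can be told apart from real labels.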

Thank you! Using .padded_batch fixed the issue!