# Dataset map function returns wrong tensor shape

I have defined a map function that unpacks 6 32-bit integers into 192 (1/0) integers:

```python
def unpack(x, y):
    unpacked_data = tf.TensorArray(tf.uint32, size=0, dynamic_size=True)
    for b in x:
        for _ in range(32):
            unpacked_data = unpacked_data.write(unpacked_data.size(), b & 1)
            b = tf.bitwise.right_shift(b, 1)
    return x, unpacked_data.stack(), y
    # for training I would return unpacked_data.stack(), y
```

The map function works:

```python
a = np.array([0, 0, 0, 16, 45, 57], dtype=np.uint32)
b = unpack(a, a)
print(b)
```

returns:

```
(array([ 0,  0,  0, 16, 45, 57], dtype=uint32), <tf.Tensor: shape=(192,), dtype=uint32, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)>, array([ 0,  0,  0, 16, 45, 57], dtype=uint32))
```
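As a sanity check, the same LSB-first unpacking can be reproduced with `np.unpackbits`. This is my own cross-check, not part of the original pipeline, and it assumes a little-endian machine (where `a.view(np.uint8)` yields each `uint32`'s bytes least-significant first):

```python
import numpy as np

a = np.array([0, 0, 0, 16, 45, 57], dtype=np.uint32)

# View the six uint32 values as 24 bytes, then unpack each byte LSB-first.
# On a little-endian machine this matches the TensorArray loop exactly:
# bit k of integer i lands at index i * 32 + k.
ref = np.unpackbits(a.view(np.uint8), bitorder="little")

print(ref.shape)  # (192,)
print(ref[100])   # 1, since 16 == 2**4 sits at integer index 3 → 3*32 + 4 = 100
```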

Now I want to apply this map function to the features of a batched dataset:

```python
with tf.device("CPU"):
    train = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(BATCH_SIZE)
    # for training I would use 4 * BATCH_SIZE
    validate = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BATCH_SIZE)
```

x_train and y_train are NumPy arrays. But when I apply the map function to the `train` dataset:

```python
train = train.map(unpack)
list(train.as_numpy_iterator())
```

The shape of the bit-array is wrong: it is transposed, so not 192x1, but 32x6:

```
[(array([[ 0,  0,  0, 16, 45, 57]], dtype=uint32),
  array([[0, 0, 0, 0, 1, 1],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 1, 0],
         [0, 0, 0, 0, 1, 1],
         [0, 0, 0, 1, 0, 1],
         [0, 0, 0, 0, 1, 1],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0]], dtype=uint32),
  array([1.])),
```

Why?

Regards,
GW

Below is my understanding:

The shape of the bit-array is wrong because the `TensorArray` you are using to store it is dynamic-sized: its size is not known in advance, and it grows as needed to accommodate the data. When you apply the `map` function to the `train` dataset, a `TensorArray` is created for each element of the dataset, but its final size depends on the input values. That is why the bit-array comes out transposed when you iterate over the dataset.

To fix this, you can use a fixed-size `TensorArray`. That way the size is the same for every element of the dataset, and the shape of the bit-array should be correct.

I hope this helps!

Thanks.

Hi,

```python
def unpack(x, y):
    # unpacked_data = []
    unpacked_data = tf.TensorArray(tf.uint32, size=192)
    i = 0
    for b in x:
        for _ in range(32):
            unpacked_data = unpacked_data.write(i, b & 1)
            b = tf.bitwise.right_shift(b, 1)
            i += 1
    return x, unpacked_data.stack(), y
```

but this still returns the wrong shape. Perhaps I have to specify the `element_shape` as well as the size, but if so, how? `element_shape` values like `[192]`, `[1]` etc. raise the error:

```
Incompatible shape for value (()), expected ...
```

Thanks,
GW
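Regarding `element_shape`: it describes the shape of one written element, not of the stacked result. In your 1-D test each `write` stores a scalar (`b & 1` on a scalar has shape `()`), which is why shapes like `[192]` or `[1]` are rejected. A minimal sketch of a correctly specified fixed-size array:

```python
import tensorflow as tf

# element_shape describes one written element, not the stacked result.
# Each write below stores a scalar, so the element shape is ().
ta = tf.TensorArray(tf.uint32, size=4, element_shape=tf.TensorShape([]))
for i in range(4):
    ta = ta.write(i, tf.constant(i, dtype=tf.uint32))

print(ta.stack())  # → [0 1 2 3]
```

Inside `Dataset.map`, however, each `b` taken from a batched `x` is a whole row, so every write stores a shape-`(6,)` value; fixing `element_shape` alone will not change the transposed result.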

You can see what happens by applying the original map function to a 2d array:

```python
a = np.array([[0, 0, 0, 16, 45, 57], [0, 0, 0, 16, 45, 57]], dtype=np.uint32)
b = unpack(a, a)
print(b)
```

The output is:

```
(array([[ 0,  0,  0, 16, 45, 57],
        [ 0,  0,  0, 16, 45, 57]], dtype=uint32), <tf.Tensor: shape=(64, 6), dtype=uint32, numpy=
array([[0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]], dtype=uint32)>, array([[ 0,  0,  0, 16, 45, 57],
       [ 0,  0,  0, 16, 45, 57]], dtype=uint32))
```

So the mask and shift get applied elementwise to the whole 6-element row: `b` iterates over the two rows of `a`, each `b & 1` writes a 6-element vector, and that is repeated 32 times per row (64 writes in total), which (although unintended) produces the result transposed.
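The elementwise behaviour is easy to see in isolation (my own minimal illustration):

```python
import tensorflow as tf

v = tf.constant([0, 0, 0, 16, 45, 57], dtype=tf.uint32)

# Both ops apply across the whole vector, not to a single integer:
print(tf.bitwise.bitwise_and(v, 1).numpy())   # [0 0 0 0 1 1]
print(tf.bitwise.right_shift(v, 1).numpy())   # [ 0  0  0  8 22 28]
```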

So how can I change the map function so that each row of 6 integers gets unpacked into 192 integers?

Regards,
GW