Dataset from generator shuffling

Hi,

I have made a dataset from a generator like this:

ds_series = tf.data.Dataset.from_generator(
    trim_size,
    args=[data_input_tot_EqLen, trimmed_lbl, seq_len, max_len_per],
    output_types=(tf.float32, tf.int32),
    output_shapes=((5511, 101, 3), (1,)))
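For readers reproducing this: from_generator calls the generator with the values in args and expects every yielded element to be a (features, label) pair matching output_types/output_shapes, i.e. features of shape (5511, 101, 3) and a label of shape (1,). (Newer TensorFlow 2.x versions prefer output_signature with tf.TensorSpec over output_types/output_shapes.) The sketch below is a hypothetical, shrunken stand-in for trim_size in plain Python; toy_trim_size and shape_of are illustrative helpers, not part of the original code or of TensorFlow.

```python
import random

def toy_trim_size(n_samples, rows=4, cols=3, channels=3, n_classes=2, seed=0):
    """Hypothetical stand-in for trim_size: yields (features, label) pairs.

    features is a nested list of shape (rows, cols, channels); label is a
    one-element list, matching an output shape of (1,)."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        features = [[[rng.random() for _ in range(channels)]
                     for _ in range(cols)]
                    for _ in range(rows)]
        label = [rng.randrange(n_classes)]  # one-element label -> shape (1,)
        yield features, label

def shape_of(nested):
    """Return the shape of a regular nested list, e.g. (4, 3, 3)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)

for features, label in toy_trim_size(3):
    print(shape_of(features), shape_of(label))  # (4, 3, 3) (1,)
```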

Then I shuffle the dataset and split it into training and testing sets:

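# reshuffle_each_iteration=False keeps one fixed order, so take() and skip() split the same sequence
ds_series = ds_series.shuffle(buffer_size=16, reshuffle_each_iteration=False)
ds_train = ds_series.take(train_smpls)
ds_valid = ds_series.skip(train_smpls)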
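A caveat about splitting a shuffled dataset with take and skip: tf.data's shuffle defaults to reshuffle_each_iteration=True, and ds_train and ds_valid each iterate the shuffled dataset separately, so the two splits can see two different orders and the same sample can end up in both. The plain-Python sketch below is a simplified model of that effect, assuming a full random permutation in place of the 16-element buffer and using seeds to stand for the order chosen in each pass; take_skip_split is a hypothetical helper, not a TensorFlow API.

```python
import random

def take_skip_split(n, n_train, seed_take, seed_skip):
    """Simplified model of shuffle().take() / shuffle().skip().

    Each split iterates the shuffled data in its own pass; a full random
    permutation stands in for the buffered shuffle, and the seed models
    the order that pass happens to get."""
    take_pass = list(range(n))
    random.Random(seed_take).shuffle(take_pass)
    skip_pass = list(range(n))
    random.Random(seed_skip).shuffle(skip_pass)
    return set(take_pass[:n_train]), set(skip_pass[n_train:])

# Same order in both passes (no reshuffling): a clean partition.
train, valid = take_skip_split(100, 80, seed_take=1, seed_skip=1)
print(len(train & valid), len(train | valid))  # 0 100

# Different order per pass (reshuffling): samples can land in both splits.
train, valid = take_skip_split(100, 80, seed_take=1, seed_skip=2)
print(len(train & valid))
```

Passing reshuffle_each_iteration=False to shuffle keeps the order fixed across passes, so take and skip partition the same sequence, assuming the generator itself yields the same elements in the same order every time.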

I’d like to count the number of samples in each class, so I want to see which labels end up in the training and testing datasets.
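Counting samples per class only needs the labels. A minimal sketch, assuming each label is a one-element sequence as implied by the output shape (1,); count_labels is a hypothetical helper built on collections.Counter, and with TensorFlow it could be fed something like ds_train.as_numpy_iterator(). Iterating still runs the generator (and fills the shuffle buffer) for every sample, so it is not cheaper than a normal pass over the data.

```python
from collections import Counter

def count_labels(pairs):
    """Count samples per class in an iterable of (features, label) pairs.

    Assumes each label is a one-element sequence (shape (1,)), e.g. [2]
    or a NumPy array such as array([2], dtype=int32)."""
    counts = Counter()
    for _, label in pairs:
        counts[int(label[0])] += 1
    return counts

# Toy stand-in data; features are irrelevant for counting, so None is used here.
toy_pairs = [(None, [0]), (None, [1]), (None, [1]), (None, [2])]
print(sorted(count_labels(toy_pairs).items()))  # [(0, 1), (1, 2), (2, 1)]
```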

I run the following command:

lbl_train = [lbl.numpy()[0] for _, lbl in ds_train]  # iterate ds_train, keep only the labels

This takes a lot of time (which I understand, since the trim_size generator defined above is pretty heavy), but my question is about the messages it prints:

I tensorflow/core/kernels/data/shuffle_dataset_op.cc:175] Filling up shuffle buffer (this may take a while): 1 of 16

So it counts up while filling the buffer, from 1 to 16. However, this does not seem to fit with what the documentation says about the shuffle buffer size:

https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle

As I understand it, shuffle is supposed to take random samples from a 16-sample buffer, which means the randomization process is not limited to just 16 samples.

Am I wrong here?
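For reference, the buffer-based shuffle described on the linked documentation page (fill a buffer with buffer_size elements, then repeatedly pick a random element from it and replace it with the next input) can be modelled in plain Python. This is a hypothetical sketch, not TensorFlow's implementation: buffered_shuffle is an illustrative name, and it yields each element together with how many inputs had been read when it was emitted. It shows that nothing comes out until 16 elements have been read, that every input element is still shuffled into the output, and that each output can only come from a limited window of inputs.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Pure-Python model of a buffer-based shuffle (not TF's actual code).

    Yields (element, n_read), where n_read is how many inputs had been
    read when the element was emitted."""
    rng = random.Random(seed)
    buffer, n_read = [], 0
    for x in items:
        buffer.append(x)
        n_read += 1
        if len(buffer) < buffer_size:
            continue  # still filling the buffer; nothing is emitted yet
        idx = rng.randrange(len(buffer))
        yield buffer[idx], n_read
        # Remove the emitted slot; the next input refills the buffer.
        buffer[idx] = buffer[-1]
        buffer.pop()
    # Input exhausted: drain what is left in the buffer in random order.
    rng.shuffle(buffer)
    for x in buffer:
        yield x, n_read

out = list(buffered_shuffle(range(100), buffer_size=16, seed=0))
elements = [e for e, _ in out]
print(out[0][1])                          # 16: first output only after 16 reads
print(sorted(elements) == list(range(100)))  # True: every element appears once
# The k-th output can only be one of the first k + 16 inputs.
print(all(e <= k + 15 for k, e in enumerate(elements)))  # True
```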