I am working on a project that uses Hugging Face. I created a Hugging Face dataset and converted it to a TensorFlow dataset. Instead of from_tensor_slices(), the method shown in their documentation, I used from_generator(). I found this a lot faster, but when training with TFTrainer() I get this error:
ValueError: The training dataset must have an asserted cardinality
I checked and found that the cause was from_generator(). To verify this, I created a very basic dataset with from_generator() and checked its cardinality:
dumm_ds = tf.data.Dataset.from_generator(
    lambda: [tf.constant(1)] * 1000,
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int64),  # each element is a scalar
)
tf.data.experimental.cardinality(dumm_ds)
<tf.Tensor: shape=(), dtype=int64, numpy=-2>
where -2 means UNKNOWN_CARDINALITY.
I would like to know whether this is a bug. If it is not, how can I change the cardinality?
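For reference, here is a minimal sketch of one way the cardinality might be set, assuming the number of elements is known in advance. It uses tf.data.experimental.assert_cardinality; the names NUM_EXAMPLES and gen are hypothetical and only stand in for the real dataset size and generator. I have not confirmed that this is what TFTrainer() expects.

```python
import tensorflow as tf

NUM_EXAMPLES = 1000  # hypothetical: assumed to be known before training


def gen():
    # Hypothetical generator that yields one scalar per element.
    for _ in range(NUM_EXAMPLES):
        yield 1


ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int64),
)

# from_generator() cannot know how many elements the generator yields,
# so the cardinality is reported as unknown.
unknown = tf.data.experimental.cardinality(ds).numpy()

# Attach the known size. The assertion is checked while iterating, so a
# wrong value raises an error at that point rather than here.
ds = ds.apply(tf.data.experimental.assert_cardinality(NUM_EXAMPLES))
asserted = tf.data.experimental.cardinality(ds).numpy()

print(unknown, asserted)
```

Since a generator can yield any number of elements, an unknown cardinality looks like expected behaviour rather than a bug, and the size has to be supplied explicitly.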