Dataset from a pandas DataFrame that has a list column

Spencer_Uresk · August 15, 2022, 1:46pm

I am having a hard time wrapping my head around the tf.data Dataset functionality. I have a CSV file that looks like this:

gender,products
M,[1,2]
M,[1,7,5,8]

I normally would create a dataset by doing something like:

tf.data.Dataset.from_tensor_slices(dict(df))

But TF complains about not being able to figure out the type spec from a list or numpy array. I tried to just leave it as a string and convert it in a map function with something like this:

@tf.function
def eval_products(x):
    y = {}
    y.update(x)
    if tf.strings.length(x['products']) > 2:
        arr = tf.strings.split(tf.strings.substr(x['products'], 1, tf.strings.length(x['products'])-2), ',')
        y['products'] = tf.strings.to_number(tf.strings.strip(arr))
    else:
        y['products'] = tf.ragged.constant([-1.])
    return y

ds = ds.map(eval_products)

This works, but feels really messy, and I can’t get it to make products a RaggedTensor instead of a plain Tensor. It seems like I’m missing something really basic, but I am not sure what it is and I can’t find any examples in the docs or anywhere else, so any ideas are appreciated.
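One possible way to sidestep the string parsing inside map is to turn the column into real Python lists first and hand TensorFlow a RaggedTensor directly. The following is only a minimal sketch: the in-memory DataFrame is a hypothetical stand-in for the CSV above, and it assumes the products column arrives as strings such as "[1,2]".

```python
import ast

import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for the CSV above; 'products' holds list literals as strings.
df = pd.DataFrame({"gender": ["M", "M"], "products": ["[1,2]", "[1,7,5,8]"]})

# Parse each string into a real Python list before TensorFlow sees it.
product_lists = [ast.literal_eval(s) for s in df["products"]]

features = {
    "gender": tf.constant(df["gender"].tolist()),
    # tf.ragged.constant accepts rows of different lengths.
    "products": tf.ragged.constant(product_lists, dtype=tf.int64),
}
ds = tf.data.Dataset.from_tensor_slices(features)

# Batching stacks the variable-length rows back into one RaggedTensor.
batch = next(iter(ds.batch(2)))
print(batch["products"])
```

Parsing in pandas keeps the tf.data pipeline free of string handling, at the cost of holding the parsed lists in memory up front.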

Kiran_Sai_Ramineni · November 29, 2023, 12:48pm

Hi @Spencer_Uresk, I tried converting the products column to ragged tensors with similar code:

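def convert_to_ragged(gender, products):
    # Assumes `products` is a scalar string such as "[1,7,5,8]" (one CSV row per call).
    # Strip the brackets and any quotes; the raw string avoids invalid escape warnings.
    products = tf.strings.regex_replace(products, r"[\[\]']", "")
    # Splitting a scalar string yields a 1-D tensor of string tokens.
    split_products = tf.strings.split(products, ',')
    # Wrap the tokens as a single ragged row: row_lengths is [number of tokens],
    # not the character length of each token.
    ragged_tensor = tf.RaggedTensor.from_row_lengths(split_products, row_lengths=tf.shape(split_products, out_type=tf.int64))
    return gender, ragged_tensor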

Please refer to this gist for the complete code, and let us know if it helps. Thank you.
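If the column has to stay a string inside the dataset, another option is to batch first and parse afterwards, since tf.strings.split returns a RaggedTensor when it is given a 1-D batch of strings. The sketch below is an illustration under assumptions, not the code from the gist: the `raw` dict and the `parse_products` helper are hypothetical names, and empty lists such as "[]" are not covered.

```python
import tensorflow as tf

# Hypothetical raw rows: the list column is still a string, as in the posts above.
raw = {"gender": ["M", "M"], "products": ["[1,2]", "[1,7,5,8]"]}
ds = tf.data.Dataset.from_tensor_slices(raw)


def parse_products(batch):
    """Parse a batch of bracketed strings into one integer RaggedTensor."""
    out = dict(batch)
    # Remove brackets and whitespace so only comma-separated digits remain.
    cleaned = tf.strings.regex_replace(batch["products"], r"[\[\]\s]", "")
    # On a 1-D batch of strings, split returns a RaggedTensor (one row per string).
    tokens = tf.strings.split(cleaned, ",")
    # Convert only the flat string values; the row structure is preserved.
    out["products"] = tf.ragged.map_flat_values(
        tf.strings.to_number, tokens, out_type=tf.int32
    )
    return out


# Batch before mapping so the split sees a vector of strings, not a scalar.
parsed = ds.batch(2).map(parse_products)
first = next(iter(parsed))
print(first["products"])
```

Mapping after batching is what makes the result ragged here; applied to one scalar string at a time, the same split would produce a plain 1-D tensor instead.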