Converting Words into ids using tf.keras.layers.StringLookup

Deependra_Parichha · May 17, 2022, 12:12pm

How can I generate ids from words using tf.keras.layers.StringLookup?

rcauvin · May 17, 2022, 1:14pm

What is your input? Individual words or text that contains multiple words?

For example, you could have a dataset in which each row represents a product review. The text of the review could be one of the columns and would contain multiple words. In that case, you might use tf.keras.layers.TextVectorization. More info here.

Your dataset might also contain a column for the category of the product, which could be a single word or short phrase that you wish to treat as a single value. In that case, you would likely use tf.keras.layers.StringLookup. Some examples are here.

Deependra_Parichha · May 18, 2022, 5:00am

Hi @rcauvin, thank you for your time. My input is text that contains multiple words. The main problem I'm facing is with the vocabulary parameter: even when I pass it as a tensor, it raises an error.

Deependra_Parichha · May 18, 2022, 5:43am

For example, my input is:
s = ['My name is Noah', 'I am a ML enthusiast']
How do I set the vocabulary parameter with the words given in the list?

rcauvin · May 18, 2022, 2:06pm

You can do something like:

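import tensorflow as tf

max_tokens = 20            # upper bound on vocabulary size (includes padding and [UNK])
embedding_dimension = 32   # size of each word embedding vector

unique_descriptions = ["My name is Noah", "I am a ML enthusiast"]

# Splits text into words and maps each word to an integer id
description_vectorizer = tf.keras.layers.TextVectorization(max_tokens = max_tokens)
description_embeddings = tf.keras.Sequential([
  description_vectorizer,
  tf.keras.layers.Embedding(max_tokens, embedding_dimension, mask_zero = True),
  tf.keras.layers.GlobalAveragePooling1D()
], name = "description")

# Build the vocabulary from the example texts
description_vectorizer.adapt(unique_descriptions)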

You can confirm it created the vocabulary of unique words:

description_vectorizer.get_vocabulary()

['', '[UNK]', 'noah', 'name', 'my', 'ml', 'is', 'i', 'enthusiast', 'am', 'a']
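
To see the ids themselves, you can call the adapted layer on a sentence and map each id back through the vocabulary. This is a minimal sketch assuming TensorFlow 2.x. The names vectorizer, vocab, ids, and tokens are only for illustration. The exact id values depend on how the layer orders the vocabulary, so the sketch prints them rather than assuming them:

```python
import tensorflow as tf

# Same setup as above, repeated so the snippet runs on its own
vectorizer = tf.keras.layers.TextVectorization(max_tokens=20)
vectorizer.adapt(["My name is Noah", "I am a ML enthusiast"])

vocab = vectorizer.get_vocabulary()

# Calling the layer on a batch of strings returns one row of ids per string
ids = vectorizer(tf.constant(["My name is Noah"])).numpy()[0]
print(ids)

# Each id is an index into the vocabulary list
tokens = [vocab[i] for i in ids]
print(tokens)  # ['my', 'name', 'is', 'noah']
```

The tokens come back lowercased because the layer's default standardization lowercases the text and strips punctuation before splitting on whitespace.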

And you can use the description_embeddings when training or generating predictions using the model with input data that includes descriptions.
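
If you specifically want StringLookup, as in the original question, keep in mind that it maps whole strings to ids and does not split text itself, so the sentences need to be split into words first. Here is a minimal sketch, assuming whitespace splitting is good enough for your data and passing the vocabulary as a plain Python list of unique strings. The variable names are illustrative:

```python
import tensorflow as tf

sentences = ["My name is Noah", "I am a ML enthusiast"]

# Split on whitespace and keep each word once, in order of first appearance.
# StringLookup expects the vocabulary entries to be unique.
vocabulary = list(dict.fromkeys(word for s in sentences for word in s.split()))

# With the default settings, index 0 is reserved for out-of-vocabulary words,
# so the vocabulary entries get ids starting at 1.
lookup = tf.keras.layers.StringLookup(vocabulary=vocabulary)

ids = lookup(tf.constant(["My", "name", "is", "Noah"]))
print(ids.numpy())  # [1 2 3 4]

# A word that is not in the vocabulary maps to the OOV index
unknown = lookup(tf.constant(["Python"]))
print(unknown.numpy())  # [0]
```

Unlike TextVectorization, StringLookup does no lowercasing or punctuation stripping, so "My" and "my" would be treated as different words unless you normalize the text yourself.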

Deependra_Parichha · May 18, 2022, 2:45pm

Thank you, @rcauvin, it worked.