Converting Words into IDs using tf.keras.layers.StringLookup

How can I generate IDs from words using tf.keras.layers.StringLookup?

What is your input? Individual words or text that contains multiple words?

For example, you could have a dataset in which each row represents a product review. The text of the review could be one of the columns and would contain multiple words. In that case, you might use tf.keras.layers.TextVectorization. More info here.

Your dataset might also contain a column for the category of the product, which could be a single word or short phrase that you wish to treat as a single value. In that case, you would likely use tf.keras.layers.StringLookup. Some examples are here.
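As a rough illustration of the category case, here is a minimal sketch. The category values are made up for the example, and the indices rely on the layer's default settings (one out-of-vocabulary index, no mask token).

```python
import tensorflow as tf

# Hypothetical category values, used only for illustration.
categories = ["electronics", "books", "toys"]

# With the defaults (num_oov_indices=1, mask_token=None), index 0 is reserved
# for out-of-vocabulary values and the vocabulary starts at index 1.
category_lookup = tf.keras.layers.StringLookup(vocabulary=categories)

# "garden" is not in the vocabulary, so it maps to the OOV index 0.
category_ids = category_lookup(tf.constant(["books", "toys", "garden"]))
print(category_ids.numpy())
```

Each whole string is treated as one value here, so a phrase such as "home office" would get a single ID rather than one ID per word.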

Hi @rcauvin, thank you for your time. My input is text that contains multiple words. The main problem I’m facing is with the vocabulary parameter: even when I pass it as a tensor, it raises an error.

For example, my input is:
s = ['My name is Noah', 'I am a ML enthusiast']
How do I set the vocabulary parameter using the words in this list?

You can do something like:

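import tensorflow as tf

max_tokens = 20  # maximum vocabulary size, including the padding and [UNK] tokens
embedding_dimension = 32

unique_descriptions = ["My name is Noah", "I am a ML enthusiast"]

# By default, TextVectorization lowercases, strips punctuation, and splits on whitespace.
description_vectorizer = tf.keras.layers.TextVectorization(max_tokens=max_tokens)
description_embeddings = tf.keras.Sequential([
    description_vectorizer,
    tf.keras.layers.Embedding(max_tokens, embedding_dimension, mask_zero=True),
    tf.keras.layers.GlobalAveragePooling1D(),
], name="description")

# Learn the vocabulary from the example texts.
description_vectorizer.adapt(unique_descriptions)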

You can confirm it created the vocabulary of unique words:

description_vectorizer.get_vocabulary()

['', '[UNK]', 'noah', 'name', 'my', 'ml', 'is', 'i', 'enthusiast', 'am', 'a']
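To see the IDs themselves, the adapted layer can be called directly on text. The following is a minimal sketch that rebuilds the vectorizer so it runs on its own. The names `vectorizer`, `ids`, and `words` are illustrative, and because the exact ID values depend on the vocabulary order, the sketch maps them back to words rather than assuming particular numbers.

```python
import tensorflow as tf

# Rebuild and adapt the vectorizer so this snippet is self-contained.
vectorizer = tf.keras.layers.TextVectorization(max_tokens=20)
vectorizer.adapt(["My name is Noah", "I am a ML enthusiast"])

vocabulary = vectorizer.get_vocabulary()

# Calling the layer on a batch of strings returns one row of integer IDs per string.
ids = vectorizer(tf.constant(["My name is Noah"])).numpy()[0]

# Map each ID back to its word to check the round trip.
words = [vocabulary[i] for i in ids]
print(ids, words)
```

Index 0 is the padding token and index 1 is [UNK], so words seen during adapt() get IDs of 2 or higher.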

You can then use description_embeddings when training the model or generating predictions on input data that includes descriptions.
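If you specifically want word-level IDs from tf.keras.layers.StringLookup, as in the original question, one option is to pass the vocabulary as a plain Python list of individual words and split the text before the lookup. This is a minimal sketch, assuming lowercase whitespace tokenization is acceptable; `words`, `word_lookup`, and `tokens` are illustrative names.

```python
import tensorflow as tf

sentences = ["My name is Noah", "I am a ML enthusiast"]

# Build the vocabulary as a sorted list of unique lowercase words.
words = sorted({word for sentence in sentences for word in sentence.lower().split()})

# The vocabulary entries must be individual words, not whole sentences.
word_lookup = tf.keras.layers.StringLookup(vocabulary=words)

# Splitting a scalar string yields a 1-D tensor of word tokens.
tokens = tf.strings.split(tf.strings.lower("My name is Noah"))
word_ids = word_lookup(tokens)
print(word_ids.numpy())
```

With the default settings, any word missing from the vocabulary maps to index 0. TextVectorization is usually more convenient for free text because it handles the splitting and vocabulary building itself.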


Thank you @rcauvin it worked :+1: