How to create embeddings for text data in TensorFlow and pass them to a neural network model

Vishnu · February 21, 2022, 11:01am

I am working on a deep-learning-based recommendation system using the MovieLens 100K dataset, which has features such as user ID, movie ID, rating, title, and genre. How can I convert a movie's title or genre into embeddings? At the input layer I would then concatenate all the embedding vectors.
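A minimal sketch of one common approach for a single categorical feature such as genre, assuming TensorFlow 2.x Keras preprocessing layers: label-encode each string with `StringLookup`, look the index up in an `Embedding` layer, and concatenate the per-feature vectors. The genre list, user IDs, batch, layer sizes, and dense head are hypothetical placeholders, not values from MovieLens.

```python
import tensorflow as tf

# Hypothetical toy vocabularies; in practice build them from the dataset columns.
genres = ["Action", "Comedy", "Drama"]
user_ids = ["1", "2", "3"]
embedding_dimension = 8

# Label-encode each string feature (index 0 is reserved for unknown values).
genre_lookup = tf.keras.layers.StringLookup(vocabulary=genres)
user_lookup = tf.keras.layers.StringLookup(vocabulary=user_ids)

# One trainable embedding table per feature; vocabulary_size() includes the unknown slot.
genre_embedding = tf.keras.layers.Embedding(genre_lookup.vocabulary_size(), embedding_dimension)
user_embedding = tf.keras.layers.Embedding(user_lookup.vocabulary_size(), embedding_dimension)

# A hypothetical batch of two (user, genre) rows.
batch_users = tf.constant(["1", "3"])
batch_genres = tf.constant(["Comedy", "Drama"])

user_vectors = user_embedding(user_lookup(batch_users))      # shape (2, 8)
genre_vectors = genre_embedding(genre_lookup(batch_genres))  # shape (2, 8)

# Concatenate the per-feature embeddings into one input vector per row: shape (2, 16).
input_vector = tf.concat([user_vectors, genre_vectors], axis=1)

# A small hypothetical dense head that predicts a rating from the joined vector.
hidden = tf.keras.layers.Dense(16, activation="relu")(input_vector)
predicted_rating = tf.keras.layers.Dense(1)(hidden)  # shape (2, 1)
```

This assumes one genre per row; MovieLens 100K can list several genres for a movie, in which case a multi-hot or averaged genre embedding may fit better.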

Bhack · February 21, 2022, 11:56am

Have you already checked these two tutorials?

rcauvin · February 22, 2022, 2:39am

The following worked for me:

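import tensorflow as tf

unique_title_ds = ...  # placeholder for <dataset of unique titles>, e.g. a tf.data.Dataset of title strings
max_tokens = 10_000        # maximum vocabulary size for title tokens
embedding_dimension = 32   # length of each title embedding vector

# The lines below run inside the item model's __init__ (hence `self.`).
# TextVectorization splits each title into tokens and maps them to integer IDs.
self.title_vectorizer = tf.keras.layers.TextVectorization(max_tokens=max_tokens)
# Token IDs -> per-token vectors -> one averaged vector per title.
# mask_zero=True lets the pooling layer ignore the padding ID 0.
self.title_text_embedding = tf.keras.Sequential([
  self.title_vectorizer,
  tf.keras.layers.Embedding(max_tokens, embedding_dimension, mask_zero=True),
  tf.keras.layers.GlobalAveragePooling1D(),
])

# Build the vectorizer's vocabulary from the titles.
self.title_vectorizer.adapt(unique_title_ds)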
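To check what the title pipeline produces outside a model class, here is a minimal standalone sketch using three hypothetical titles. It calls the same three layers one at a time instead of wrapping them in `Sequential`, so the padding and the mask passed to the pooling layer are visible; the sample titles and the `.batch(2)` call are assumptions for illustration.

```python
import tensorflow as tf

# Hypothetical sample titles standing in for the MovieLens title column.
titles = ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"]
unique_title_ds = tf.data.Dataset.from_tensor_slices(titles).batch(2)

max_tokens = 10_000
embedding_dimension = 32

# Same three layers as the reply's Sequential model, kept as separate objects.
title_vectorizer = tf.keras.layers.TextVectorization(max_tokens=max_tokens)
title_vectorizer.adapt(unique_title_ds)
token_embedding = tf.keras.layers.Embedding(max_tokens, embedding_dimension, mask_zero=True)
pooling = tf.keras.layers.GlobalAveragePooling1D()

# Strings -> padded token IDs, shape (3, 3); the two-token title is padded with 0.
token_ids = title_vectorizer(tf.constant(titles))
# Token IDs -> per-token vectors, shape (3, 3, 32).
token_vectors = token_embedding(token_ids)
# Average over the non-padding tokens only, giving one vector per title: shape (3, 32).
title_vectors = pooling(token_vectors, mask=token_embedding.compute_mask(token_ids))
```

The default standardization lowercases and strips punctuation, so "GoldenEye (1995)" becomes the tokens `goldeneye` and `1995`.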

Calling my item model then concatenates the item embedding with the title embedding:

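  def call(self, items):
    # Each branch returns a (batch_size, embedding_dim) tensor;
    # concatenating on axis 1 joins them feature-wise.
    return tf.concat([
        self.item_embedding(items),
        self.title_text_embedding(items),
    ], axis=1)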
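A self-contained sketch of a complete item model along these lines, assuming the titles serve as both the item ID and the text input. The class name `ItemModel`, the `StringLookup`-based `item_embedding`, and the sample titles are hypothetical, since the reply does not show how `item_embedding` is defined; the text layers are called explicitly rather than through `Sequential`.

```python
import tensorflow as tf


class ItemModel(tf.keras.Model):
    """Hypothetical item tower: item-ID embedding concatenated with a title-text embedding."""

    def __init__(self, unique_titles, max_tokens=10_000, embedding_dimension=32):
        super().__init__()
        # Item-ID branch: each distinct title string gets its own learned vector.
        self.item_lookup = tf.keras.layers.StringLookup(vocabulary=unique_titles)
        self.item_embedding = tf.keras.layers.Embedding(
            self.item_lookup.vocabulary_size(), embedding_dimension)
        # Title-text branch: tokens -> token vectors -> masked average.
        self.title_vectorizer = tf.keras.layers.TextVectorization(max_tokens=max_tokens)
        self.title_vectorizer.adapt(
            tf.data.Dataset.from_tensor_slices(unique_titles).batch(32))
        self.token_embedding = tf.keras.layers.Embedding(
            max_tokens, embedding_dimension, mask_zero=True)
        self.pooling = tf.keras.layers.GlobalAveragePooling1D()

    def call(self, items):
        id_vectors = self.item_embedding(self.item_lookup(items))
        token_ids = self.title_vectorizer(items)
        title_vectors = self.pooling(
            self.token_embedding(token_ids),
            mask=self.token_embedding.compute_mask(token_ids))
        # Both branches are (batch_size, embedding_dimension); join them feature-wise.
        return tf.concat([id_vectors, title_vectors], axis=1)


titles = ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"]
item_model = ItemModel(titles)
item_vectors = item_model(tf.constant(titles))  # shape (3, 64)
```

Unknown titles still produce a vector: the ID branch maps them to the out-of-vocabulary row and the text branch maps unseen words to `[UNK]`.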

Vishnu · April 27, 2022, 11:06am

Hi, if we consider only one genre per movie, can we use label encoding instead of one-hot encoding?
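A minimal sketch, assuming one genre per movie and a hypothetical three-genre vocabulary: an `Embedding` layer takes a label-encoded integer index, and looking that index up returns the same vector as multiplying its one-hot vector by the embedding matrix. So when the genre goes through an embedding, the two encodings carry the same information. Feeding the raw integer label straight into a `Dense` layer is different, because it implies an ordering between genres that does not exist.

```python
import tensorflow as tf

genres = ["Action", "Comedy", "Drama"]  # hypothetical genre vocabulary
embedding_dimension = 5

# Label encoding: each genre string -> one integer index (0 is the unknown slot).
genre_lookup = tf.keras.layers.StringLookup(vocabulary=genres)
genre_ids = genre_lookup(tf.constant(["Drama", "Action"]))  # [3, 1]

vocab_size = genre_lookup.vocabulary_size()  # 4, including the unknown slot
embedding = tf.keras.layers.Embedding(vocab_size, embedding_dimension)
looked_up = embedding(genre_ids)  # shape (2, 5): one row of the table per movie

# The same vectors via one-hot encoding times the embedding matrix.
one_hot = tf.one_hot(genre_ids, depth=vocab_size)         # shape (2, 4)
weights = tf.convert_to_tensor(embedding.embeddings)      # shape (4, 5)
via_one_hot = tf.matmul(one_hot, weights)                 # shape (2, 5)
```

The lookup is simply the cheaper way to compute the one-hot product, which is why embedding layers are usually fed integer indices.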