How to build Universal Sentence Embeddings for a custom corpus?

I want to take the Universal Sentence Encoder (USE) from TF Hub and retrain it on my own corpus for a classification task.

The layers are:

  • Pretrained Universal Sentence Encoder from TF Hub, with trainable set to True
  • Dense 300, relu
  • Dense num_classes, softmax

Once the model is trained on my own corpus, it behaves as a classifier at prediction time:

predicted_class = model.predict(test_sentence)

What I want instead is to query the values of the middle layer for the test_sentence, i.e. the 300-dimensional output of the first Dense layer.

Is this possible, and if so, how? Is this approach correct? Can we say that this 300-dimensional vector is the USE embedding of that test_sentence?

Hi @Yogesh_Kulkarni

One way of doing what you want is to give your custom model two outputs: the Dense layer with num_classes units (the class probabilities) and the Dense layer with 300 units (the embedding).

For that you might need to use the Functional API; see the guide "The Functional API" on TensorFlow Core.
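
A minimal sketch of the two-output idea with the Functional API, under some assumptions: NUM_CLASSES, the layer names, and the random demo batch are made up for illustration, and a plain 512-dimensional input stands in for the output of the TF Hub encoder so that the example runs without downloading anything. In the real model the input would be a string tensor fed through the trainable hub layer. This variant trains a single-output classifier and then builds a second model over the same layers that returns both outputs, which is one way to do it, not necessarily the only one.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 4    # hypothetical number of classes
ENCODER_DIM = 512  # stand-in for the sentence encoder's output size
EMBED_DIM = 300    # size of the middle Dense layer from the question

# Stand-in input: in the real model this would be the output of the
# trainable TF Hub sentence encoder applied to a string input.
encoded = tf.keras.Input(shape=(ENCODER_DIM,), name="encoded_sentence")

# Middle layer whose activations we want to read out later.
embedding = tf.keras.layers.Dense(
    EMBED_DIM, activation="relu", name="embedding_300")(encoded)

# Classification head.
class_probs = tf.keras.layers.Dense(
    NUM_CLASSES, activation="softmax", name="class_probs")(embedding)

# Model used for training: a single output, the class probabilities.
classifier = tf.keras.Model(inputs=encoded, outputs=class_probs)
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy")
# classifier.fit(x_train, y_train, ...)  # train on your own corpus here

# Second model over the SAME layers and weights, returning both the
# class probabilities and the 300-dimensional middle-layer activations.
two_output_model = tf.keras.Model(
    inputs=encoded, outputs=[class_probs, embedding])

# Demo on random data (assumption: stands in for encoded test sentences).
x_demo = np.random.rand(8, ENCODER_DIM).astype("float32")
probs_out, emb_out = two_output_model.predict(x_demo, verbose=0)
predicted_class = probs_out.argmax(axis=-1)  # class index per sentence
```

Because both models share layers, anything learned by classifier.fit is reflected in two_output_model without copying weights. Note that emb_out is the activation of the 300-unit Dense layer, so it is a task-specific representation built on top of the encoder rather than the encoder's own output.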