Word Embeddings Not Accurate

I am trying to build my own word2vec model using the code provided here:
Link: Word2Vec | TensorFlow Core

I have even tried increasing the training data, and I can achieve good model accuracy. But when I plot the word vectors in the Embedding Projector, the distances between words (the word similarity) are really bad. Even if I apply the cosine distance formula directly to very similar words, the result is poor.
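As a quick sanity check outside the Projector, you can compute cosine similarity on the trained vectors directly. A minimal sketch with NumPy, where the toy embedding matrix and vocab stand in for whatever your trained model produced (e.g. the weights of the tutorial's embedding layer):

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine similarity: dot product of the two vectors divided by
    # the product of their L2 norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-in for the trained embedding matrix (rows = words).
embeddings = np.array([
    [1.0, 0.0, 0.0],   # "king"
    [0.9, 0.1, 0.0],   # "queen"  (similar direction -> similarity near 1)
    [0.0, 0.0, 1.0],   # "banana" (orthogonal -> similarity 0)
])
vocab = {"king": 0, "queen": 1, "banana": 2}

print(cosine_similarity(embeddings[vocab["king"]], embeddings[vocab["queen"]]))
print(cosine_similarity(embeddings[vocab["king"]], embeddings[vocab["banana"]]))
```

If nearby words score close to 0 with this check while Gensim scores them close to 1 on the same data, the problem is in training or export, not in the Projector's display.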

Whereas if the same data is used to train my own embeddings with the Gensim library (not pre-trained), the distance and similarity results are much better, including in the Embedding Projector.

Can someone please help me with this? I want to use only the Word2Vec code provided by TensorFlow, but I am not able to get good results for word distance and word similarity.

Could there be a problem in how you are serializing the embedding vectors and the associated words?

Also, can you confirm there is no difference in the hyperparameters you are using in TensorFlow and Gensim?


I am sure the serialization of the vectors and associated words is correct. Regarding the hyperparameters, I have tried my best to use the same ones, but the Gensim model is something of a black box: with just one line of code I get the entire word vector array. There might be differences in the code or the preprocessing, but the TensorFlow model is giving no usable results at all.
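For what it's worth, the export the tutorial uses is just two aligned TSV files, `vectors.tsv` and `metadata.tsv`, loaded into the Embedding Projector. A sketch of that step, where the toy `weights` and `vocab` stand in for the trained embedding matrix and the vectorization layer's vocabulary from the tutorial:

```python
import io

# Stand-ins for the trained embedding weights (rows = words) and the
# vocabulary list, where index 0 is the padding/unknown slot.
weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
vocab = ["", "king", "queen"]

with io.open("vectors.tsv", "w", encoding="utf-8") as out_v, \
     io.open("metadata.tsv", "w", encoding="utf-8") as out_m:
    for index, word in enumerate(vocab):
        if index == 0:
            continue  # skip the padding slot in BOTH files
        out_v.write("\t".join(str(x) for x in weights[index]) + "\n")
        out_m.write(word + "\n")
```

The crucial invariant is that row i of `vectors.tsv` corresponds to line i of `metadata.tsv`; skipping the padding entry in one file but not the other shifts every label by one and makes the Projector's nearest neighbors look wrong even when the vectors themselves are fine.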

Also, an issue about this was raised earlier on the TensorFlow GitHub repository, but it does not seem to have been resolved.
Issue raised: The word vector obtained by the word2vec tutorial is very bad · Issue #50645 · tensorflow/tensorflow · GitHub

Hi @aiman_shivani. Sorry, that was posted by mistake. It was not related to the tutorial code.

/Cc: @markdaoust, maybe Mark can shed some additional light.