Multilingual sentence encoder

XeL · May 18, 2021, 11:09pm

I was looking for a model to process French sentences, but I can’t find any for TF.js. So, using tensorflowjs_converter, I tried to convert the universal-sentence-encoder-multilingual model (TensorFlow Hub), but it’s not working.

I get an error “Op type not registered ‘SentencepieceOp’ in binary running”

Is there an existing multilingual model available for TF.js or a way to make it work?

Thanks!

Bhack · May 19, 2021, 9:54am

Have you checked TF2.0 hub Universal Sentence Encoder Multilingual Sentenepieceop not registered problem · Issue #463 · tensorflow/hub · GitHub?

lgusm · May 19, 2021, 10:33am

As Bhack mentioned, adding import tensorflow_text might fix it.
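For reference, here is a minimal Python sketch of that fix. It assumes tensorflow, tensorflow_hub and tensorflow_text are installed in mutually compatible versions. The only point it illustrates is ordering: importing tensorflow_text registers its ops (including SentencepieceOp) as a side effect, so the import has to happen before the model is loaded. The function name load_muse is just for illustration.

```python
MUSE_HANDLE = "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3"


def load_muse(handle: str = MUSE_HANDLE):
    """Load the multilingual Universal Sentence Encoder from TF Hub.

    The imports are kept inside the function so this module can be imported
    without TensorFlow installed; the model is only fetched when called.
    """
    import tensorflow_hub as hub
    # Side-effect import: registers SentencepieceOp and the other TF Text ops.
    # Without it, loading fails with "Op type not registered 'SentencepieceOp'".
    import tensorflow_text  # noqa: F401

    return hub.load(handle)


# Usage (requires network access and the packages above):
#   embed = load_muse()
#   vectors = embed(["Bonjour tout le monde", "Hello world"])
```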

XeL · May 19, 2021, 1:02pm

I’m trying to convert that model to TF.js with the tensorflowjs_converter tool like this:

tensorflowjs_converter \
    --input_format=tf_hub \
    'https://tfhub.dev/google/universal-sentence-encoder-multilingual/3' \
    web_model

I did make it work with Python (with import tensorflow_text), but I’m looking for a way to make it work with JavaScript.

8bitmp3 · May 19, 2021, 4:52pm

Great question and use case @XeL. Looping in @Jason

Bhack · May 19, 2021, 4:58pm

It’s similar to Converting to tensorflow.js · Issue #668 · google-research/text-to-text-transfer-transformer · GitHub

Jason · May 19, 2021, 5:07pm

Thanks! Let me loop in the TFJS team for this one. Someone should reply shortly.

Bhack · May 19, 2021, 5:17pm

This also raises an interesting question for TF Hub: how to handle ecosystem dependencies in TF Hub models, as in this case, where we need to use the model with the converter.

XeL · May 19, 2021, 5:34pm

I haven’t found documentation on how to use tfjs.converters directly, but I was able to get past the tensorflow_text error with the following code (based on the logic of the CLI converter):

import tensorflow as tf
import tensorflowjs as tfjs
import tensorflow_hub as hub
import tensorflow_text  # registers SentencepieceOp and the other TF Text ops

from tensorflowjs.converters import tf_saved_model_conversion_v2

tf_saved_model_conversion_v2.convert_tf_hub_module(
    "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3",
    "web_model",
    signature="serving_default"
)

However, I now get the following error:

ValueError: Unsupported Ops in the model before optimization
SentencepieceOp, SegmentSum, RaggedTensorToSparse, ParallelDynamicStitch, SentencepieceTokenizeOp, DynamicPartition

It seems that this multilingual model uses different ops than the Universal Sentence Encoder provided on the TF.js models page.
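To see where those ops come from, the error can be split by origin. The sketch below (plain Python, no TensorFlow needed) parses the op list out of the converter message and separates the two SentencePiece ops, which are registered by the tensorflow_text package, from the rest, which are core TensorFlow ops that the TF.js converter reported as unsupported. The helper names are hypothetical, and the message format is taken from the error above; other converter versions may word it differently.

```python
from __future__ import annotations

# Ops in the error above that come from the tensorflow_text package; the
# remaining ones (SegmentSum, DynamicPartition, ...) are core TensorFlow ops.
TF_TEXT_OPS = {"SentencepieceOp", "SentencepieceTokenizeOp"}

ERROR_MARKER = "Unsupported Ops in the model before optimization"


def parse_unsupported_ops(message: str) -> list[str]:
    """Return the comma-separated op names on the line after ERROR_MARKER."""
    _, found, tail = message.partition(ERROR_MARKER)
    lines = tail.strip().splitlines()
    if not found or not lines:
        return []
    return [op.strip() for op in lines[0].split(",") if op.strip()]


def split_by_origin(ops: list[str]) -> tuple[list[str], list[str]]:
    """Split op names into (tensorflow_text ops, core TensorFlow ops)."""
    text_ops = sorted(op for op in ops if op in TF_TEXT_OPS)
    core_ops = sorted(op for op in ops if op not in TF_TEXT_OPS)
    return text_ops, core_ops


error = (
    "ValueError: Unsupported Ops in the model before optimization\n"
    "SentencepieceOp, SegmentSum, RaggedTensorToSparse, "
    "ParallelDynamicStitch, SentencepieceTokenizeOp, DynamicPartition"
)
text_ops, core_ops = split_by_origin(parse_unsupported_ops(error))
print("tensorflow_text ops:", text_ops)
print("core TF ops:", core_ops)
```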

lgusm · May 19, 2021, 5:36pm

XeL, another issue you might run into later, assuming you manage to convert it, is that this model is a bit large for a web page (> 200 MB).
You might have to take that into account too.

lgusm · May 19, 2021, 5:41pm

On the TF Hub side, we try to address ecosystem dependencies by adding them to the documentation.

In this case specifically, the code snippet uses tf_text (tfhub).

Do you think adding a specific section to the documentation would help?

Bhack · May 19, 2021, 6:02pm

I wonder if we could add machine-readable metadata about dependencies somewhere, as this would be better for other ecosystem tools or any automation.

This could also be used to generate a specific dependencies section on the TF Hub model page.
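To make the machine-readable metadata idea concrete, here is a sketch of how a tool might consume it. The schema (a python_dependencies list) is entirely hypothetical; nothing in this thread says TF Hub publishes anything like it. The tool reports which declared packages are missing and, if all are present, imports them so their ops are registered before a model is loaded or converted.

```python
from __future__ import annotations

import importlib
import importlib.util

# HYPOTHETICAL metadata record: TF Hub does not define this schema.
HYPOTHETICAL_METADATA = {
    "handle": "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3",
    "python_dependencies": ["tensorflow_text"],
}


def missing_dependencies(metadata: dict) -> list[str]:
    """Return the declared top-level packages that are not installed."""
    return [
        name
        for name in metadata.get("python_dependencies", [])
        if importlib.util.find_spec(name) is None
    ]


def import_dependencies(metadata: dict) -> None:
    """Import every declared package for its op-registration side effects."""
    missing = missing_dependencies(metadata)
    if missing:
        raise ImportError(
            "Model requires packages that are not installed: " + ", ".join(missing)
        )
    for name in metadata.get("python_dependencies", []):
        importlib.import_module(name)
```

A converter could call import_dependencies(metadata) before loading the graph, turning the "Op type not registered" failure into an explicit message naming the missing package.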

Bhack · May 19, 2021, 6:03pm

Yes, since we don’t have a lite version of the multilingual model, as there is for universal-sentence-encoder-lite.

Matthew_Soulanille · May 19, 2021, 6:05pm

XeL:

However, I now get the following error:

ValueError: Unsupported Ops in the model before optimization
SentencepieceOp, SegmentSum, RaggedTensorToSparse, ParallelDynamicStitch, SentencepieceTokenizeOp, DynamicPartition

XeL, TensorFlow.js unfortunately doesn’t support those ops yet, but you might be able to convert the model to a TFLite model and then use tfjs-tflite, which runs TFLite models in the browser using WebAssembly.
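A minimal Python sketch of that route, using the tf.lite.TFLiteConverter API. Assumptions: the MUSE SavedModel has already been downloaded to a local directory (saved_model_dir and the output path are placeholders), and the TF Text ops are handled by importing tensorflow_text and allowing Select TF ops. Whether tfjs-tflite can then execute those Select TF ops in the browser is not something this sketch checks.

```python
def convert_saved_model_to_tflite(
    saved_model_dir: str, output_path: str, quantize: bool = False
) -> int:
    """Convert a local SavedModel to a .tflite file; returns its size in bytes."""
    import tensorflow as tf
    # Registers the SentencePiece / TF Text ops so the converter can load the graph.
    import tensorflow_text  # noqa: F401

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Use TFLite builtin ops where possible, and TensorFlow (Select TF) ops otherwise.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,
    ]
    if quantize:
        # Dynamic-range quantization: weights are stored as 8-bit integers.
        converter.optimizations = [tf.lite.Optimize.DEFAULT]

    tflite_model = converter.convert()  # serialized flatbuffer (bytes)
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)


# Usage (placeholder paths):
#   size = convert_saved_model_to_tflite("muse_saved_model", "muse.tflite")
```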

Bhack · May 19, 2021, 6:16pm

The list of TF Text ops supported by TF Lite is available at Supported Select TensorFlow operators | TensorFlow Lite

lgusm · May 19, 2021, 8:57pm

This is a good point.

@kempy, what do you think?

XeL · May 20, 2021, 1:04am

Good news! I was able to compile it using TensorFlow Lite. I’ll test it out, but as @lgusm pointed out, it weighs 278 MB, so I guess I’ll have trouble using it on the web.

It’s really hard to find pre-trained models for languages other than English.

Thanks for your help!

8bitmp3 · May 20, 2021, 3:03pm

Off the top of my head - have you tried any of the quantization techniques for model size reduction mentioned in the TensorFlow Lite docs? I hope some of the following helps:

Model optimization | TensorFlow Lite
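As a rough back-of-the-envelope check on what quantization could buy here: storing 32-bit float weights as 8-bit integers takes a quarter of the space per weight. Applied to the 278 MB file mentioned above, that suggests something around 70 MB, assuming the file is almost entirely float32 weights and all of them get quantized; real results depend on what the converter actually quantizes.

```python
def quantized_size_mb(float_size_mb: float, from_bits: int = 32, to_bits: int = 8) -> float:
    """Estimate model size after quantizing weights from from_bits to to_bits.

    Assumes the file is dominated by weights and that all of them are quantized;
    metadata and non-quantized tensors are ignored.
    """
    return float_size_mb * to_bits / from_bits


for bits in (16, 8):
    print(f"float32 -> {bits}-bit: ~{quantized_size_mb(278, to_bits=bits):.1f} MB")
```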

Also, in case you haven’t checked this out - there are TensorFlow Lite Model Maker guides and tutorials specifically for NLP (QA and classification) (cc @billy):

TensorFlow Lite Model Maker
- BERT Question Answer with TensorFlow Lite Model Maker
- Text classification with TensorFlow Lite Model Maker

And, if you are into ML research:

Google AI Blog: Advancing NLP with Efficient Projection-Based Model Architectures

pQRNN is quantized, further reducing the model size by a factor of 4.

Links:
- https://arxiv.org/abs/1712.05877 (Jacob et al., 2017, Google)
- Quantization Aware Training with TensorFlow Model Optimization Toolkit - Performance with Accuracy (The TensorFlow Blog, 2020)

lgusm · May 24, 2021, 10:46am

Very good tip @8bitmp3, these techniques might help with the model’s size!
You might lose a little accuracy, but it’s well worth trying!

mmadaria · March 15, 2023, 2:32am

Hi. I’m trying to use the “universal-sentence-encoder-multilingual” model (from TensorFlow Hub), which supports Spanish, in NodeJS. However, when I try to load it with the tfjs-node v4.2.0 library (on both Windows and Linux, using the TF2.0 SavedModel v3 format), I get the following error related to “SentencepieceOp”:

“E tensorflow/core/grappler/optimizers/meta_optimizer.cc:903] tfg_optimizer{} failed: NOT_FOUND: Op type not registered ‘SentencepieceOp’ in binary running on LAPTOP. Make sure the Op and Kernel are registered in the binary running in this process.Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph , as contrib ops are lazily registered when the module is first accessed.
While importing FunctionDef: __inference_pruned_162942
when importing GraphDef to MLIR module in GrapplerHook”.

This is the source code used to load the model:

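const tf = require('@tensorflow/tfjs-node');
const modelPath = 'models/muse/saved_model';
// require() makes this a CommonJS module, where top-level await is a syntax error.
(async () => {
  const model = await tf.node.loadSavedModel(modelPath);
})();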

Do you know if it is possible to use this multilingual model in NodeJS? By the way, I was able to run the English model (“universal-sentence-encoder”, packaged for TF.js), located at “https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder”.

Thank you!
