Multilingual sentence encoder

I was looking for a model to process French sentences, but I can’t find any for TF.js. So, using tensorflowjs_converter, I tried to convert the universal-sentence-encoder-multilingual model (TensorFlow Hub), but it’s not working.

I get an error “Op type not registered ‘SentencepieceOp’ in binary running”

Is there an existing multilingual model available for TF.js or a way to make it work?

Thanks!

4 Likes

Have you checked TF2.0 hub Universal Sentence Encoder Multilingual Sentenepieceop not registered problem · Issue #463 · tensorflow/hub · GitHub?

1 Like

As mentioned by Bhack, adding import tensorflow_text might fix it.
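
For reference, a minimal Python sketch of that fix, assuming tensorflow_hub and tensorflow_text are installed. The function name is hypothetical, and the imports sit inside the function only so the snippet can be defined without loading TensorFlow. Importing tensorflow_text is what registers SentencepieceOp before the model is loaded:

```python
MODEL_URL = "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3"


def load_multilingual_encoder(url=MODEL_URL):
    """Load the multilingual encoder (hypothetical helper name)."""
    import tensorflow_hub as hub
    # Imported only for its side effect: it registers the SentencePiece ops
    # that the SavedModel graph refers to.
    import tensorflow_text  # noqa: F401

    return hub.load(url)


# Usage (downloads the model on first call):
# embed = load_multilingual_encoder()
# vectors = embed(["Bonjour le monde", "Hello world"])
```

This only helps on the Python side; it does not make the ops available to TF.js.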

2 Likes

I’m trying to convert that model to TF.js with the tensorflowjs_converter tool like this:

tensorflowjs_converter \
    --input_format=tf_hub \
    'https://tfhub.dev/google/universal-sentence-encoder-multilingual/3' \
    web_model

I did make it work with Python (with import tensorflow_text), but I’m looking for a way to make it work with JavaScript.

4 Likes

Great question and use case @XeL. Looping in @Jason :slightly_smiling_face:

2 Likes

It was similar to Converting to tensorflow.js · Issue #668 · google-research/text-to-text-transfer-transformer · GitHub

2 Likes

Thanks! Let me loop in the TFJS team for this one. Someone should reply shortly.

2 Likes

This is also an interesting topic for tfhub: how to handle ecosystem dependencies in tfhub models, as in this case, where we need to use the model with the converter.

2 Likes

I haven’t found documentation on how to use tfjs.converters directly, but I was able to get past the tensorflow_text error with the following code (based on the logic of the CLI converter):

import tensorflow as tf
import tensorflowjs as tfjs
import tensorflow_hub as hub
import tensorflow_text  # not used directly; importing it registers the SentencePiece ops

from tensorflowjs.converters import tf_saved_model_conversion_v2

tf_saved_model_conversion_v2.convert_tf_hub_module(
    "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3",
    "web_model",
    signature="serving_default"
)

However, I now get the following error:

ValueError: Unsupported Ops in the model before optimization
SentencepieceOp, SegmentSum, RaggedTensorToSparse, ParallelDynamicStitch, SentencepieceTokenizeOp, DynamicPartition

It seems that this multilingual model uses different operators than the universal sentence encoder provided on the TF.js models page.
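
To make the error easier to work with, here is a small sketch that pulls the op names out of a message shaped like the one above. It assumes the op list follows the "Unsupported Ops" line as a comma-separated list; the helper name is hypothetical, and treating the "Sentencepiece" prefix as "comes from tensorflow_text" is only a heuristic:

```python
from typing import List


def parse_unsupported_ops(message: str) -> List[str]:
    """Return the sorted op names listed after the 'Unsupported Ops' line."""
    lines = message.strip().splitlines()
    for i, line in enumerate(lines):
        if "Unsupported Ops" in line:
            rest = " ".join(lines[i + 1:])
            return sorted(op.strip() for op in rest.split(",") if op.strip())
    return []


error = (
    "ValueError: Unsupported Ops in the model before optimization\n"
    "SentencepieceOp, SegmentSum, RaggedTensorToSparse, "
    "ParallelDynamicStitch, SentencepieceTokenizeOp, DynamicPartition"
)

ops = parse_unsupported_ops(error)
# Heuristic split: ops with this prefix are the tokenizer ops.
text_ops = [op for op in ops if op.startswith("Sentencepiece")]
other_ops = [op for op in ops if not op.startswith("Sentencepiece")]
print(text_ops)
print(other_ops)
```

So two of the six ops are tokenizer ops and the other four are regular graph ops that the converter also rejects.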

1 Like

XeL, another issue you might run into later, assuming you manage to convert it, is that this model is a little big for a webpage (> 200 MB).
You might have to take that into account too.

1 Like

From the TFHub side, we try to address the ecosystem dependencies by adding this to the documentation.

In this case specifically, the code snippet uses tf_text (tfhub).

Do you think adding a specific section to the documentation would help?

1 Like

I don’t know if we could add machine-readable metadata about dependencies somewhere, as this would be better for other ecosystem tools or any automation.

This could also be consumed to create a dedicated dependencies section on the TFHub model webpage.

1 Like

Yes, as we don’t have a lite version of the multilingual model like we do with universal-sentence-encoder-lite.

1 Like

XeL, TensorFlow.js unfortunately doesn’t have support for those ops yet, but you might be able to convert the model to a TFLite model and then use tfjs-tflite, which runs TFLite models in the browser using WebAssembly.
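
A rough sketch of what that conversion could look like in Python, assuming the model has been saved locally as a SavedModel (saved_model_dir and the function name are hypothetical). Enabling select TF ops and custom ops is my assumption about what this model needs, and I have not verified that the resulting file runs under tfjs-tflite:

```python
def convert_saved_model_to_tflite(saved_model_dir, output_path):
    """Convert a local SavedModel to a .tflite file and return its size in bytes."""
    import tensorflow as tf
    # Registers the SentencePiece ops so the SavedModel graph can be loaded.
    import tensorflow_text  # noqa: F401

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Fall back to regular TensorFlow kernels for ops without a TFLite builtin.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,
    ]
    # Assumption: the tokenizer ops may have to pass through as custom ops.
    converter.allow_custom_ops = True

    tflite_model = converter.convert()
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)


# Usage (paths are placeholders):
# size = convert_saved_model_to_tflite("models/muse/saved_model", "muse.tflite")
```

Whether the browser runtime can execute the select and custom ops is the part to check first.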

1 Like

The TF Lite Text ops list is available at Supported Select TensorFlow operators | TensorFlow Lite

1 Like

This is a good point.

@kempy , what do you think?

1 Like

Good news! I was able to compile it using TensorFlow Lite. I’ll test it out, but as @Igusm pointed out, it weighs 278 MB, so I guess I’ll have trouble using it on the web. :rofl:

It’s really hard to find pre-trained models for languages other than English.

Thanks for your help!

3 Likes

Off the top of my head - have you tried any of the quantization techniques for model size reduction mentioned in the TensorFlow Lite docs? I hope some of the following stuff helps:

Also, in case you haven’t checked this out - there are TensorFlow Lite Model Maker guides and tutorials specifically for NLP (QA and classification) (cc @billy):

And, if you are into ML research:

3 Likes

Very good tip @8bitmp3, these techniques might be able to help with the model’s size!
You might lose a little bit of accuracy, but it’s well worth trying!
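
As a back-of-the-envelope check, here is what the 278 MB figure could shrink to, assuming the file is almost entirely float32 weights and every weight gets quantized. Both are assumptions, so the real numbers will be somewhat higher:

```python
# Bytes used to store one weight in each format.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def estimate_size_mb(float32_size_mb, target_dtype):
    """Scale a float32 model size by the ratio of bytes per weight."""
    return float32_size_mb * BYTES_PER_WEIGHT[target_dtype] / BYTES_PER_WEIGHT["float32"]


for dtype in ("float16", "int8"):
    print(dtype, estimate_size_mb(278, dtype), "MB")
# float16 139.0 MB
# int8 69.5 MB
```

Even the int8 estimate is still heavy for a webpage, so quantization alone may not be enough here.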

2 Likes

Hi. I’m trying to use in NodeJS the model “universal-sentence-encoder-multilingual” (located at “TensorFlow Hub”), which has support for Spanish language, but when I try to load it with “tfjs-node v.4.2.0” library (on both Windows and Linux operating systems, using TF2.0 Saved Model v3 format) I get the following error related to “SentencepieceOp”:

“E tensorflow/core/grappler/optimizers/meta_optimizer.cc:903] tfg_optimizer{} failed: NOT_FOUND: Op type not registered ‘SentencepieceOp’ in binary running on LAPTOP. Make sure the Op and Kernel are registered in the binary running in this process.Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph , as contrib ops are lazily registered when the module is first accessed.
While importing FunctionDef: __inference_pruned_162942
when importing GraphDef to MLIR module in GrapplerHook”.

This is the source code used to load the model:


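const tf = require('@tensorflow/tfjs-node');
const modelPath = 'models/muse/saved_model';
// Top-level await is not available in CommonJS, so the call is wrapped in an async function.
(async () => {
  const model = await tf.node.loadSavedModel(modelPath);
})();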

Do you know if it is possible to use this multilingual model in NodeJS? By the way, I was able to execute the model in English (“universal-sentence-encoder” prepared for TFJS), located at “https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder”.

Thank you!