Text similarity search with universal-sentence-encoder-multilingual in Java -> issue

Hello everybody

I implemented a service in Java (Grails) that runs on a Docker machine. The service uses tensorflow 2.10.0 and tensorflow-text 2.10.0, both installed on the Docker machine’s Linux operating system.

Because I need Application.NLP.TEXT_EMBEDDING for text similarity search, I need to load the library _sentencepiece_tokenizer.so from tensorflow-text in my Java service.

But when I use the service it fails with error message:
undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringB5cxx11ERKNS_15OpKernelContextEb

When I use older versions of tensorflow and tensorflow-text, for example 2.2.0, I get the same error.

Can you tell me where the problem is in my implementation?

Java Code:

-----Dependencies-----
compile 'org.tensorflow:tensorflow:1.15.0'
//compile 'org.tensorflow:libtensorflow_jni:1.15.0'
compile 'org.tensorflow:tensorflow-core-platform:0.4.2'
compile "ai.djl:api:0.19.0"
runtime "ai.djl.tensorflow:tensorflow-engine:0.19.0"
runtime "ai.djl.tensorflow:tensorflow-model-zoo:0.19.0"
//runtime "ai.djl.tensorflow:tensorflow-native-auto:2.4.1"
runtime "ai.djl.tensorflow:tensorflow-native-cpu:2.7.0"

-----Service methods-----
public static double[][] predict(String[] inputs) {
        // only EN: https://storage.googleapis.com/tfhub-modules/google/universal-sentence-encoder/4.tar.gz | file size ~ 1 GB
        // multilanguage: https://storage.googleapis.com/tfhub-modules/google/universal-sentence-encoder-multilingual/3.tar.gz
        String modelUrl = "https://storage.googleapis.com/tfhub-modules/google/universal-sentence-encoder-multilingual/3.tar.gz"

        Criteria<String[], double[][]> criteria =
            Criteria.builder()
                .optApplication(Application.NLP.TEXT_EMBEDDING)
                .setTypes(String[], double[][])
                .optModelUrls(modelUrl)
                .optTranslator(new MyTranslator())
                .optEngine("TensorFlow")
                .optProgress(new ProgressBar())
                .build()
        //library file needed for universal-sentence-encoder-multilingual because library not included in this model file; library file only available for linux systems
        TensorFlow.loadLibrary("/usr/local/lib/python3.7/dist-packages/tensorflow_text/python/ops/_sentencepiece_tokenizer.so")

        try {
            ZooModel<String[], double[][]> model = criteria.loadModel()
            Predictor<String[], double[][]> predictor = model.newPredictor()
            return predictor.predict(inputs);
        } catch (final Exception ex) {
            log.error(ex)
        }
    }

    private static final class MyTranslator implements NoBatchifyTranslator<String[], double[][]> {

        @Override
        NDList processInput(TranslatorContext ctx, String[] raw) {
            NDManager factory = ctx.NDManager
            NDList inputs = new NDList(raw.collect { factory.create(it) })
            new NDList(NDArrays.stack(inputs))
        }

        @Override
        double[][] processOutput(TranslatorContext ctx, NDList list) {
            long numOutputs = list.singletonOrThrow().shape.get(0)
            NDList result = []
            for (i in 0..<numOutputs) {
                result << list.singletonOrThrow().get(i)
            }
            result*.toFloatArray() as double[][]
        }
    }

Hi @Christian_Fuchs,

You are mixing artifacts from 3 different frameworks, so it’s possible that they conflict with each other. tensorflow:1.15.0 is from the deprecated version of TF Java and you shouldn’t use it. tensorflow-core-platform:0.4.2 is from the new TF Java and brings the 2.7.4 native binaries of TensorFlow, but you also import TF 2.7.0 from DJL’s tensorflow-native-cpu:2.7.0 at runtime.

So if you are planning to use DJL, as your code suggests, you should probably stick to the artifacts they distribute.

It looks like that extension requires the C++11 ABI, which isn’t supported by CentOS 7, so TF for Java isn’t built against it by default:

$ c++filt _ZNK10tensorflow8OpKernel11TraceStringB5cxx11ERKNS_15OpKernelContextEb
tensorflow::OpKernel::TraceString[abi:cxx11](tensorflow::OpKernelContext const&, bool) const

You’ll be able to use that ABI if you build from source though.
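If c++filt is not at hand, the same hint can be read straight from the mangled name: the Itanium C++ ABI encodes an ABI tag as `B` followed by the tag's length and name, so `[abi:cxx11]` appears as the substring `B5cxx11`. The following is only a rough sketch of that check. The class and method names are made up for illustration, and a plain substring match is a heuristic, not a real demangler.

```java
public class AbiTagCheck {
    // 'B' + length + name is how an ABI tag is mangled, so the tag
    // "cxx11" appears as "B5cxx11" inside the symbol name.
    private static final String CXX11_TAG = "B5cxx11";

    /** Heuristic (hypothetical helper): true if the mangled symbol carries the cxx11 ABI tag. */
    public static boolean usesCxx11Abi(String mangledSymbol) {
        return mangledSymbol != null && mangledSymbol.contains(CXX11_TAG);
    }

    public static void main(String[] args) {
        // Symbol copied from the error message in the question.
        String symbol =
            "_ZNK10tensorflow8OpKernel11TraceStringB5cxx11ERKNS_15OpKernelContextEb";
        System.out.println(usesCxx11Abi(symbol)); // prints "true"
    }
}
```

A symbol that contains this tag can only be resolved by a TensorFlow native library that was itself built with the C++11 ABI.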

The Java package’s version has to match the Python tensorflow package version.

Semantic matching now works in my service with tensorflow-core-platform:0.4.0 and the Python package tensorflow 2.7.0.
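To make the version rule concrete, here is a small sketch of a startup check that compares the TF Java artifact version with the Python tensorflow version the .so file was taken from. It is an assumption-laden illustration: the class and method names are hypothetical, and the table contains only the two pairs mentioned in this thread (0.4.0 with 2.7.0, and 0.4.2 with 2.7.4), so other releases would have to be added from the TF Java release notes.

```java
import java.util.Map;

public class TfVersionCheck {
    // TF Java artifact version -> TensorFlow native version it ships.
    // Only the pairs named in this thread; anything else is unknown here.
    private static final Map<String, String> BUNDLED_TF = Map.of(
        "0.4.0", "2.7.0",
        "0.4.2", "2.7.4");

    /** Hypothetical helper: true only for a known, exactly matching pair. */
    public static boolean matches(String tfJavaVersion, String pythonTfVersion) {
        String expected = BUNDLED_TF.get(tfJavaVersion);
        return expected != null && expected.equals(pythonTfVersion);
    }

    public static void main(String[] args) {
        System.out.println(matches("0.4.0", "2.7.0"));  // prints "true"
        System.out.println(matches("0.4.2", "2.10.0")); // prints "false"
    }
}
```

Failing fast on a mismatch like the second call gives a clearer message than the undefined-symbol error raised when the library is loaded.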