How to use tf.data.Dataset to train the Google Universal Sentence Encoder?

The problem is the following: the Universal Sentence Encoder takes a list of strings as input, but my tf.data pipeline yields one scalar string per element rather than a list.

How can I make the pipeline output a list of strings to feed the Universal Sentence Encoder layer?

Here is a sample of my x variable from my dataset: <tf.Tensor: shape=(), dtype=string, numpy=b'Computer Supported Social Networking For Augmenting Cooperation'>
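For anyone reading along, the shape=() in that sample can be checked without loading the encoder. This is only a minimal sketch, assuming the dataset is built with from_tensor_slices as in the reproduction further down; the variable names titles, ds, spec and first are made up for illustration.

```python
import tensorflow as tf

# Two titles taken from the post; any list of strings behaves the same way.
titles = [
    "Computer Supported Social Networking For Augmenting Cooperation",
    "Stability of high bit rate quantum key distribution on installed fiber",
]

# from_tensor_slices slices the input along its first axis,
# so every dataset element is a single scalar string (rank 0).
ds = tf.data.Dataset.from_tensor_slices(titles)

spec = ds.element_spec        # describes one element of the dataset
first = next(iter(ds))        # the first element as an eager tensor

print("element rank:", spec.shape.rank)
print("element dtype:", spec.dtype)
print("first element shape:", first.shape)
```

The rank of 0 reported here is the same thing the error message calls "shape: []".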

If I feed it directly to the model, it gives the following error:

InvalidArgumentError: input must be a vector, got shape: [] [[{{node text_preprocessor/tokenize/StringSplit/StringSplit}}]] [Op:__inference_train_function_13665]

I have already tried using .map() to output a list, and adding a Lambda layer for the same purpose. Both strategies failed.
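The .map() attempt itself is not shown in the post, so it is unclear why it failed. As a rough sketch only, one way a map can give each element a leading axis is tf.expand_dims; the names ds, as_vectors and elem below are hypothetical, and this is not claimed to be what was tried.

```python
import tensorflow as tf

titles = [
    "Optical fiber nano-tip and 3D bottle beam as non-plasmonic optical tweezers",
    "Thermal and laser characteristics of Nd doped La011Y089VO4 crystal",
]

ds = tf.data.Dataset.from_tensor_slices(titles)

# Add a leading axis to every scalar string, turning shape () into shape (1,).
# Each element then looks like a "list" holding exactly one string.
as_vectors = ds.map(lambda text: tf.expand_dims(text, axis=0))

elem = next(iter(as_vectors))
print("rank after map:", as_vectors.element_spec.shape.rank)
print("first element shape:", elem.shape)
```

Returning a plain Python list from the mapped function would not change the picture much, because tf.data converts whatever the function returns into tensors.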

Thanks

Could you please share standalone code to replicate the above issue?

Here is a reproducible example:

import tensorflow as tf
import tensorflow_hub as hub
from numpy import array
from tensorflow.keras import layers, Model

X= ['Calculation of radiation force and torque exerted on a uniaxial anisotropic sphere by an incident Gaussian beam with arbitrary propagation and polarization directions',
 'Optical fiber nano-tip and 3D bottle beam as non-plasmonic optical tweezers',
 'Simultaneous passive coherent beam combining and mode locking of fiber laser arrays',
 'Thermal and laser characteristics of Nd doped La011Y089VO4 crystal',
 'Computer Supported Social Networking For Augmenting Cooperation',
 'Distortion-free freehand-scanning OCT implemented with real-time scanning speed variance correction',
 'Effective permittivity for resonant plasmonic nanoparticle systems via dressed polarizability',
 'Stability of high bit rate quantum key distribution on installed fiber',
 'Single-mode and wavelength tunable lasers based on deep-submicron slots fabricated by standard UV-lithography',
 'Stress compensation in hafnia/silica optical coatings by inclusion of alumina layers']
y=[array([0]),
 array([0]),
 array([0]),
 array([0]),
 array([0]),
 array([0]),
 array([0]),
 array([0]),
 array([0]),
 array([0])]
df = tf.data.Dataset.from_tensor_slices((X, y))

module_url = "https://tfhub.dev/google/universal-sentence-encoder/3" #@param ["https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"]

def model_1():
    q1 = layers.Input(shape=(), dtype=tf.string, name='input_1')
    keraslayer = hub.KerasLayer(module_url, input_shape=[], 
                               dtype=tf.string, trainable=True)(q1)
    x = layers.Dense(50, activation="relu")(keraslayer['outputs'])
    x = layers.Dropout(0.1)(x)
    outputs = layers.Dense(1, activation="softmax")(x)
    model = Model(inputs=q1, outputs=outputs)
    return model
model = model_1()

checkpoint = tf.keras.callbacks.ModelCheckpoint('adgrad_200_0.3_BERT_weights.h5', monitor='val_sparse_categorical_accuracy', save_best_only=True, verbose=1)
model.compile(optimizer="Adagrad", loss=tf.keras.losses.sparse_categorical_crossentropy, metrics=['sparse_categorical_accuracy'])

history = model.fit(
    df, batch_size=32, epochs=1000,
    initial_epoch=0,
    use_multiprocessing=True,
    max_queue_size=10,
    workers=0, callbacks=[checkpoint]
)

ValueError: in user code:

    File "/home/marlon/]/envs/sensorweb/lib/python3.9/site-packages/keras/engine/training.py", line 878, in train_function  *
        return step_function(self, iterator)
    File "/home/marlon/]/envs/sensorweb/lib/python3.9/site-packages/keras/engine/training.py", line 867, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/marlon/]/envs/sensorweb/lib/python3.9/site-packages/keras/engine/training.py", line 860, in run_step  **
        outputs = model.train_step(data)
    File "/home/marlon/]/envs/sensorweb/lib/python3.9/site-packages/keras/engine/training.py", line 808, in train_step
        y_pred = self(x, training=True)
    File "/home/marlon/]/envs/sensorweb/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None

    ValueError: Exception encountered when calling layer "keras_layer" (type KerasLayer).
    
    in user code:
    
        File "/home/marlon/]/envs/sensorweb/lib/python3.9/site-packages/tensorflow_hub/keras_layer.py", line 229, in call  *
            result = f()
    
        ValueError: Shape must be rank 1 but is rank 0 for '{{node text_preprocessor/tokenize/StringSplit/StringSplit}} = StringSplit[skip_empty=true](text_preprocessor/StaticRegexReplace_1, text_preprocessor/tokenize/StringSplit/Const)' with input shapes: [], [].
    
    
    Call arguments received:
      • inputs=tf.Tensor(shape=(), dtype=string)
      • training=True
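
The last lines of the traceback show the layer receiving inputs of shape (), that is, one unbatched string at a time. A possible way around this, sketched here under the assumption that batching inside the pipeline is acceptable, is to call .batch() on the dataset before passing it to fit. The sketch leaves the hub model out so it runs without a download; texts, labels, batched, x_batch and y_batch are illustrative names, and the batch size of 2 is arbitrary.

```python
import numpy as np
import tensorflow as tf

# Stand-in data with the same layout as the reproduction:
# one string per example and one label of shape (1,) per example.
texts = [
    "Computer Supported Social Networking For Augmenting Cooperation",
    "Stability of high bit rate quantum key distribution on installed fiber",
    "Thermal and laser characteristics of Nd doped La011Y089VO4 crystal",
    "Simultaneous passive coherent beam combining and mode locking of fiber laser arrays",
]
labels = np.zeros((len(texts), 1), dtype=np.int64)

ds = tf.data.Dataset.from_tensor_slices((texts, labels))

# Group consecutive elements so each step yields a vector of strings
# (rank 1) instead of a single scalar string (rank 0).
batched = ds.batch(2)

x_batch, y_batch = next(iter(batched))
print("text batch shape:", x_batch.shape)
print("label batch shape:", y_batch.shape)

# With the real model, the call would then look roughly like
#     model.fit(batched, epochs=...)
# with no batch_size argument, since the dataset already produces batches.
```

Whether this resolves the training error for the exact module version above has not been verified here; it only shows that the batched elements have the rank the error message asks for.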