How to add multiple pre-processing steps and a post-processing step for a text-classification model to serve via tensorflow-serving?

naveen_marthala · December 25, 2021, 5:44pm

What I currently have and am trying to do:

When tensorflow-serving receives a request from a client, I first need to process the text using 13 regexes, then pass it through tf.keras.preprocessing.text.Tokenizer to convert it to numbers (tokens), and then pass those to tf.keras.preprocessing.sequence.pad_sequences to add 0s at the end of each array (for a batch of inputs) whose length doesn’t match the input that the model expects. The result (a single sentence or a batch of sentences as tokens) is then fed to a tf.keras model to get some probabilities as outputs. I then need to map these probabilities (with different thresholds for different units) to texts and return them to the client.

The problems I am currently facing while trying to accomplish the above:

While trying to put together all that to be able to serve the model using tensorflow-serving, I learned that some parts can be converted to tensorflow functions, but not all of it.

regexes: I still couldn’t figure out where and how to put my regexes to be able to manipulate the text.
tokenizer: I learned from some blogs and SO questions, that tf.lookup.StaticHashTable can be used for this purpose.
pad_sequences: I couldn’t find any help with this either.
post-processing: I could find very little information on how to do this.
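The tf.lookup.StaticHashTable idea mentioned for the tokenizer can be sketched roughly as follows. This is only a minimal illustration: the `word_index` contents, the `texts_to_ids` helper and the choice to map unknown words to 0 are assumptions, not something taken from the thread (a Keras Tokenizer without an `oov_token` drops unknown words instead).

```python
import tensorflow as tf

# Hypothetical vocabulary, e.g. copied from a fitted Keras Tokenizer's word_index.
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Immutable string -> id table that can live inside a TF graph.
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(list(word_index.keys())),
        values=tf.constant(list(word_index.values()), dtype=tf.int64),
    ),
    default_value=0,  # assumption: unknown words map to id 0
)


def texts_to_ids(texts):
    """Lowercase, split on whitespace, and look up each token's id."""
    tokens = tf.strings.split(tf.strings.lower(texts))  # RaggedTensor of strings
    # Apply the lookup to the flat token values, keeping the ragged row structure.
    return tf.ragged.map_flat_values(table.lookup, tokens)


ids = texts_to_ids(tf.constant(["The movie was great", "great unknown"]))
```

The result is a RaggedTensor with one row of ids per input sentence, so it still has to be padded before it reaches the model.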

I read the beginner and advanced blogs on the tensorflow-transform tutorials page, but neither of them mentioned how to link those tft functions to the tf-keras model while saving it. I could also find some information about adding pre-processing for serving, but all of it involved tensorflow code and some workarounds, and none of it covered what I am trying to achieve, even indirectly.

I can provide more information as required.

How do I add these steps to the graph, while saving the model?

jeongukjae · December 27, 2021, 3:02am

1. You can use tf.strings.regex_replace if you want to manipulate text using regular expressions in the TF graph.
2. SentencePiece and WordPiece tokenizers have been useful for me. I recommend these tokenizers to you.
3. text.pad_model_inputs is very useful for padding model inputs.
4. I don’t know what task you want to solve, so I’m just guessing. If you are solving a sequence tagging task, I think you can use tensorflow-text’s tokenizers and use tokenize_with_offsets to get the offsets of each token. Then, you can use those offsets to map probabilities to texts. (For example, you can use tokenize_with_offsets in WordpieceTokenizer.)
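As a rough sketch of the regex and padding suggestions: several patterns can be applied by chaining tf.strings.regex_replace calls, and ragged token ids can be padded to a fixed length. The rules in `REGEX_RULES`, the value of `MAX_LEN` and both helper names are made up for illustration. The padding here uses core TensorFlow's RaggedTensor.to_tensor as a stand-in for text.pad_model_inputs, so that the example does not depend on tensorflow-text.

```python
import tensorflow as tf

# Hypothetical cleanup rules as (pattern, replacement) pairs in RE2 syntax.
REGEX_RULES = [
    (r"https?://\S+", " "),  # drop URLs
    (r"[^a-z0-9\s]", " "),   # drop punctuation
    (r"\s+", " "),           # collapse runs of whitespace
]
MAX_LEN = 5  # hypothetical sequence length expected by the model


@tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
def clean_text(text):
    """Apply every regex rule in order; the Python loop is unrolled at trace time."""
    text = tf.strings.lower(text)
    for pattern, rewrite in REGEX_RULES:
        text = tf.strings.regex_replace(text, pattern, rewrite)
    return tf.strings.strip(text)


def pad_ids(ragged_ids):
    """Pad with 0s at the end, or truncate from the end, to MAX_LEN columns."""
    return ragged_ids.to_tensor(default_value=0, shape=[None, MAX_LEN])


cleaned = clean_text(tf.constant(["Hello, World! http://x.y/z"]))
padded = pad_ids(tf.ragged.constant([[7, 8, 9], [1, 2, 3, 4, 5, 6]]))
```

Note that this pads and truncates at the end of each row, whereas pad_sequences defaults to padding and truncating at the start, so the behaviour should be checked against what the model was trained with.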

naveen_marthala · December 27, 2021, 5:25am

thanks for the answer and suggestions.

1 & 3. I will try to adopt tf.strings.regex_replace for the regex operations on my text, and text.pad_model_inputs for padding. But how do I put these inside the graph while calling tf.keras.models.save_model(), or tell tensorflow that I have some regexes in variables that have to be included in the graph?
4. Yes, I have been doing sequence tagging, multi-label classification and multi-class classification, and this question is aimed at learning to serve those models with tf-serving. For example, with multi-label, I want to use the logits from the tf.keras model and, if the value is above a threshold such as 0.5, label the input text as belonging to that label (texts from a dictionary); I also have different thresholds for different labels. As in the previous comment, where and how do I include the logic/code for this while saving the model?
2. I didn’t know about SentencePiece and WordPiece tokenizers. Did you mean that these packages/libraries have been useful for you? Sure, I will adopt them.
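For the multi-label case described in point 4, one possible shape of the post-processing is sketched below. The label names, the threshold values and the function name `logits_to_labels` are hypothetical, and the sketch assumes the model returns one logit per label that is turned into a probability with a sigmoid.

```python
import tensorflow as tf

# Hypothetical label names and per-label thresholds (not from the thread).
LABELS = tf.constant(["sports", "politics", "tech"])
THRESHOLDS = tf.constant([0.5, 0.7, 0.3])


@tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
def logits_to_labels(logits):
    """Return the label name where its probability exceeds its threshold, else ""."""
    probs = tf.sigmoid(logits)   # [batch, 3] probabilities
    mask = probs > THRESHOLDS    # thresholds broadcast across the batch
    return tf.where(mask, LABELS, "")


labels = logits_to_labels(tf.constant([[2.0, 0.0, -3.0], [-1.0, 2.0, 0.0]]))
```

The output stays a dense string tensor of shape [batch, num_labels], which is convenient for a serving signature; empty strings mark labels that were not assigned.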

jeongukjae · December 27, 2021, 5:52am

1 & 3 & 4. After training the model, you can save the graph with pre-processing and post-processing steps as shown below:

...
...

# some training steps
model = ...
model.compile(...)
model.fit(...)

@tf.function
def inference_function(text):
  # do some preprocessing
  text = tf.strings.regex_replace(text, ...)  # ... some regex patterns ...
  token_ids, starts, ends = tokenizer.tokenize_with_offsets(text)

  model_inputs = ...  # prepare model inputs using token_ids

  # inference model
  model_outputs = model(model_inputs)

  outputs = ...  # do some post-processing with starts, ends, and model_outputs
  return outputs

# https://www.tensorflow.org/api_docs/python/tf/keras/Model#save
model.save(
  "some path to save the model",
  signatures={
    "inference_fn": inference_function.get_concrete_function(tf.TensorSpec([None], dtype=tf.string)),
  }
)
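To make the pattern above concrete, here is a small end-to-end sketch that can be run as is. It is not the code from this thread: it swaps the trained Keras model for a toy tf.Module with hand-picked weights and exports it with tf.saved_model.save instead of model.save. The vocabulary, the weights, the label names and the class name `TextClassifier` are all assumptions chosen only so the example is self-contained.

```python
import tempfile

import tensorflow as tf


class TextClassifier(tf.Module):
    """Toy stand-in for a trained model plus its pre- and post-processing."""

    def __init__(self):
        super().__init__()
        # Hypothetical vocabulary; id 0 is used for padding and unknown words.
        self.table = tf.lookup.StaticHashTable(
            tf.lookup.KeyValueTensorInitializer(
                tf.constant(["good", "bad"]),
                tf.constant([1, 2], dtype=tf.int64),
            ),
            default_value=0,
        )
        # Hypothetical "trained" weights: one score per vocabulary id.
        self.scores = tf.Variable([0.0, 2.0, -2.0])
        self.labels = tf.constant(["negative", "positive"])

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def serve(self, text):
        # Pre-processing: regex cleanup, tokenization, id lookup, padding.
        text = tf.strings.regex_replace(tf.strings.lower(text), r"[^a-z\s]", " ")
        tokens = tf.strings.split(text)
        ids = tf.ragged.map_flat_values(self.table.lookup, tokens)
        ids = ids.to_tensor(default_value=0, shape=[None, 8])
        # "Model": sum the per-token scores and squash to a probability.
        probs = tf.sigmoid(tf.reduce_sum(tf.gather(self.scores, ids), axis=1))
        # Post-processing: threshold the probability and map it to a label string.
        label = tf.gather(self.labels, tf.cast(probs > 0.5, tf.int32))
        return {"label": label, "probability": probs}


export_dir = tempfile.mkdtemp()
module = TextClassifier()
tf.saved_model.save(module, export_dir, signatures={"serving_default": module.serve})

# Reload the SavedModel and call the exported signature with raw strings.
loaded = tf.saved_model.load(export_dir)
result = loaded.signatures["serving_default"](text=tf.constant(["Good, good!", "bad"]))
```

The exported signature takes raw strings and returns label strings, so the client never has to run any of the pre- or post-processing itself.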

2. Yes! After training the SentencePiece model, you can load and use it with text.SentencepieceTokenizer in the TF graph.

naveen_marthala · December 27, 2021, 7:10am

thanks for the code.

some small questions:

1. If I write more such complex functions decorated with @tf.function, and as long as I stick to functions and classes from tensorflow and its libraries (like tf, tf.keras, tf-addons, tf-text, tf-transform, etc.), will the saved model be loadable in other environments? If not, where can I find which parts of tensorflow code can and can’t be used in these functions?
2. Are you telling me that, if I had trained and used SentencePiece tokenisers, I can use them in pre-processing functions and in the tf-serving graph using text.SentencepieceTokenizer?

jeongukjae · December 28, 2021, 5:32am

1. Yes! But you have to register the required ops. If you used tf-text’s operations in the SavedModel, you have to register tf-text’s ops to load it (example).
2. Yes, exactly!