How to add multiple pre-processing steps and a post-processing step for a text-classification model served via tensorflow-serving?

What I currently have and am trying to do:

When I receive a request from a client to the model in tensorflow-serving, I first need to process the text using 13 regexes, then pass it through tf.keras.preprocessing.text.Tokenizer to convert it to numbers (tokens), and then pass it through tf.keras.preprocessing.sequence.pad_sequences to append 0s at the end of each array (for sentences whose length doesn’t match the input the model expects) for a batch of inputs. This (a single sentence or a batch of sentences, as tokens) is then fed to a tf.keras model to get some probabilities as outputs. Finally, I need to map these probabilities (different thresholds for different units) to texts and return them to the client.
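For reference, the pipeline described above can be sketched in plain Python. The patterns, vocabulary, and maximum length below are made-up stand-ins, not the real 13 regexes or a trained tokenizer:

```python
import re

# stand-ins: two patterns instead of the real 13, and a toy vocabulary
PATTERNS = [(r"[^a-z ]", ""), (r"\s+", " ")]
VOCAB = {"<pad>": 0, "hello": 1, "world": 2}
MAX_LEN = 5

def preprocess(sentence):
    text = sentence.lower()
    for pattern, repl in PATTERNS:       # the regex step
        text = re.sub(pattern, repl, text)
    ids = [VOCAB.get(tok, 0) for tok in text.strip().split()]  # tokenize
    return (ids + [0] * MAX_LEN)[:MAX_LEN]  # pad with 0s at the end

print(preprocess("Hello,   WORLD!!"))  # -> [1, 2, 0, 0, 0]
```

The whole question is about moving this logic into the TF graph so it runs inside tensorflow-serving instead of in client-side Python.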

What problems I am currently facing trying to accomplish the above:

While trying to put all of that together to serve the model using tensorflow-serving, I learned that some parts can be converted to tensorflow functions, but not all of them.

  1. regexes: I still couldn’t figure out where and how to put my regexes to be able to manipulate the text.
  2. tokenizer: I learned from some blogs and SO questions, that tf.lookup.StaticHashTable can be used for this purpose.
  3. pad_sequences: I couldn’t find any help with this either.
  4. post-processing: I could find very little information on how to do this.
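On point 2, here is a minimal sketch of what the tf.lookup.StaticHashTable approach looks like, using a toy two-word vocabulary (the keys, values, and default are illustrative only; the real mapping would come from the trained Tokenizer):

```python
import tensorflow as tf

# toy vocabulary; real keys/values would come from the trained Tokenizer
keys = tf.constant(["hello", "world"])
values = tf.constant([1, 2], dtype=tf.int64)
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, values),
    default_value=0)  # id 0 for out-of-vocabulary tokens

# whitespace-split two sentences, then map every token to its id
tokens = tf.strings.split(tf.constant(["hello world", "hello unknown"]))
ids = tf.ragged.map_flat_values(table.lookup, tokens)
print(ids.to_list())  # -> [[1, 2], [1, 0]]
```

Because the table is built from TF ops, it is tracked and serialized with the SavedModel, unlike a Python-side Tokenizer object.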

I read the beginner and advanced blogs on the tensorflow-transform tutorials page, but neither of them mentioned how to link those tft functions to the tf.keras model while saving it. I could also find some information about adding pre-processing for serving, but all of it involved tensorflow code and some workarounds, and none of it covered what I am trying to achieve, even indirectly.

I can provide more information as required.

How do I add these steps to the graph while saving the model?

  1. You can use tf.strings.regex_replace if you want to manipulate texts with regular expressions in the TF graph.
  2. SentencePiece and WordPiece tokenizers have worked well for me; I recommend them.
  3. text.pad_model_inputs is very useful for padding model inputs.
  4. I don’t know which task you want to solve, so I’m just guessing. If you are solving a sequence-tagging task, you can use tensorflow-text’s tokenizers and call tokenize_with_offsets to get the offsets of each token; you can then use those offsets to map probabilities back to the text. (For example, you can use tokenize_with_offsets in WordpieceTokenizer.)
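On point 3, padding can also be done in-graph with plain TensorFlow; the sketch below uses RaggedTensor.to_tensor to right-pad token ids to a fixed width, producing the same kind of right-padded tensor (plus a mask) that text.pad_model_inputs returns. The ids and width are made up:

```python
import tensorflow as tf

# made-up ragged token ids for a batch of two sentences
token_ids = tf.ragged.constant([[1, 2, 3], [4]])

# right-pad with 0s to a fixed width of 5, as pad_sequences(padding="post")
# or text.pad_model_inputs would; the mask marks real vs padded positions
padded = token_ids.to_tensor(default_value=0, shape=[None, 5])
mask = tf.cast(tf.not_equal(padded, 0), tf.int32)
print(padded.numpy().tolist())  # -> [[1, 2, 3, 0, 0], [4, 0, 0, 0, 0]]
```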

Thanks for the answer and suggestions.

1 & 3. I will try to adopt tf.strings.regex_replace for the regex operations on my text, and text.pad_model_inputs for padding. But how do I put these inside the graph while doing tf.keras.models.save_model(), or convey to tensorflow that I have some regexes in variables that have to be included in the graph?
4. Yes, I have been doing sequence tagging, multi-label classification and multi-class classification, and this question is aimed at learning to serve those models with tf-serving. For example, with multi-label, I want to use the logits from the tf.keras model and, if the value is > 0.5, label the input text as belonging to a label (texts from a dictionary); I also have different thresholds for different labels. As in the previous comment: where and how do I include the logic/code for this while saving the model?
2. I didn’t know about SentencePiece and WordPiece tokenizers. Did you mean that these packages/libraries have been useful for you? Sure, I will adapt them.
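A hedged sketch of the per-label thresholding described above, written with in-graph ops so it could live inside an exported serving function. The label names, thresholds, and probabilities are invented:

```python
import tensorflow as tf

# invented label set with a different threshold per label
labels = tf.constant(["sports", "politics", "tech"])
thresholds = tf.constant([0.5, 0.7, 0.6])

probs = tf.constant([0.9, 0.65, 0.61])  # pretend model output for one text
hits = tf.where(probs > thresholds)     # indices of labels over threshold
predicted = tf.gather(labels, tf.reshape(hits, [-1]))
print([b.decode() for b in predicted.numpy()])  # -> ['sports', 'tech']
```

Because only TF ops are used (tf.where, tf.gather), this comparison-and-mapping step serializes into the SavedModel along with the rest of the graph.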

1 & 3 & 4. After training the model, you can save the graph with the pre-processing and post-processing steps like below:


# some training steps
model = ...

@tf.function
def inference_function(text):
  # do some preprocessing
  text = tf.strings.regex_replace(text, ...)  # ... some regex patterns ...
  token_ids, starts, ends = tokenizer.tokenize_with_offsets(text)

  model_inputs = ...  # prepare model inputs using token_ids

  # inference model
  model_outputs = model(model_inputs)

  outputs = ...  # do some post-processing with starts, ends, and model_outputs
  return outputs

tf.saved_model.save(
  model,
  "some path to save the model",
  signatures={
    "inference_fn": inference_function.get_concrete_function(
      tf.TensorSpec([None], dtype=tf.string)),
  },
)
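A self-contained toy version of this pattern, with a trivial regex-only inference function, to show the full save/load round trip (the cleanup regex, signature name, and use of a bare tf.Module instead of a trained keras model are illustrative):

```python
import tempfile
import tensorflow as tf

@tf.function
def inference_fn(text):
    # in-graph regex cleanup standing in for real pre-processing
    cleaned = tf.strings.regex_replace(text, r"[^a-zA-Z ]", "")
    return {"cleaned": tf.strings.lower(cleaned)}

module = tf.Module()
module.inference_fn = inference_fn  # track the function on the module

path = tempfile.mkdtemp()
tf.saved_model.save(
    module, path,
    signatures={
        "inference_fn": inference_fn.get_concrete_function(
            tf.TensorSpec([None], dtype=tf.string)),
    })

restored = tf.saved_model.load(path)
out = restored.signatures["inference_fn"](text=tf.constant(["Hello, World!!"]))
print(out["cleaned"].numpy())  # -> [b'hello world']
```

tensorflow-serving exposes each entry in `signatures` as a callable endpoint, which is how the pre- and post-processing travel with the model.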
  2. Yes! After training the sentencepiece model, you can load and use it with text.SentencepieceTokenizer in the TF graph.

Thanks for the code.

Some small questions:

  1. If I write more such complex functions decorated with @tf.function, and as long as I stick to functions and classes from tensorflow and its libraries (tf, tf.keras, tf-addons, tf-text, tf-transform, etc.), will the SavedModel be loadable in other environments? If not, where can I find out which parts of tensorflow code can and can’t be used in these functions?
  2. Are you saying that, if I had trained and used SentencePiece tokenizers, I can use them in pre-processing functions and in the tf-serving graph via text.SentencepieceTokenizer?
  1. Yes! But you have to register the required ops. If you used tf-text’s operations in the SavedModel, you have to register tf-text’s ops to load it (example).
  2. Yes, exactly!