NLP TextVectorization tokenizer

In previous versions of TF, we could use tokenizer = Tokenizer() and then call tokenizer.fit_on_texts(input), where input was a list of texts (in my case, a pandas DataFrame column containing a list of texts). Unfortunately, this has been deprecated.
Is there a way to replicate this behaviour with TextVectorization?

Additionally, how can we split a string on upper-case letters, for instance 'ListOfHorrorMovies'?
I understand I need to use the standardize argument of TextVectorization.

Hi @Bondi_French, you can use the tf.keras.layers.TextVectorization layer to replicate the same behavior. For more details, please go through the code example below.

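import re

# Initialising the list of strings
# (CamelCase inputs inferred from the printed result below)
sentences = ['ILoveMyDog', 'ILoveMyCat', 'YouLoveMyDog!', 'DoYouThinkMyDogIsAmazing?']

# Splitting on upper-case letters using re
res_list = []
for sentence in sentences:
  res_list.append(re.findall('[A-Z][^A-Z]*', sentence))

# Joining the pieces with spaces and printing the result
processed_sentences = []
for i in res_list:
  processed_sentences.append(" ".join(i))
print(processed_sentences)
['I Love My Dog', 'I Love My Cat', 'You Love My Dog!', 'Do You Think My Dog Is Amazing?']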
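One thing to watch with the '[A-Z][^A-Z]*' pattern: it only keeps pieces that start with an upper-case letter, so any lower-case text before the first capital is dropped. A minimal sketch of an alternative, assuming plain ASCII input, is to insert a space in front of each capital instead. The helper name split_on_uppercase is just for illustration.

```python
import re

def split_on_uppercase(text):
    # Insert a space at every position that is followed by an upper-case
    # ASCII letter, except at the very start of the string.
    return re.sub(r"(?<!^)(?=[A-Z])", " ", text)

print(split_on_uppercase("ListOfHorrorMovies"))  # List Of Horror Movies
print(split_on_uppercase("listOfHorrorMovies"))  # list Of Horror Movies

# For comparison, the findall approach loses the leading "list":
print(" ".join(re.findall("[A-Z][^A-Z]*", "listOfHorrorMovies")))  # Of Horror Movies
```

Note that both approaches split runs of capitals letter by letter (for example "HTTPServer"), so acronyms would need extra handling.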

import tensorflow as tf

text_dataset = tf.data.Dataset.from_tensor_slices(processed_sentences)
max_features = 5000  # Maximum vocab size.
max_len = 10  # Sequence length to pad the outputs to.
# Create the layer.
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=max_len)

# Now that the vocab layer has been created, call `adapt` on the
# text-only dataset to create the vocabulary.
vectorize_layer.adapt(text_dataset.batch(64))

# Create the model that uses the vectorize text layer
model = tf.keras.models.Sequential()

# Start by creating an explicit input layer. It needs to have a shape of
# (1,) (because we need to guarantee that there is exactly one string
# input per batch), and the dtype needs to be 'string'.
model.add(tf.keras.Input(shape=(1,), dtype=tf.string))

# The first layer in our model is the vectorization layer. After this
# layer, we have a tensor of shape (batch_size, max_len) containing
# vocab indices.
model.add(vectorize_layer)

# Now, the model can map strings to integers, and you can add an
# embedding layer to map these integers to learned embeddings.
input_data = tf.constant([
    ['i really love my dog'],
    ['my dog loves my manatee']
])
model.predict(input_data)

1/1 [==============================] - 0s 297ms/step
array([[6, 1, 3, 2, 4, 0, 0, 0, 0, 0],
       [2, 4, 1, 2, 1, 0, 0, 0, 0, 0]])
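If you would rather do the upper-case split inside the layer, as the question suggests, the standardize argument also accepts a callable. Below is a rough sketch, not the only way to do it: the function name split_on_uppercase and the sample inputs are made up for illustration, and the clean-up regex assumes ASCII text. Because a custom callable replaces the default standardization, the lower-casing and punctuation stripping have to be done by hand.

```python
import tensorflow as tf

def split_on_uppercase(text):
    # Put a space in front of every upper-case ASCII letter,
    # e.g. 'ListOf' becomes ' List Of'.
    spaced = tf.strings.regex_replace(text, r"([A-Z])", r" \1")
    # Re-apply the usual clean-up: lower-case, then drop punctuation.
    lowered = tf.strings.lower(spaced)
    return tf.strings.regex_replace(lowered, r"[^a-z0-9 ]", "")

# Illustrative inputs (assumed, not taken from a real dataset).
raw_texts = ["ListOfHorrorMovies", "ILoveMyDog", "YouLoveMyDog!"]

vectorize_layer = tf.keras.layers.TextVectorization(
    standardize=split_on_uppercase,
    split="whitespace",
    output_mode="int",
)

# `adapt` plays the role that fit_on_texts had in the old Tokenizer.
vectorize_layer.adapt(tf.data.Dataset.from_tensor_slices(raw_texts).batch(2))

# The learned vocabulary; index 0 is padding and index 1 is the OOV token.
vocabulary = vectorize_layer.get_vocabulary()
print(vocabulary)

# Calling the layer maps raw strings to integer ids.
token_ids = vectorize_layer(tf.constant(["ListOfHorrorMovies"]))
print(token_ids)
```

For a pandas column, passing something like df['text'].values to from_tensor_slices should work the same way, assuming the column holds plain strings.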

Thank You.