TensorFlow Text Classification with BERT

Hi, I have a question about the documentation found on tensorflow.org involving BERT.

There is one tutorial called "Classify text with BERT" on tensorflow.org.

Here is another: "Fine-tuning a BERT model".

I was wondering what the difference between the two is when it comes to preprocessing the data. Specifically, in the "Classify text with BERT" tutorial, preprocessing the data just means using a preprocessing model provided by TensorFlow Hub. On the other hand, "Fine-tuning a BERT model" uses Python code to tokenize and encode the data itself, which seems a lot more complicated than using the provided preprocessing model.

So basically, I was wondering whether there is an actual difference between these two preprocessing methods, and whether there is a reason why one uses a model while the other implements the preprocessing in Python code.

Hi William,

The difference is, in summary, exactly what you noticed: the use of the preprocessing model.
If you can avoid writing all the Python code to do the BERT preprocessing yourself, you should! That will make your life much easier, both when training the model and when serving it, where you would otherwise need to have the preprocessing code available as well.
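To make that concrete, here is a minimal sketch of the preprocessing-model route. The Hub URL below is just an example of a standard uncased English BERT preprocessing model; swap in whichever preprocessing model matches the encoder you are fine-tuning:

```python
import tensorflow_hub as hub
import tensorflow_text  # registers the custom ops the preprocessing model needs

# Example BERT preprocessing model from TF Hub (uncased English).
preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")

# Raw strings in, encoder-ready tensors out.
encoder_inputs = preprocess(["this movie was great!"])

# A dict with 'input_word_ids', 'input_mask', and 'input_type_ids',
# which can be fed directly to a matching BERT encoder layer.
print(sorted(encoder_inputs.keys()))
```

Because the preprocessing is itself a Keras layer, it can be baked into the exported SavedModel, so at serving time you can send raw text to the model instead of shipping separate tokenization code.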

Why is there a Colab with the raw Python code, then?
Because it shows you everything that is happening inside the preprocessing model, in case you want a deeper understanding. The other reason is that, if you don't have a preprocessing model available, you will know what to do.
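For comparison, this is roughly what the do-it-yourself route looks like. It is only a sketch, assuming you already have a BERT WordPiece vocabulary on disk ("vocab.txt" is a hypothetical path), and it is not the exact code from the fine-tuning tutorial, but it shows why that route involves more work:

```python
import tensorflow_text as text

# Hypothetical path to a BERT WordPiece vocabulary file you already have.
tokenizer = text.BertTokenizer("vocab.txt", lower_case=True)

# RaggedTensor of wordpiece ids with shape [batch, words, wordpieces].
token_ids = tokenizer.tokenize(["this movie was great!"])

# Flatten the word/wordpiece dimensions into one sequence per example.
flat_ids = token_ids.merge_dims(-2, -1)

# From here you still have to add the [CLS] and [SEP] tokens, pad or
# truncate to a fixed length, and build the input mask and segment
# (type) ids yourself, all of which the Hub preprocessing model does for you.
```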


Thank you very much for your response!
