[Info Need] 8-bit Optimizer!

8-bit optimizers are one of the tricks widely used to train large models while saving GPU memory. They do the job well and make it possible to train larger models on consumer-level GPUs (16 GB, 24 GB). This is available in PyTorch. Could you tell me whether there's an equivalent in TensorFlow?

A similar trick: gradient checkpointing.
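For what it's worth, gradient checkpointing does exist natively in TensorFlow as `tf.recompute_grad`. A minimal sketch (the toy function `block` is just an illustration):

```python
import tensorflow as tf

# Recompute this block's activations during the backward pass
# instead of storing them, trading extra compute for less memory.
@tf.recompute_grad
def block(x):
    return tf.nn.relu(tf.square(x))

x = tf.constant([1.0, 2.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(block(x))

# Gradients are identical to the unwrapped version:
# d/dx sum(relu(x^2)) = 2x for x > 0.
grad = tape.gradient(y, x)
print(grad.numpy())
```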

Good reading: A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes.
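For intuition, the core trick of 8-bit optimizers is keeping optimizer state (e.g. Adam's moment estimates) stored as int8 with a per-block scale, dequantizing only when an update is applied. Here is a minimal NumPy sketch of that block-wise scheme; it is an illustration of the idea, not the bitsandbytes implementation:

```python
import numpy as np

def quantize_blockwise(x, block_size=64):
    """Quantize a float32 vector to int8 with one float32 scale per block."""
    x = x.astype(np.float32)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0          # avoid division by zero on all-zero blocks
    q = np.round(blocks / scales * 127).astype(np.int8)
    return q, scales, len(x)

def dequantize_blockwise(q, scales, n):
    """Recover an approximate float32 vector from int8 values and block scales."""
    return (q.astype(np.float32) / 127 * scales).reshape(-1)[:n]

np.random.seed(0)
state = np.random.randn(1000).astype(np.float32)  # stand-in for optimizer state
q, s, n = quantize_blockwise(state)
recovered = dequantize_blockwise(q, s, n)

# int8 storage plus one float32 scale per 64 values is roughly 4x
# smaller than keeping the state in float32.
print(np.max(np.abs(state - recovered)))
```

The per-block scales are what keep the quantization error small even when a few entries are much larger than the rest, which is the same motivation the bitsandbytes paper gives for its block-wise quantization.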

Hi @innat ,

Yes, it is called the mixed precision training API. This API allows you to train your models using lower-precision numbers, such as 8-bit integers, while still maintaining good accuracy. This can save a significant amount of memory, which is important for training large models on consumer-level GPUs. You can enable mixed precision training by calling `tf.config.optimizer.set_experimental_options` with the `mixed_precision` flag set to `True`. Once you have enabled mixed precision training, you can choose which variables to train in lower precision by setting the `tf.config.optimizer.experimental_mixed_precision_loss_scale` flag to a value between 0 and 1. A value of 0 trains all variables in full precision, while a value of 1 trains all variables in lower precision.

Please see the following articles for more information:

Quantization-Aware Training:

Mixed Precision Training:

I hope this helps you.


Mixed precision in TF is about float16/bfloat16. If I'm not mistaken, it is not an alternative to an 8-bit optimizer.

(Also, where did you get this `tf.config.optimizer.experimental_use_mixed_precision`? It doesn't exist in the API. Sounds like a ChatGPT answer :sweat_smile:)

Hi @innat ,

Yes, you are correct, it's not an alternative; still, using this API can improve performance on modern GPUs and TPUs.

You can check this Quantization-aware training guide, which might be useful; it shows models with non-quantized top-1 accuracy and 8-bit quantized accuracy in a table.

My bad, the underscore (_) was mistakenly carried over into the next word in the quote. I have corrected it now.


Thanks for the update. I see now.

About `tf.config.optimizer.set_experimental_options`: instead, we use the mixed precision policy API. Again, definitely a good option.

About QAT, I did interact with those APIs before, but they didn't work well for me. I will revisit them, though. Thanks for the reference.


To enable mixed precision training in TensorFlow, you can use the `tf.keras.mixed_precision` module. Set the global policy to `"mixed_float16"`: layer computations will then run in float16, while the weights themselves are kept in float32 for numeric stability.
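A minimal sketch using the documented `tf.keras.mixed_precision` API (the `Dense` layer here is just a toy example):

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Set the global dtype policy: layer computations run in float16,
# while layer variables (weights) stay in float32 for stability.
mixed_precision.set_global_policy('mixed_float16')

layer = tf.keras.layers.Dense(64)
print(layer.compute_dtype)    # float16 for computations
print(layer.variable_dtype)   # float32 for the weights

# With a custom training loop, wrap the optimizer so gradients
# get loss scaling to avoid float16 underflow.
opt = mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())
```

Note this halves activation memory but, unlike an 8-bit optimizer, it does not shrink the optimizer state itself, which matches the point made earlier in the thread.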

@AugustineZeke You can also use a convnet inside the transformer block and build a new model with mixed precision enabled by default.