Strange behavior of mixed precision in both metrics and speed

I have an image classifier based on ResNet50. As long as I used transfer learning during training, everything was fine. When I decided to train the ResNet from scratch, I ran into memory problems. That is expected, since there are many more parameters to train, so I decided to use mixed precision.
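The sketch below is a rough estimate of where mixed precision actually saves memory. Under the `mixed_float16` policy, Keras keeps layer variables in float32 and only computes in float16. As a result, the weights and optimizer state are not halved, and the reduction comes from the float16 activations kept for the backward pass. The shape is an assumption for illustration: a batch of 32 images at 224x224, for which ResNet50's first convolution produces a 112x112x64 feature map. The helper `activation_mib` is hypothetical.

```python
from math import prod

# Bytes per element for the two dtypes involved in mixed precision
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


def activation_mib(shape, dtype):
    """Hypothetical helper: size in MiB of one activation tensor with this shape and dtype."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype] / 2**20


# Assumed: batch of 32, output of ResNet50's first conv layer for 224x224 inputs
shape = (32, 112, 112, 64)
fp32 = activation_mib(shape, "float32")
fp16 = activation_mib(shape, "float16")
print(f"float32: {fp32:.1f} MiB, float16: {fp16:.1f} MiB")
```

This covers a single layer's output. A full network keeps many such tensors alive at once, which is why activations, not parameters, usually dominate training memory.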

Here I report a toy version of my code that reproduces the issue:

```python
import tensorflow as tf
from tensorflow.keras.datasets import mnist


def dummy_model(nClasses):
    # Small convolutional feature extractor (nClasses is currently unused)
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D((2, 2)))
    model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))

    return model


# the actual main

# Settings for GPU
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # tf.config.set_logical_device_configuration(
        #     gpus[0],
        #     [tf.config.LogicalDeviceConfiguration(memory_limit=1024*3)])
        tf.config.experimental.set_memory_growth(gpus[0], True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        print("\n\n")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

# use of the mixed precision
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Loading the data
(trainX, trainy), (testX, testy) = mnist.load_data()

# Normalizing the data and adding the channel axis expected by Input(shape=(28, 28, 1))
trainX = trainX.astype('float32')[..., None] / 255
testX = testX.astype('float32')[..., None] / 255

base_model = dummy_model(10)

inputs = tf.keras.Input(shape=(28, 28, 1), name='digits')
x = base_model(inputs)
x = tf.keras.layers.GlobalMaxPooling2D()(x)
# Map the 64 pooled features to 10 class scores
x = tf.keras.layers.Dense(10)(x)
# Keep the softmax output in float32, as recommended for mixed precision
outputs = tf.keras.layers.Activation(activation="softmax", dtype='float32')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=["accuracy"],
              loss="sparse_categorical_crossentropy")
model.fit(trainX, trainy, epochs=10)
model.evaluate(testX, testy)
```

If I comment out the mixed-precision line, this dummy model that classifies MNIST digits works fine. With mixed precision enabled, however, training runs at about half the speed and the accuracy is close to zero.

I am using TensorFlow 2.8 on native Windows with GPU acceleration.
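One way to check whether the slowdown is a hardware matter is to look at the GPU's compute capability. According to the TensorFlow mixed precision guide, float16 math only runs substantially faster on NVIDIA GPUs with compute capability 7.0 or higher (Tensor Cores). On older GPUs `mixed_float16` brings no math speedup and may even run slower, for example because of the extra casts. The minimal sketch below uses `tf.config.experimental.get_device_details`. The helper `tensor_cores_likely` is hypothetical and encodes only that 7.0 threshold.

```python
import tensorflow as tf


def tensor_cores_likely(compute_capability):
    """Hypothetical helper: True if the GPU meets the compute capability 7.0 threshold."""
    return compute_capability is not None and tuple(compute_capability) >= (7, 0)


for gpu in tf.config.list_physical_devices("GPU"):
    # Returns a dict; for GPUs it typically holds 'device_name' and 'compute_capability'
    details = tf.config.experimental.get_device_details(gpu)
    cc = details.get("compute_capability")
    print(details.get("device_name"), cc,
          "meets the 7.0 threshold:", tensor_cores_likely(cc))
```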

Interestingly, if I force one convolution layer to run in fp32, both the loss and the accuracy values go up.
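A per-layer override like the one described above can be written with a layer's `dtype` argument, which takes precedence over the global policy. The sketch below assumes the same toy architecture, with layer names invented for the example. It applies the Dense layer, keeps the output in float32, and prints each layer's compute and variable dtypes. Under `mixed_float16`, the mixed layers compute in float16 but keep float32 variables.

```python
import tensorflow as tf

# Same global policy as in the question
tf.keras.mixed_precision.set_global_policy("mixed_float16")

inputs = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(32, (3, 3), activation="relu", name="conv_mixed")(inputs)
# dtype="float32" overrides the global policy for this layer only
x = tf.keras.layers.Conv2D(64, (3, 3), activation="relu", dtype="float32", name="conv_fp32")(x)
x = tf.keras.layers.GlobalMaxPooling2D()(x)
x = tf.keras.layers.Dense(10, name="logits")(x)
# Softmax output kept in float32, as the mixed precision guide recommends
outputs = tf.keras.layers.Activation("softmax", dtype="float32", name="probs")(x)
model = tf.keras.Model(inputs, outputs)

for name in ["conv_mixed", "conv_fp32", "logits", "probs"]:
    layer = model.get_layer(name)
    print(f"{name}: compute={layer.compute_dtype}, variables={layer.variable_dtype}")

# Restore the default policy; layers created above keep their own policies
tf.keras.mixed_precision.set_global_policy("float32")
```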

Hi @Jonathan_Campeggio,

Mixed precision training pays off mainly for large, deep models running on GPUs with Tensor Cores (NVIDIA compute capability 7.0 or higher). For a relatively small model like your MNIST example, the overhead of the extra casts may outweigh any speedup, and float16 arithmetic can introduce numerical stability problems. You can experiment with different settings and architectures to find the right balance between speed and accuracy for your use case.
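To make the numerical stability point concrete, here is a plain-Python sketch of float16's limited range. It uses the standard library's `struct` half-precision format `'e'`, not TensorFlow. Very small gradients underflow to zero, and values beyond the largest finite float16 (65504) overflow. Loss scaling works around underflow by multiplying by a large factor before the float16 step and dividing it out afterwards. With the `mixed_float16` policy, `Model.compile` wraps the optimizer in `tf.keras.mixed_precision.LossScaleOptimizer` automatically, so `model.fit` already applies dynamic loss scaling. The fixed scale of 2**15 below is only an assumption for illustration.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round a Python float to half precision and back (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# 1) Underflow: a tiny gradient becomes exactly zero in float16
grad = 1e-8
print(to_fp16(grad))  # 0.0

# 2) Overflow: struct refuses to pack values far above FP16_MAX
try:
    to_fp16(70000.0)
except OverflowError as err:
    print("overflow:", err)

# 3) Loss scaling: scale up before the float16 step, unscale in float32 afterwards
scale = 2.0 ** 15
scaled = to_fp16(grad * scale)  # roughly 3.3e-4, representable in float16
recovered = scaled / scale
print(scaled, recovered)
```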

Thanks.