Strange behavior of mixed precision, both in metrics and in speed

I have an image classifier based on ResNet50. As long as I used transfer learning during training, everything was fine. When I decided to train the ResNet from scratch, I ran into memory problems. That is understandable, since there are too many parameters, and so on. Therefore I decided to use mixed precision.

Here I report a toy version of my code that reproduces the issue:

```python
import tensorflow as tf
from tensorflow.keras.datasets import mnist

def dummy_model(nClasses):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D((2, 2)))
    model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
    return model
```

The actual main:

```python
# Settings for the GPU
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # tf.config.set_logical_device_configuration(
        #     gpus[0],
        #     [tf.config.LogicalDeviceConfiguration(memory_limit=1024*3)])
        tf.config.experimental.set_memory_growth(gpus[0], True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

# Use of the mixed precision
tf.keras.mixed_precision.set_global_policy('mixed_float16')

# Loading the data
(trainX, trainy), (testX, testy) = mnist.load_data()

# Normalizing the data and adding the channel dimension expected by Conv2D
trainX = trainX.astype('float32')[..., None] / 255
testX = testX.astype('float32')[..., None] / 255

base_model = dummy_model(10)

inputs = tf.keras.Input(shape=(28, 28, 1), name='digits')
x = base_model(inputs)
x = tf.keras.layers.GlobalMaxPooling2D()(x)
# Output activation kept in float32 for numerical stability
outputs = tf.keras.layers.Activation(activation="softmax", dtype='float32')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=["accuracy"],
              loss="sparse_categorical_crossentropy")
model.fit(trainX, trainy, epochs=10)
model.evaluate(testX, testy)
```

If I comment out the line that sets the mixed precision policy, this dummy model that classifies digits of the MNIST database works fine. But when I use mixed precision, the speed is halved and I get nearly zero accuracy.

I use TensorFlow 2.8 on native Windows with GPU acceleration.

The interesting thing is that if I force a convolutional layer to be float32, the values of both the loss and the accuracy rise.
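To be concrete, by "forcing a layer to float32" I mean passing `dtype='float32'` to the layer constructor, which overrides the global policy for that one layer. A minimal sketch (not my full model):

```python
import tensorflow as tf

# Under a mixed_float16 global policy, layers compute in float16 by default;
# a per-layer dtype='float32' pins that layer's computation to float32.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

conv16 = tf.keras.layers.Conv2D(32, (3, 3))                   # float16 compute
conv32 = tf.keras.layers.Conv2D(32, (3, 3), dtype='float32')  # forced float32

x = tf.random.normal((1, 28, 28, 1))
print(conv16(x).dtype)  # float16 output under the mixed policy
print(conv32(x).dtype)  # float32 output
```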

Hi @Jonathan_Campeggio,

Mixed precision training is most beneficial for very deep models with a large number of parameters. For relatively simple models like the one you provided for MNIST, it may not provide a significant speedup and could introduce numerical stability challenges. You can experiment with different settings and architectures to find the right balance between speed and accuracy for your specific use case.
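To make the numerical-stability point concrete, here is a small plain-NumPy sketch (no Keras involved) of the two failure modes float16 introduces, overflow and gradient underflow, and of the loss-scaling idea that works around the latter:

```python
import numpy as np

# Overflow: float16 tops out at 65504, so large pre-softmax logits become inf.
big = np.float16(70000.0)
print(big)       # inf

# Underflow: values below float16's smallest subnormal (~6e-8) flush to zero,
# so small gradients are silently lost during the backward pass.
small = np.float16(1e-8)
print(small)     # 0.0

# Loss scaling: multiply the loss (and hence all gradients) by a large factor
# before the backward pass, then divide it back out before the weight update.
grad = 1e-8                        # a gradient that would vanish in float16
scale = 2.0 ** 15
scaled = np.float16(grad * scale)  # now a representable float16 number
recovered = float(scaled) / scale
print(recovered)                   # close to 1e-8 again
```

When the global policy is `mixed_float16`, Keras applies this automatically: `model.fit` wraps the optimizer in `tf.keras.mixed_precision.LossScaleOptimizer`, which adjusts the scale factor dynamically during training.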