M1 benchmark times: Tensorflow+CPU << Torch+CPU ~ Torch+MPS ?!

I’m using the basic Text Classification example to experiment with various backends.
All package versions are listed below, and the test code is available here.

  1. I’m setting os.environ["KERAS_BACKEND"] = "torch" (or "tensorflow") to compare those two Keras backends, and then
  2. wrapping the run in with torch.device("cpu") (or "mps") to see just what the MPS (Metal) backend can do.

There isn’t a TF build for MPS (there are wheels floating around, but…), so that gives me three experiments:
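For reference, the backend switch only takes effect if the environment variable is set before the first `import keras` anywhere in the process, since Keras 3 reads it once at import time. A minimal sketch of that ordering (`select_keras_backend` is a hypothetical helper, not part of the example code):

```python
import os

def select_keras_backend(name: str) -> None:
    """Pick the Keras 3 backend.

    Keras reads KERAS_BACKEND exactly once, at import time, so this must
    run before the first `import keras` anywhere in the process.
    """
    if name not in {"torch", "tensorflow", "jax", "numpy"}:
        raise ValueError(f"unknown backend: {name}")
    os.environ["KERAS_BACKEND"] = name

select_keras_backend("torch")
print(os.environ["KERAS_BACKEND"])  # -> torch
```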

device torch cpu fixed: True

Epoch1: 213s 342ms/step - accuracy: 0.6257 - loss: 0.5940 - val_accuracy: 0.8622 - val_loss: 0.3184
Epoch2: 218s 349ms/step - accuracy: 0.8885 - loss: 0.2764 - val_accuracy: 0.8684 - val_loss: 0.3383
Test: 89s 113ms/step - accuracy: 0.8698 - loss: 0.3283

device torch mps fixed: True

  • Note: both torch.backends.mps.is_available() and torch.backends.mps.is_built() return True

Epoch1: 211s 338ms/step - accuracy: 0.5996 - loss: 0.6131 - val_accuracy: 0.8742 - val_loss: 0.3015
Epoch2: 229s 367ms/step - accuracy: 0.8909 - loss: 0.2728 - val_accuracy: 0.8608 - val_loss: 0.3584
Test: 90s 115ms/step - accuracy: 0.8652 - loss: 0.3569

TF/CPU

Epoch1: 16s 24ms/step - accuracy: 0.5729 - loss: 0.6330 - val_accuracy: 0.8652 - val_loss: 0.3144
Epoch2: 17s 27ms/step - accuracy: 0.8806 - loss: 0.2933 - val_accuracy: 0.8746 - val_loss: 0.3159
Test: 6s 8ms/step - accuracy: 0.8719 - loss: 0.3247

Question#1: Why doesn’t specifying DEVICE=MPS result in (noticeably) lower training/testing times?
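One thing worth ruling out for Question#1: in PyTorch 2.x, with torch.device(...) only changes the default device for tensors created inside the block; anything allocated before it (such as model weights built earlier) stays where it was. A quick sanity check (a sketch, not the benchmark code itself):

```python
import torch

# `with torch.device(...)` changes the default device only for tensors
# created inside the block; pre-existing tensors are NOT moved.
before = torch.zeros(2)            # allocated on the default device (CPU)

dev = "mps" if torch.backends.mps.is_available() else "cpu"
with torch.device(dev):
    inside = torch.zeros(2)        # allocated on `dev`

print(before.device.type)          # -> cpu (not moved by the context)
print(inside.device.type)          # mps, or cpu if MPS is unavailable
```

If the model’s weights turn out to live on cpu even when training runs inside the mps block, the GPU is never actually doing the work, which would explain the near-identical timings.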

Question#2: What is TensorFlow able to do on CPU that gives it a ~10x improvement over Torch?

torch=2.1.0.post100
torchtext=0.16.1
tensorflow=2.15.0
tensorflow_text=2.15.0
keras=3.0.5
keras_nlp=0.7.0