Accelerating TensorFlow using Apple M1 Max?

Same problem here. I am using an M1 Ultra, and although it is using the GPU, training is very slow compared to an RTX GPU.

2022-06-14 01:02:35.340769: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-06-14 01:02:37.458313: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:37.821098: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:37.832808: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:39.000561: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:39.011520: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:40.211017: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
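For what it's worth, the "Plugin optimizer for device_type GPU is enabled" lines mean the Metal plugin is active, and the "Failed to get CPU frequency: 0 Hz" warning is, as far as I know, harmless on Apple Silicon. A minimal sanity check that TensorFlow actually sees the GPU (assuming tensorflow-macos and tensorflow-metal are installed):

import tensorflow as tf

# Should print one PhysicalDevice of type GPU when tensorflow-metal is installed.
print(tf.config.list_physical_devices("GPU"))

# Run a small op explicitly on the GPU to confirm it executes there.
with tf.device("/GPU:0"):
    x = tf.random.normal((1024, 1024))
    print(tf.reduce_sum(tf.matmul(x, x)))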

I just threw in my rough benchmark to get a better understanding of how good those M1s are.

Training the "small convnet for dogs vs. cats classification" from Ch. 8 of Deep Learning with Python, 2nd ed., took:

  • on my late-2019 Intel MacBook Pro with 16 GB of RAM = 43 min

  • on a Colab A100 GPU (High-RAM, Pro subscription at ~$10/month) = 46 s (!)

  • on a Mac Studio with M1 Max (24-core GPU, 64 GB system memory) it "executed in 1m 11.1s", but without the ModelCheckpoint callback, because that failed (it would not have contributed meaningfully anyway); see the timing sketch right after this list
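For anyone who wants a comparable timing without downloading the dogs-vs-cats data, here is a rough stand-in harness (hypothetical: random tensors and a small convnet in the spirit of the Ch. 8 example, not the book's actual notebook):

import time
import tensorflow as tf
from tensorflow import keras

# Stand-in data: random "images" instead of the real dogs-vs-cats set.
x = tf.random.uniform((2000, 180, 180, 3), maxval=255.0)
y = tf.random.uniform((2000,), maxval=2, dtype=tf.int32)

# A small convnet roughly along the lines of the book's example.
model = keras.Sequential([
    keras.layers.Rescaling(1.0 / 255, input_shape=(180, 180, 3)),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

start = time.time()
model.fit(x, y, epochs=3, batch_size=32)  # no ModelCheckpoint callback here
print(f"wall time: {time.time() - start:.1f} s")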

I wonder how the Air 15 with an M2 (10-core GPU) would compare, and whether scaling is roughly linear (e.g. ~2.5x longer for the same workload); that would make it a perfect laptop for developing and debugging a model before sending it to a GPU cloud like Lambda :slight_smile:

If anyone has 5 minutes, please run it on an M2 Air from here: Yakov Wildfluss / Benchmark · GitLab

On a MacBook Air M1 (7-core GPU) with 16 GB of RAM:

  • 232.44 s ≈ 3.9 min

which is ~10x faster than the latest 16" Intel MacBook Pro (2019), and ~3x slower than the Studio, which has ~3x as many GPU cores,

which should make an M2 with a 10-core GPU ~30% faster, and that sounds just perfect :slight_smile:

I’m surprised by how fast the Colab GPUs are.

Alright, I got a maxed-out M2 Air:

  • 145.52 s ≈ 2.4 min, which is 30-35% faster than the M1 Air

I’m using an M1 GPU and having the same problem. Getting this working would help me massively with production, since I don’t want to use Google Colab. You can easily use the GPU with PyTorch, but it’s difficult to find the same topic in the TensorFlow docs.
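In case it helps anyone searching: the working route is Apple's tensorflow-metal plugin rather than anything in the main TensorFlow docs. A minimal install-and-verify sketch (package names per Apple's instructions at the time; newer TensorFlow releases may use plain tensorflow instead of tensorflow-macos):

# In a terminal, on Apple Silicon:
#   python -m pip install tensorflow-macos
#   python -m pip install tensorflow-metal

import tensorflow as tf

# With the plugin installed, the Apple GPU appears here and Keras uses it by default.
print(tf.config.list_physical_devices())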

I looked at it, but it still didn’t provide much of a solution; it may also complicate things.

So after reading this, I only see one comment about what I would call the elephant in the room, but perhaps I am misunderstanding something. Why is the Neural Engine not used by TensorFlow? The way Apple sells it, they literally say it’s faster than the GPU for machine-learning applications. They go on to show a progression CPU → GPU → NPU. So am I missing something? I would have thought that by now this would have been done, if it could/should be.

Thanks.

But 235 seconds on a 2019 Intel Mac w/ 32 GB, so something’s awry here.

67.3 seconds on a MacBook Pro M2 Max w/ 64 GB

Update: I hadn’t realized that the 2019 MacBooks also had a GPU. So there are actually four numbers I can measure here: the two machines, each with either just tensorflow or with tensorflow and tensorflow-metal. I assume that the GPU is used iff tensorflow-metal is active.
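To test that assumption directly, TensorFlow can log the device every op is placed on; a quick sketch:

import tensorflow as tf

# Must be called at program start; logs the device for each op executed.
tf.debugging.set_log_device_placement(True)

a = tf.random.normal((512, 512))
b = tf.matmul(a, a)  # expect ".../device:GPU:0" in the log when the Metal plugin is active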

I ran each test, except the slowest, a couple of times and saw only minor variance in the speeds.

2019 Intel MacBook Pro, 32 GB, w/o GPU: 1187 s (~20 minutes)
2019 Intel MacBook Pro, 32 GB, w/ GPU: 235-244 s (~4 minutes)
2023 MacBook Pro M2 Max, 64 GB, w/o GPU: 477-488 s (~8 minutes)
2023 MacBook Pro M2 Max, 64 GB, w/ GPU: 64.5-67.3 s (~1 minute)
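One note on collecting the w/o-GPU numbers: rather than uninstalling tensorflow-metal between runs, you can hide the GPU from TensorFlow for a given run; a small sketch:

import tensorflow as tf

# Must run before any op touches the GPU; everything afterwards falls back to the CPU.
tf.config.set_visible_devices([], "GPU")
print(tf.config.get_visible_devices())  # the GPU should no longer be listed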

I’ve switched to using TensorFlow on my M1 Max and M2 Max MacBook Pros. It’s much faster than using my RTX 2060 6 GB. I have some benchmarks in my GitHub repo here. I compared a variety of setups with different batch sizes to quantify the performance difference.

mac tensorflow benchmarks
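For reference, here is a rough sketch of that kind of batch-size sweep (hypothetical harness, not the actual code from the repo):

import time
import tensorflow as tf
from tensorflow import keras

# Stand-in data; the real benchmarks would use actual datasets.
x = tf.random.uniform((4096, 32, 32, 3))
y = tf.random.uniform((4096,), maxval=10, dtype=tf.int32)

def make_model():
    return keras.Sequential([
        keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(10),
    ])

for batch_size in (32, 64, 128, 256):
    model = make_model()
    model.compile(optimizer="adam",
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    start = time.time()
    model.fit(x, y, epochs=1, batch_size=batch_size, verbose=0)
    print(f"batch {batch_size}: {time.time() - start:.1f} s")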

For me, the great thing about the M2 Max with 96 GB of memory is the ability to run large language models. I’ve verified you really can use 90 GB of memory training vision models on an M2 Max MBP.
