Accelerating TensorFlow using Apple M1 Max?

Same problem here. I am using an M1 Ultra, and although it is using the GPU, training is very slow compared to an RTX GPU.

2022-06-14 01:02:35.340769: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-06-14 01:02:37.458313: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:37.821098: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:37.832808: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:39.000561: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:39.011520: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-06-14 01:02:40.211017: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
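For what it's worth, the "Plugin optimizer for device_type GPU is enabled" lines mean the Metal plugin is active, and the "Failed to get CPU frequency: 0 Hz" warning is, as far as I know, harmless on Apple Silicon. A minimal sanity check that TensorFlow actually sees the GPU (assuming tensorflow-macos and tensorflow-metal are installed):

import tensorflow as tf

# Should print one PhysicalDevice of type GPU when tensorflow-metal is installed.
print(tf.config.list_physical_devices("GPU"))

# Run a small op explicitly on the GPU to confirm it executes there.
with tf.device("/GPU:0"):
    x = tf.random.normal((1024, 1024))
    print(tf.reduce_sum(tf.matmul(x, x)))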

I just threw in my rough benchmark to get a better understanding of how good those M1s are.

Training the "small convnet for dogs vs. cats classification" from Ch. 8 of Deep Learning with Python, 2nd ed., took:

  • on my late-2019 Intel MacBook Pro with 16 GB of RAM = 43 min

  • on a Colab A100 GPU (High-RAM, Pro subscription at ~$10/month) = 46 s (!)

  • on a Mac Studio with M1 Max (24-core GPU, 64 GB system memory) it "executed in 1m 11.1s", but without the ModelCheckpoint callback, because that failed (it would not have contributed meaningfully anyway); see the timing sketch right after this list
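For anyone who wants a comparable timing without downloading the dogs-vs-cats data, here is a rough stand-in harness (hypothetical: random tensors and a small convnet in the spirit of the Ch. 8 example, not the book's actual notebook):

import time
import tensorflow as tf
from tensorflow import keras

# Stand-in data: random "images" instead of the real dogs-vs-cats set.
x = tf.random.uniform((2000, 180, 180, 3), maxval=255.0)
y = tf.random.uniform((2000,), maxval=2, dtype=tf.int32)

# A small convnet roughly along the lines of the book's example.
model = keras.Sequential([
    keras.layers.Rescaling(1.0 / 255, input_shape=(180, 180, 3)),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

start = time.time()
model.fit(x, y, epochs=3, batch_size=32)  # no ModelCheckpoint callback here
print(f"wall time: {time.time() - start:.1f} s")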

I wonder how the Air 15 with an M2 (10-core GPU) would compare, and whether scaling is roughly linear (e.g. ~2.5x longer for the same workload); that would make it a perfect laptop for developing and debugging a model before sending it to a GPU cloud like Lambda :slight_smile:

If anyone has 5 minutes, please run it on an M2 Air from here: Yakov Wildfluss / Benchmark · GitLab

On a MacBook Air M1 (7-core GPU) with 16 GB of RAM:

  • 232.44 s ≈ 3.9 min

which is ~10x faster than the latest 16" Intel MacBook Pro (2019), and ~3x slower than the Studio, which has ~3x as many GPU cores,

which should make an M2 with a 10-core GPU ~30% faster, and that sounds just perfect :slight_smile:

I’m surprised by how fast the Colab GPUs are.

Alright, I got a maxed-out M2 Air:

  • 145.52 s ≈ 2.4 min, which is 30-35% faster than the M1 Air

I’m using an M1 GPU and having the same problem. Getting this working would help me massively with production, since I don’t want to use Google Colab. You can easily use the GPU with PyTorch, but it’s difficult to find the same topic in the TensorFlow docs.
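In case it helps anyone searching: the working route is Apple's tensorflow-metal plugin rather than anything in the main TensorFlow docs. A minimal install-and-verify sketch (package names per Apple's instructions at the time; newer TensorFlow releases may use plain tensorflow instead of tensorflow-macos):

# In a terminal, on Apple Silicon:
#   python -m pip install tensorflow-macos
#   python -m pip install tensorflow-metal

import tensorflow as tf

# With the plugin installed, the Apple GPU appears here and Keras uses it by default.
print(tf.config.list_physical_devices())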

I looked at it, but it still didn’t provide much of a solution; it may also complicate things.

So after reading this, I only see one comment about what I would call the elephant in the room, but perhaps I am misunderstanding something. Why is the Neural Engine not used by TensorFlow? The way Apple sells it, they literally say it’s faster than the GPU for machine-learning applications. They go on to show a progression CPU → GPU → NPU. So am I missing something? I would have thought that by now this would have been done, if it could/should be.

Thanks.

But 235 seconds on a 2019 Intel Mac w/ 32 GB, so something’s awry here.

67.3 seconds on a MacBook Pro M2 Max w/ 64 GB

Update: I hadn’t realized that the 2019 MacBooks also had a GPU. So there are actually four numbers I can measure here: the two machines, each with either just tensorflow or with tensorflow and tensorflow-metal. I assume that the GPU is used iff tensorflow-metal is active.
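To test that assumption directly, TensorFlow can log the device every op is placed on; a quick sketch:

import tensorflow as tf

# Must be called at program start; logs the device for each op executed.
tf.debugging.set_log_device_placement(True)

a = tf.random.normal((512, 512))
b = tf.matmul(a, a)  # expect ".../device:GPU:0" in the log when the Metal plugin is active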

I ran each test, except the slowest, a couple of times and saw only minor variance in the speeds.

2019 Intel MacBook Pro, 32 GB, w/o GPU: 1187 s (~20 minutes)
2019 Intel MacBook Pro, 32 GB, w/ GPU: 235-244 s (~4 minutes)
2023 MacBook Pro M2 Max, 64 GB, w/o GPU: 477-488 s (~8 minutes)
2023 MacBook Pro M2 Max, 64 GB, w/ GPU: 64.5-67.3 s (~1 minute)
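One note on collecting the w/o-GPU numbers: rather than uninstalling tensorflow-metal between runs, you can hide the GPU from TensorFlow for a given run; a small sketch:

import tensorflow as tf

# Must run before any op touches the GPU; everything afterwards falls back to the CPU.
tf.config.set_visible_devices([], "GPU")
print(tf.config.get_visible_devices())  # the GPU should no longer be listed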

I’ve switched to using TensorFlow on my M1 Max and M2 Max MacBook Pros. It’s much faster than using my RTX 2060 6 GB. I have some benchmarks in my GitHub repo here. I compared a variety of setups with different batch sizes to quantify the performance difference.

mac tensorflow benchmarks
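For reference, here is a rough sketch of that kind of batch-size sweep (hypothetical harness, not the actual code from the repo):

import time
import tensorflow as tf
from tensorflow import keras

# Stand-in data; the real benchmarks would use actual datasets.
x = tf.random.uniform((4096, 32, 32, 3))
y = tf.random.uniform((4096,), maxval=10, dtype=tf.int32)

def make_model():
    return keras.Sequential([
        keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(10),
    ])

for batch_size in (32, 64, 128, 256):
    model = make_model()
    model.compile(optimizer="adam",
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    start = time.time()
    model.fit(x, y, epochs=1, batch_size=batch_size, verbose=0)
    print(f"batch {batch_size}: {time.time() - start:.1f} s")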

For me, the great thing about the M2 Max with 96 GB of memory is the ability to run large language models. I’ve verified you really can use 90 GB of memory training vision models on an M2 Max MBP.
