Mac OS Sonoma not compatible with latest tensorflow-metal libraries

I keep seeing output like this. It’s harmless but massively annoying!

I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.

loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x42x1x1xi1>'

Thanks for sharing this here, much appreciated. It will save me from a nightmare.

Hi @Erez_Katz

Could you please share the minimal reproducible code to replicate the error and to understand the issue? Thank you.

Thank you Renu. I was able to overcome all the issues with the following (albeit undesirable) solution.
I simply disabled GPU use. It seems that tensorflow-macos and tensorflow-metal were incompatible with my other libraries, and the GPU code was also very slow. I ended up using the CPU only; the speed came back and all the errors are gone now too. Apparently the GPU is designed for heavy deep models, and with simple models it is actually slower (per a macOS developer).
Here is what I ended up putting in. I set g_use_gpu to False to solve the issue:
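import tensorflow as tf
# Config and set_GPUs() are defined elsewhere in my project (not shown).
if Config.g_use_gpu:
    set_GPUs()
else:
    # Disabling the GPU as it is too slow: hide all GPUs so TensorFlow runs on the CPU.
    tf.config.set_visible_devices([], 'GPU')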
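
For readers who do not have the Config and set_GPUs helpers from the snippet above (both are the poster's own code and are not shown in the thread), here is a minimal self-contained sketch of the same CPU-only workaround. The flag USE_GPU and the function configure_devices are hypothetical names made up for illustration. The TensorFlow import is guarded so the script also runs where TensorFlow is not installed.

```python
USE_GPU = False  # hypothetical flag, stands in for the poster's Config.g_use_gpu


def configure_devices(use_gpu):
    """Hide all GPUs from TensorFlow when use_gpu is False.

    Returns the list of GPUs TensorFlow can still see,
    or None if TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    if not use_gpu:
        # Call this before any op or model touches the devices;
        # TensorFlow raises RuntimeError once they are initialized.
        tf.config.set_visible_devices([], "GPU")
    return tf.config.get_visible_devices("GPU")


if __name__ == "__main__":
    print("Visible GPUs:", configure_devices(USE_GPU))
```

An empty list in the output means the Metal GPU is hidden and training should run on the CPU.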

This issue also happens when I use TensorFlow 2.9 and tensorflow-metal 0.5 on macOS Sonoma 14.0 to train a YOLOv3 model. I can train it on Ventura.

@Erez_Katz ,

Okay. Could you please try again by installing the latest TensorFlow version 2.14 and the compatible tensorflow-metal 1.1.0, as mentioned in this PyPI repository, to detect the GPU?

Let us know if the issue still persists. Thank you.

Would you mind sharing the specific command to install both TensorFlow version 2.14 and tensorflow-metal 1.1.0? I am asking because my pip install tensorflow==2.14 doesn’t seem to work; it will only recognize 2.13.

Also - do I still need to use Adam Legacy for my CNN?

Hello! I did as you suggested (I updated the libraries using Poetry), and even though I had no errors in my log while training a shallow autoencoder on the GPU (M2 Pro), the training speed is ridiculously slow!

I was getting 9 ms per epoch on Ventura, and now 100 ms on Sonoma.

I also noticed that GPU usage does not even reach 15%, whereas on Ventura I remember it getting up to 70% for the same network.

I’m not quite sure whether those versions will be fixed soon or whether it is worth downgrading to Ventura.

@Erez_Katz , You can use pip install tensorflow-metal==1.1.0 to install the tensorflow-metal plug-in. Please refer to this doc for the same. Thank you.

(ekbase) bridgelineIT@MacBook-Pro-3 ~ % pip install tensorflow-metal==1.1.0

ERROR: Could not find a version that satisfies the requirement tensorflow-metal==1.1.0 (from versions: 0.1.0, 0.1.1, 0.1.2, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.5.1, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 1.0.0, 1.0.1)

ERROR: No matching distribution found for tensorflow-metal==1.1.0

@Erez_Katz I guess you are using a Python version below 3.9. You may need to upgrade to Python 3.9, which can find tensorflow-macos 2.14 and tensorflow-metal 1.1.0. I upgraded to these versions and added Adam Legacy for my network, and the 'anec.gain_offset_control' op error no longer pops up.
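
The guess above ties the pip failure to the interpreter version: pip only lists releases that have a distribution compatible with the running interpreter and platform, so an older Python may simply not see tensorflow-metal 1.1.0. Below is a minimal sketch for checking this with the standard library only. The 3.9 threshold is taken from the post above and is not verified here, and the helper names installed_versions and python_is_at_least are made up for illustration.

```python
import sys
from importlib import metadata

# Distribution names mentioned in this thread.
PACKAGES = ("tensorflow", "tensorflow-macos", "tensorflow-metal")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_is_at_least(major, minor, version_info=sys.version_info):
    """Return True if the interpreter version is >= major.minor."""
    return tuple(version_info[:2]) >= (major, minor)


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Python >= 3.9 (threshold assumed from the thread):",
          python_is_at_least(3, 9))
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the same environment (here, the ekbase conda environment) that pip installs into; otherwise the report describes a different interpreter.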

When I train a YOLOv3 model on an M2, I get the error below:
/AppleInternal/Library/BuildRoots/90c9c1ae-37b6-11ee-a991-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Utility/MPSLibrary.mm:550: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14.
Compute function exceeds available temporary registers: (null)

I tried setting the environment variable MTL_SHADER_VALIDATION=1. With it the model can train, but training is 5-7 times slower than before upgrading to Sonoma. I also checked that the GPU is being used. I do not know why it is so much slower; it seems to be related to MTL_SHADER_VALIDATION=1. But if I do not set it, training on the GPU fails.
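
One way to apply the MTL_SHADER_VALIDATION=1 setting described above from inside a training script is sketched below. It assumes that Metal picks up a variable set in the process environment before TensorFlow is imported; that is not confirmed in this thread, and exporting the variable in the shell before launching Python is the more conservative route. The helper name enable_shader_validation is made up for illustration.

```python
import os


def enable_shader_validation(environ=os.environ):
    """Set MTL_SHADER_VALIDATION=1 unless a value is already present.

    Returns the value that is in effect afterwards.
    """
    environ.setdefault("MTL_SHADER_VALIDATION", "1")
    return environ["MTL_SHADER_VALIDATION"]


if __name__ == "__main__":
    print("MTL_SHADER_VALIDATION =", enable_shader_validation())
    try:
        # Imported only after the variable is set (assumption: order matters).
        import tensorflow as tf
        print("GPUs:", tf.config.list_physical_devices("GPU"))
    except ImportError:
        print("TensorFlow is not installed; only the variable was set.")
```

Validation layers add runtime checks, which would be consistent with the slowdown reported above, so this is best treated as a diagnostic setting and not a permanent fix.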

I am using Python 3.9 and tensorflow-macos 2.14 and am trying to run a Keras-based deep learning model that I used to be able to run with tensorflow-macos 2.11. It runs, but I get a lot of annoying warnings like the following:

loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x2063x1x20xi1>'

I use macOS Sonoma version 14.1 and Apple M2 Pro. Regarding the combo python 3.9, tensorflow 2.14.0 (or 2.14.1) and tensorflow-metal 1.1.0, the training proceeds as follows:

Epoch 1/50

431/431 [==============================] - 166s 222ms/step - loss: 0.2384 - sparse_categorical_accuracy: 0.9313 - val_loss: 0.2923 - val_sparse_categorical_accuracy: 0.9290

Epoch 2/50

431/431 [==============================] - 136s 315ms/step - loss: 0.1056 - sparse_categorical_accuracy: 0.9671 - val_loss: 0.1287 - val_sparse_categorical_accuracy: 0.9568

Epoch 3/50

431/431 [==============================] - 1062s 2s/step - loss: 0.0835 - sparse_categorical_accuracy: 0.9719 - val_loss: 0.1326 - val_sparse_categorical_accuracy: 0.9596

Epoch 4/50

431/431 [==============================] - 419s 972ms/step - loss: 0.0863 - sparse_categorical_accuracy: 0.9721 - val_loss: 0.1155 - val_sparse_categorical_accuracy: 0.9629

Epoch 5/50

431/431 [==============================] - 166s 383ms/step - loss: 0.1069 - sparse_categorical_accuracy: 0.9691 - val_loss: 0.1385 - val_sparse_categorical_accuracy: 0.9555

In a nutshell, the training is ridiculously slow and the loss does not really decrease; it either increases or stagnates at a really bad level.

If I use python 3.9, tensorflow 2.13.0 and tensorflow-metal 0.5.1 instead, the training is fast, and proceeds as expected:

Epoch 1/50
431/431 [==============================] - 16s 34ms/step - loss: 0.1884 - sparse_categorical_accuracy: 0.9306 - val_loss: 0.0764 - val_sparse_categorical_accuracy: 0.9668
Epoch 2/50
431/431 [==============================] - 16s 34ms/step - loss: 0.0665 - sparse_categorical_accuracy: 0.9746 - val_loss: 0.0529 - val_sparse_categorical_accuracy: 0.9790
Epoch 3/50
431/431 [==============================] - 16s 35ms/step - loss: 0.0517 - sparse_categorical_accuracy: 0.9798 - val_loss: 0.0516 - val_sparse_categorical_accuracy: 0.9795
Epoch 4/50
431/431 [==============================] - 16s 35ms/step - loss: 0.0453 - sparse_categorical_accuracy: 0.9825 - val_loss: 0.0397 - val_sparse_categorical_accuracy: 0.9842
Epoch 5/50
431/431 [==============================] - 16s 35ms/step - loss: 0.0418 - sparse_categorical_accuracy: 0.9839 - val_loss: 0.0372 - val_sparse_categorical_accuracy: 0.9858

but the terminal gets filled by the following error messages:

loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/495c257e-668e-11ee-93ce-926038f30c31/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'