How well is MKL supported by TensorFlow? (now disabled in Java)

karllessard · June 10, 2021, 1:19pm

Hi everyone,

At SIG JVM, we have just decided to stop supporting and building native TensorFlow MKL-enabled artifacts for the following reasons:

With pretty much every new release of TensorFlow, the MKL build is broken on various platforms, and it takes some gymnastics on our side to get it working again (if we are even able to).
We did not investigate the reasons much, but performance with MKL was many times worse than without it.

That being said, if anyone here has insights to share about the current status of MKL in TensorFlow and/or ideas on how we could continue to support it without trouble, that would be greatly appreciated.

Thanks!
Karl

MaziyarPanahi · June 12, 2021, 5:03pm

Hi @karllessard

Back in TensorFlow 1.12.x, I ran a benchmark using a custom build of TensorFlow with the mkl and opt flags. To build the Java part I just followed these instructions: tensorflow/README.md at master · tensorflow/tensorflow · GitHub

In my benchmark (training NER), the Intel Cascade Lake CPU with MKL was close to, and sometimes better than, the GPU (since it uses the system's memory, it could run a larger batch size).

That being said, I never tested inference. Training, however, was much faster than with a native CPU build on newer CPU architectures.

karllessard · June 16, 2021, 2:13am

Thanks @MaziyarPanahi, can you tell me on which platform (OS) you observed that performance at the time?

MaziyarPanahi · June 16, 2021, 6:24am

Absolutely! These are the platforms on which I observed improvements:

Dell PowerEdge C4130 - Ubuntu 16.04 LTS - Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
AWS p3.8xlarge - Ubuntu 18.04 LTS - Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
AWS c5.12xlarge - Ubuntu 18.04 LTS - 2nd generation Intel Xeon Scalable Processors (Cascade Lake)

saudet · June 23, 2021, 12:17am

It looks like the default build of TF Core does include some “MKL” support; it is just not enabled by default:

Setting TF_ENABLE_ONEDNN_OPTS=1 with default builds of TF Java might just do the same as it does for Python.
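
For anyone who wants to try this from Java, here is a minimal sketch of one way to do it. A JVM cannot portably change its own environment once it is running, so the sketch launches the benchmark in a child process with the variable set. The class name OneDnnLauncher, the jar name and the main class com.example.Benchmark are hypothetical placeholders, and whether the native library actually picks up the flag in a given TF Java build is an assumption to verify, not something confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

public class OneDnnLauncher {
    // Environment variable named in this thread.
    static final String FLAG = "TF_ENABLE_ONEDNN_OPTS";

    /** Returns a launcher for the given command with the flag set to "1" in the child's environment. */
    static ProcessBuilder withOneDnn(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(new ArrayList<>(command));
        pb.environment().put(FLAG, "1"); // only affects the child process
        pb.inheritIO();                  // forward the child's stdout/stderr to this console
        return pb;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical command line: replace with your own classpath and benchmark class.
        List<String> cmd = List.of("java", "-cp", "app.jar", "com.example.Benchmark");
        ProcessBuilder pb = withOneDnn(cmd);
        System.out.println("Child will run with " + FLAG + "=" + pb.environment().get(FLAG));
        // Uncomment to actually start the child JVM and wait for it to finish:
        // int exitCode = pb.start().waitFor();
    }
}
```

Exporting the variable in the shell before starting the JVM should have the same effect; running the same benchmark with and without it gives the comparison.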

karllessard · June 24, 2021, 4:48pm

Really interesting… I’ll give it a try. If anyone does before me, please share your benchmarks!