How well is MKL supported by TensorFlow? (now disabled in Java)

Hi everyone,

At SIG JVM, we have just decided to stop supporting and building native TensorFlow MKL-enabled artifacts for the following reasons:

  • With pretty much every new release of TensorFlow, the MKL build is broken on various platforms, and it takes some gymnastics on our side to get it to work again (if we are able to at all).

  • We did not investigate the reasons much, but performance with MKL was many times worse than without it.

That being said, if anyone here has insights to share about the actual status of MKL in TensorFlow and/or ideas on how we could continue to support it without trouble, that would be greatly appreciated.

Thanks!
Karl

Hi @karllessard

Back in TensorFlow 1.12.x, I ran a benchmark using a custom build of TensorFlow with the mkl and opt flags. To build the Java part, I just followed these instructions: tensorflow/README.md at master · tensorflow/tensorflow · GitHub

In my benchmark (training NER), Intel Cascade Lake with MKL was close to, and sometimes better than, the GPU (since it used the system’s memory, it could handle a larger batch size).

That being said, I have never tested inference. But training was much faster than with a native CPU build on newer CPU architectures.

Thanks @MaziyarPanahi, can you tell me on which platform (OS) you observed such performance at that time?

Absolutely! These are the platforms on which I observed improvements:

  • Dell PowerEdge C4130 - Ubuntu 16.04 LTS - Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
  • AWS p3.8xlarge - Ubuntu 18.04 LTS - Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
  • AWS c5.12xlarge - Ubuntu 18.04 LTS - 2nd generation Intel Xeon Scalable Processors (Cascade Lake)

It looks like the default build of TF Core does have some “MKL” in it, it is just not enabled by default:

Setting TF_ENABLE_ONEDNN_OPTS=1 with default builds of TF Java might just do the same thing as it does in Python.
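For anyone who wants to try it, here is a rough sketch of how the two settings could be compared. It is only a small Python harness that runs the same command twice, once with TF_ENABLE_ONEDNN_OPTS=0 and once with TF_ENABLE_ONEDNN_OPTS=1, and prints the wall-clock time of each run. I am assuming the variable needs to be in the environment before the process starts, and whether default TF Java builds honor it at all is exactly what remains to be checked. The script name and the jar in the usage comment are hypothetical placeholders for your own workload.

```python
import os
import subprocess
import sys
import time


def env_with_onednn(enabled, base_env=None):
    """Return a copy of the environment with TF_ENABLE_ONEDNN_OPTS set to "1" or "0"."""
    env = dict(os.environ if base_env is None else base_env)
    env["TF_ENABLE_ONEDNN_OPTS"] = "1" if enabled else "0"
    return env


def time_command(command, enabled):
    """Run `command` once in a child process and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, env=env_with_onednn(enabled), check=True)
    return time.perf_counter() - start


def main(argv):
    # Hypothetical usage (script and jar names are placeholders):
    #   python compare_onednn.py java -jar my-tf-java-benchmark.jar
    if not argv:
        print("usage: compare_onednn.py <command> [args...]")
        return
    for enabled in (False, True):
        elapsed = time_command(argv, enabled)
        print(f"TF_ENABLE_ONEDNN_OPTS={int(enabled)}: {elapsed:.2f}s")


if __name__ == "__main__":
    main(sys.argv[1:])
```

A single run per setting is noisy and includes JVM startup, so for real numbers it is probably better to repeat each run a few times and to time the training or inference loop inside the workload itself.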

Really interesting… I’ll give it a try. If anyone does so before me, please share your benchmarks!