TensorFlow Linux wheels are being upgraded to manylinux2014

Hello everyone,

The TensorFlow OSS DevInfra Team is planning to upgrade the Linux wheels of TensorFlow to be manylinux2014 compatible. This has been in the works for a while, and we are happy to announce that from Monday, March 21, we will start publishing manylinux2014-compatible TensorFlow nightly packages ahead of the next TensorFlow release, TensorFlow 2.9, whose branch cut is scheduled for March 29th, 2022 (tentative). As part of this upgrade, we are also switching to the new libstdc++ ABI, which we discussed implementing as part of the manylinux upgrade at the March 2022 TF SIG Build meeting.

Also on the 21st, the TF SIG Build Dockerfiles will change to use the new toolchain by default. As long as everything goes smoothly, the first manylinux2014 tf-nightly packages should arrive on Tuesday, March 22. If you’d like to help us test manylinux2014 packages before then, please use the links below.

If you would like to test the manylinux2014 build environment (pending PR), please see the instructions here.

If you would like to start testing the manylinux2014 packages and compare them against the current manylinux2010 ones, here is a set of manylinux2010 and manylinux2014 packages we built at the same commit (9d98dc772).

FAQs:

Q1. I’m a downstream developer. How should I change my build process to be compatible with new TF wheels?

The ABI change means the new wheels are not compatible with binaries built against the old (manylinux2010) wheels. To be compatible with the new TF wheels, please follow the instructions below:

  1. If you use TensorFlow’s toolchain/crosstool, upgrade to the new manylinux2014 crosstool. See the .bazelrc here.
  2. If your build contains the flag --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0, change the 0 to 1 (a minimal sketch follows below). The new toolchain effectively sets this to 1 by default if it is not explicitly set.

Aside from the ABI flag change, the toolchain upgrade by itself (from devtoolset-7 to devtoolset-9) is not likely to cause breakages if your package does not already build with TensorFlow’s toolchains. If you do use TensorFlow’s toolchains, you should upgrade to the new manylinux2014 crosstool.
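As a minimal sketch of the flag change in item 2 (the target //your:pkg is just a placeholder for your own build target), the switch amounts to flipping the value on the command line:

bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1" //your:pkg    # was ...ABI=0

or updating the equivalent line in your .bazelrc:

build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"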

Q2. What kinds of breakages during the build process are most likely related to these changes?

  1. Linker errors / undefined reference errors, usually involving __cxx11 symbols (see the quick check after this list)
  2. RuntimeError: random_device could not be read
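If you hit the first kind of error, a quick (if rough) way to check which ABI a prebuilt library exposes is to look for __cxx11-tagged symbols; the library path below is only a placeholder:

nm -D /path/to/libyourdep.so | grep __cxx11
c++filt _ZNK6google8protobuf7Message11GetTypeNameB5cxx11Ev

The second command demangles to google::protobuf::Message::GetTypeName[abi:cxx11]() const, i.e. the new-ABI variant of the symbol; an undefined reference to a cxx11-tagged symbol usually means the consumer was built with the new ABI while the library providing it was built with the old one.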

Thank you!

FYI @bhack @seanpmorgan

Thanks very much @angerson and team!

The wheels for tf-nightly and tf-nightly-gpu as of version dev20220322 (today, March 22) are now manylinux2014-compliant (a.k.a. manylinux_2_17) and have been built with --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=1. We also started building libtensorflow nightlies with the new toolchains, switched TensorFlow’s bazelrc to use the new toolchains for our release build configurations, and updated the SIG Build Dockerfiles.

Please try out the new wheels, the new build toolchains, and the upgraded Dockerfiles to build TensorFlow, and let us know in this thread if you run into any new kinds of problems.
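For a quick sanity check of the new wheels, something like the following should work (the /tmp/wheels path is only an example):

pip install --upgrade tf-nightly                   # or tf-nightly-gpu
pip download tf-nightly --no-deps -d /tmp/wheels   # the downloaded filename shows the manylinux_2_17 / manylinux2014 tag
python -c "import tensorflow as tf; print(tf.__version__)"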

Hi
Please note that setting -D_GLIBCXX_USE_CXX11_ABI=1 is ignored by the toolchain offered by the official manylinux2014 docker image. This apparently has to do with a bug in CentOS 7. Nevertheless, as stated in PEP 599, wheels produced with the manylinux2014 docker image get the manylinux2014 tag. So switching to the C++11 ABI actually seems to make TF non-compliant with manylinux2014.
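(For anyone who wants to inspect a wheel themselves, auditwheel offers a rough check of the platform tag and external shared-library policy; the filename below is only a placeholder:

pip install auditwheel
auditwheel show tensorflow-2.9.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

This prints which manylinux policy the wheel is consistent with and which external libraries it depends on.)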

This is a problem for downstream projects that build using official pypi docker images.

Please comment

@perfinion Any thoughts on this?

If I understand the situation correctly, this only affects packages that talk to TF at the C++ level. We haven’t heard from any such groups yet, and I don’t know how many are likely to be affected. How is this affecting you? The easy solution is to use the SIG Build images linked above, which many downstream teams already use.

We are exactly in this situation: linking against TF at the C++ level and using the PyPA manylinux containers for building. We are currently investigating two solutions: migrating to the SIG Build images or migrating to manylinux_2_24. At the moment both seem to work.
Fortunately, we neither link against any other PyPI packages nor do other PyPI packages link against us, i.e. we are not affected by incompatibilities with the broader PyPI ecosystem, so we will simply follow TF’s ABI switch.

@Pawel_Piskorski What downstream project is this, in that case? Sorry if you mentioned it and I missed it. Do you have some links to your release workflow/scripts?

I’ve heard of that CentOS bug, but there isn’t really any good solution in general to the whole CXX11_ABI=1 thing. All Linux distros did full rebuilds many years ago, but Python is in a weird position right now. None of the manylinux specs mention CXX11_ABI at all :frowning: . CXX11_ABI=0/1 is orthogonal to manylinux for the most part, and we definitely need to move to =1 eventually, so doing the CXX11_ABI=1 switch all at once seemed the least-painful option :confused: .

As for moving to new images, both our SIG Build containers and the manylinux_2_24 containers are reasonable. The biggest difference most likely comes down to the SIG Build ones having the GPU tooling ready to go, but I’m not sure whether that matters to your project.
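If it helps with that comparison, the two candidate base images can be pulled roughly like this (the exact tags are illustrative, pick whichever Python version you need):

docker pull tensorflow/build:latest-python3.9    # SIG Build image, GPU toolchain included
docker pull quay.io/pypa/manylinux_2_24_x86_64   # PyPA manylinux_2_24 image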

Again, sorry for the trouble caused, but hopefully things will be better once we get this transition over with :smiley:

@perfinion thanks, but I can’t disclose the project, and the release scripts are not public anyway.
I agree that taking that leap is the best way forward. C++11 is over a decade old, so, yeah, high time :slight_smile:

I tried building TensorFlow C++ on Linux (Ubuntu 20.04 LTS) with TF 2.9.0 and encountered the error:
undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameB5cxx11Ev

right at the end of the build:
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1" -c opt --verbose_failures //tensorflow:libtensorflow.so //tensorflow:libtensorflow_cc.so //tensorflow:libtensorflow_framework.so

I don’t have either protobuf-compiler or libprotobuf-dev installed on my system, and the Python wheel tensorflow-gpu==2.9.0 works perfectly in my virtual env, which I am also using with ./configure.

Additionally, just a month ago, I successfully built the full TF C++ 2.7.0 with the GPU (RTX 3080) / CUDA 11.2 config:
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --verbose_failures //tensorflow:libtensorflow.so //tensorflow:libtensorflow_cc.so //tensorflow:libtensorflow_framework.so
on the same machine

Is this the right forum to post this error or should I go to TF github?

Internally, we have seen a few other cases of undefined symbols, but those were usually caused by building with the old ABI or by an incompatible build environment. Could you try building in one of the SIG Build Docker images?
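A rough sketch of what that looks like (the image tag and paths are only examples; adjust them to your checkout and Python version):

docker run -it --rm -v /path/to/tensorflow:/tf/tensorflow -w /tf/tensorflow tensorflow/build:latest-python3.9 bash
# then, inside the container:
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1" -c opt //tensorflow:libtensorflow_cc.so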

Many thanks Nitin! I will try that. But as a general note, I am seeing this error while trying to build the Python wheel on my Windows 11 machine too! I saw this error once before with 2.4 (it was reported by someone else on the TF GitHub as well), but then had no problems with 2.5 through 2.7 on either Windows or Linux.

Also test the build on the 2.9.1 tag.

I tried three different builds today with 2.9.1:

build1
Same error, I’m afraid, for the build:
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1" -c opt --copt=-march=native --verbose_failures //tensorflow:libtensorflow.so //tensorflow:libtensorflow_cc.so //tensorflow:libtensorflow_framework.so

This was with CUDA (2.7.0 is the last version I built successfully on my machine with CUDA). I got the following error with 2.9.1:
ERROR: /home/abhimehrish/tensorflow/tensorflow/core/kernels/mlir_generated/BUILD:1196:23: Generating kernel '//tensorflow/core/kernels/mlir_generated:logical_and_gpu_i1_i1_kernel_generator' failed: (Exit 127): tf_to_kernel failed: error executing command …

Execution platform: @local_execution_config_platform//:platform

bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel: symbol lookup error: bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel: undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameB5cxx11Ev

build2 (without CUDA):
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1" -c opt --copt=-march=native --verbose_failures //tensorflow:libtensorflow.so //tensorflow:libtensorflow_cc.so //tensorflow:libtensorflow_framework.so

and build3 (without CUDA):
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1" -c opt --copt=-mavx --copt=-mavx2 --verbose_failures //tensorflow:libtensorflow.so //tensorflow:libtensorflow_cc.so //tensorflow:libtensorflow_framework.so

Builds 2 and 3 returned similar errors due to another missing symbol. I might try the SIG Build image sometime this week, but I have already spent enough time trying this.

@Bhack, I have seen your reply to the TensorRT version question here: TF-TRT: No Support for TensorRT v8?

Not sure if this was rejected or merged: TF-TRT: No Support for TensorRT v8? - #3 by Bhack

If support for TensorRT 8.2 is not there, will 7.0 work on Ubuntu 20.04? It is recommended for 18.04. Thanks

I think it is better to isolate the build target issues from your environment, so it would be great if you could test the same failing build in the SIG Build Docker image.