How to Build a Debug Version of TensorFlow

dl_xiaocaiji · December 6, 2021, 5:33am

I would like to fully understand how TensorFlow works internally. To do that, I want to build a debug library of TensorFlow 2.4, write a program using TensorFlow's C++ API, link my object files against the TensorFlow debug library, and debug the program with gdb/cuda-gdb. Could someone give me specific steps for building a debug library that I can use for this?

Bhack · December 6, 2021, 10:48am

https://tensorflow-prod.ospodiscourse.com/t/what-is-the-best-approach-to-debug-in-tensorflow-when-working-with-the-c-code-base/468

dl_xiaocaiji · December 6, 2021, 1:10pm

I read that discussion and followed it to tensorflow/CONTRIBUTING.md (master branch of the tensorflow/tensorflow GitHub repository). That document points out that "the --config=dbg option is not officially supported".

I built the v2.4 source code with "bazel build --config=cuda --config=dbg //tensorflow/tools/pip_package:build_pip_package" and got the following error:

ERROR: /tensorflow/tensorflow/BUILD:724:1: Linking of rule '//tensorflow:libtensorflow_framework.so.2.4.4' failed (Exit 1)
/usr/bin/ld: bazel-out/k8-dbg/bin/tensorflow/stream_executor/cuda/libcublas_plugin.lo(cuda_blas.pic.o): relocation R_X86_64_PC32 against undefined symbol `_ZN15stream_executor3gpu12_GLOBAL__N_120CUDABlasLtMatmulPlan14kMaxBatchCountE' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build.

Bhack · December 6, 2021, 1:16pm

If you check the dates:

GitHub issue (tensorflow/tensorflow): "cannot build TensorFLow with --config=dbg"
Opened by bas-aarts on 05 May 2021, 07:49 PM UTC; closed on 22 Jun 2021, 11:27 PM UTC.

When building open-source TensorFlow with bazel build --config=dbg --config=…cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/tools/pip_package:build_pip_package (for SM 7.0 only), the build dies at link time with:

```
ERROR: /home/baarts/tensorflow-GH/tensorflow/python/BUILD:3373:24: Linking of rule '//tensorflow/python:_pywrap_tensorflow_internal.so' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so-2.params
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(AnnotationRemarks.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(BDCE.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(CallSiteSplitting.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(ConstantHoisting.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(ConstraintElimination.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(CorrelatedValuePropagation.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(DCE.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(DeadStoreElimination.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(DivRemPairs.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(EarlyCSE.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(FlattenCFGPass.pic.o):(.debug_aranges+0x6): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
```

Adding -mcmodel=large makes no difference, as the overflow is in a debug section. I tried -gdwarf64, which is not supported by gcc.

Some platform info:

```
root@7fe23091cb5b:/opt/tensorflow# gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

root@7fe23091cb5b:/opt/tensorflow# uname -a
Linux 7fe23091cb5b 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
```

The fix landed after the v2.4 tag, so it is not available there.