In the past, I’ve had trouble debugging TensorFlow when the problem was somewhere in the C++ code base and I was using gdb. The issues included debug builds (using -O0) being too large and running out of disk space, long recompile times, etc. Does anyone have recommendations for debugging TensorFlow?
Bhack
May 10, 2021, 11:55pm
#2
I think some of these problems are well known. For a recent example, you can follow this issue:
Opened 07:49PM, 05 May 2021 UTC; closed 11:27PM, 22 Jun 2021 UTC
Labels: stat:awaiting tensorflower, type:build/install, subtype: ubuntu/linux, subtype:bazel
When building open-source TensorFlow with
`bazel build --config=dbg --config=… cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/tools/pip_package:build_pip_package`
(for SM 7.0 only), the build dies at link time with:
```
ERROR: /home/baarts/tensorflow-GH/tensorflow/python/BUILD:3373:24: Linking of rule '//tensorflow/python:_pywrap_tensorflow_internal.so' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so-2.params
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(AnnotationRemarks.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(BDCE.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(CallSiteSplitting.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(ConstantHoisting.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(ConstraintElimination.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(CorrelatedValuePropagation.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(DCE.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(DeadStoreElimination.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(DivRemPairs.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(EarlyCSE.pic.o):(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
bazel-out/k8-dbg/bin/external/llvm-project/llvm/libScalar.a(FlattenCFGPass.pic.o):(.debug_aranges+0x6): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
```
Adding `-mcmodel=large` makes no difference, as the overflow is in a debug section.
I also tried `-gdwarf64`, which this gcc does not support.
Some platform info:
```
root@7fe23091cb5b:/opt/tensorflow# gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
root@7fe23091cb5b:/opt/tensorflow# uname -a
Linux 7fe23091cb5b 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
```
At the end of the thread there is a draft proposal, so if you have something technical to share about your experience, please leave a comment on the ticket.
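Some arithmetic on why that link fails and why `-mcmodel=large` cannot help. With the 32-bit DWARF format, each `.debug_aranges` header stores the offset of its compilation unit (CU) in `.debug_info` as a 4-byte field placed right after the 4-byte `unit_length` and the 2-byte `version`, i.e. at `+0x6`, which is exactly where the errors above point. That field is filled by an `R_X86_64_32` relocation, which can only hold values below 2^32:

```latex
\begin{aligned}
\text{offset}_{\max} &= 2^{32} - 1\ \text{bytes} \\
2^{32}\ \text{bytes} &= 4\,294\,967\,296\ \text{bytes} = 4\ \text{GiB} \\
\text{CU offset in the linked } \texttt{.debug\_info} \geq 2^{32} &\;\Longrightarrow\; \text{``relocation truncated to fit''}
\end{aligned}
```

`-mcmodel=large` only changes how code and data are addressed, not the width of DWARF section offsets. The DWARF64 format (`-gdwarf64`) widens those offsets to 8 bytes, which is why it was the natural thing to try next; otherwise the total amount of debug info fed into that one link has to shrink.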
Thanks, that does contain some useful info.
I found that the best way to debug is printf-debugging, without checking out the head of the repository again (because that would mean long compile times all over again).
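A minimal sketch of the printf-debugging approach, in plain C++ so it compiles on its own. The `DBG_TO`/`DBG` macros and the `NumElements` function are hypothetical stand-ins, not TensorFlow APIs; inside TensorFlow itself you would more likely reach for the existing `LOG(INFO)` or `VLOG(n)` macros. Tagging each line with file, line and function makes the output of a long run easy to grep, and defaulting to `std::cerr` means nothing is left sitting in a buffer if the process crashes right after the print.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // lets a test capture the trace in a std::ostringstream

// Hypothetical printf-debugging helper: prefixes each message with
// "file:line function: " so the trace can be grepped after a run.
// `msg` is deliberately left unparenthesized so callers can chain `<<`.
#define DBG_TO(stream, msg)                                            \
  do {                                                                 \
    (stream) << __FILE__ << ':' << __LINE__ << ' ' << __func__ << ": " \
             << msg << '\n';                                           \
  } while (0)

// Default sink: std::cerr is unit-buffered, so each line is flushed
// immediately and survives a crash right after the print.
#define DBG(msg) DBG_TO(std::cerr, msg)

// Hypothetical function under investigation: the product of a shape's
// dimensions, with one trace line per step.
int NumElements(const int* dims, int rank, std::ostream& log) {
  assert(rank >= 0 && "rank must be non-negative");
  int total = 1;
  for (int i = 0; i < rank; ++i) {
    total *= dims[i];
    DBG_TO(log, "dim[" << i << "]=" << dims[i] << " running=" << total);
  }
  return total;
}
```

Passing `std::cerr` as `log` gives the usual stderr trace; for `dims = {2, 3, 4}` the second line reads `<file>:<line> NumElements: dim[1]=3 running=6`. Because the prints only touch the file being investigated, each iteration recompiles just that translation unit rather than the whole tree.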
If possible, building with ASan (AddressSanitizer) also helps; the OSS-Fuzz Docker container supports that.
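To illustrate what an ASan build buys you, here is a deliberately buggy, hypothetical function. In a normal build the off-by-one read typically goes unnoticed and just yields garbage; compiled with `-fsanitize=address -fno-omit-frame-pointer -g` (accepted by both gcc and clang), AddressSanitizer aborts at the bad read with a `heap-buffer-overflow` report and a stack trace. This is plain C++, not TensorFlow code, and it does not show how the OSS-Fuzz container wires ASan into the TensorFlow build.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function with an optional off-by-one bug, for trying ASan.
// With off_by_one == false it sums buf[0..n-1]; with off_by_one == true it
// also reads buf[n], one element past the end of the heap allocation.
int SumBuffer(std::size_t n, bool off_by_one) {
  assert(n > 0 && "need at least one element");
  int* buf = new int[n];
  for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<int>(i);

  const std::size_t end = off_by_one ? n + 1 : n;  // the bug: n + 1
  int sum = 0;
  for (std::size_t i = 0; i < end; ++i) sum += buf[i];  // ASan flags buf[n]

  delete[] buf;
  return sum;
}
```

`SumBuffer(4, false)` returns 6 in any build. `SumBuffer(4, true)` is undefined behavior; in an ASan build it is reported as a `heap-buffer-overflow` pointing at the summing loop, instead of silently corrupting a result you would otherwise have to chase down with gdb.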