Undefined references to _mlir_ciface* symbols

Hello,

I am trying to compile libtensorflow_cc.so version 2.14.0 on Arch Linux.
At the end, linking fails with errors like:

...
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cast_op.pic.lo(gpu_op_cast.pic.o): in function `tensorflow::(anonymous namespace)::MlirCastGPUDT_INT64DT_INT16Op::Invoke(tensorflow::OpKernelContext*, llvm::SmallVectorImpl<tensorflow::UnrankedMemRef>&)':
gpu_op_cast.cc:(.text._ZN10tensorflow12_GLOBAL__N_129MlirCastGPUDT_INT64DT_INT16Op6InvokeEPNS_15OpKernelContextERN4llvm15SmallVectorImplINS_14UnrankedMemRefEEE+0x10): undefined reference to `_mlir_ciface_Cast_GPU_DT_INT64_DT_INT16'
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cast_op.pic.lo(gpu_op_cast.pic.o): in function `tensorflow::(anonymous namespace)::MlirCastGPUDT_INT64DT_INT32Op::Invoke(tensorflow::OpKernelContext*, llvm::SmallVectorImpl<tensorflow::UnrankedMemRef>&)':
gpu_op_cast.cc:(.text._ZN10tensorflow12_GLOBAL__N_129MlirCastGPUDT_INT64DT_INT32Op6InvokeEPNS_15OpKernelContextERN4llvm15SmallVectorImplINS_14UnrankedMemRefEEE+0x10): undefined reference to `_mlir_ciface_Cast_GPU_DT_INT64_DT_INT32'
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cast_op.pic.lo(gpu_op_cast.pic.o): in function `tensorflow::(anonymous namespace)::MlirCastGPUDT_INT64DT_INT64Op::Invoke(tensorflow::OpKernelContext*, llvm::SmallVectorImpl<tensorflow::UnrankedMemRef>&)':
gpu_op_cast.cc:(.text._ZN10tensorflow12_GLOBAL__N_129MlirCastGPUDT_INT64DT_INT64Op6InvokeEPNS_15OpKernelContextERN4llvm15SmallVectorImplINS_14UnrankedMemRefEEE+0x10): undefined reference to `_mlir_ciface_Cast_GPU_DT_INT64_DT_INT64'
...

There are a few hundred of these.

All 649 error lines containing ‘undefined reference to’ are of the following form:

^gpu_op_[a-z0-9_]*\.cc:(\.text\._Z[^+]*+0x[0-9a-f]*): undefined reference to `_mlir_ciface_[A-Za-z0-9_]*.$

This shows that all undefined references come from files with a name like gpu_op_[a-z0-9_]*\.cc,
all of which exist exclusively in build/tensorflow/tensorflow/core/kernels/mlir_generated/.
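
The pattern can be checked against one of the error lines quoted above. This is only a minimal sketch: it assumes the pattern is a grep-style basic regular expression (where `(`, `)` and `+` are literal), so those characters are escaped for the ECMAScript syntax that std::regex uses by default. The function name is_mlir_ciface_link_error is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `line` has the shape of the linker
// errors quoted above. The pattern is the one from the post, with the
// literal '(', ')' and '+' escaped for ECMAScript regex syntax.
inline bool is_mlir_ciface_link_error(const std::string& line) {
  static const std::regex pattern(
      R"re(^gpu_op_[a-z0-9_]*\.cc:\(\.text\._Z[^+]*\+0x[0-9a-f]*\): undefined reference to `_mlir_ciface_[A-Za-z0-9_]*.$)re");
  // regex_match requires the whole line to match, like the ^...$ anchors.
  return std::regex_match(line, pattern);
}
```

Feeding it the second line of each error pair (the one starting with gpu_op_cast.cc) should match, while the preceding /usr/bin/ld line should not.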

After further investigation, it seems that the problem comes from macros built on the MLIR_FUNCTION macro, which is defined in tensorflow/tensorflow/core/kernels/mlir_generated/base_op.h:

#define MLIR_FUNCTION(tf_op, platform, input_type, output_type) \
  _mlir_ciface_##tf_op##_##platform##_##input_type##_##output_type
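
To see which symbol name a given kernel expects, the token pasting can be reproduced in isolation. The following is a minimal sketch, not TensorFlow code: it copies the MLIR_FUNCTION definition and adds two hypothetical helper macros (STRINGIFY, STRINGIFY_EXPANDED) and a hypothetical function expected_cast_symbol() that turns the pasted identifier into a string.

```cpp
#include <string>

// Copied from the definition in base_op.h quoted above.
#define MLIR_FUNCTION(tf_op, platform, input_type, output_type) \
  _mlir_ciface_##tf_op##_##platform##_##input_type##_##output_type

// Hypothetical helpers, not part of TensorFlow: the two-level form makes the
// preprocessor expand MLIR_FUNCTION(...) first and stringify the result.
#define STRINGIFY(x) #x
#define STRINGIFY_EXPANDED(x) STRINGIFY(x)

// Returns the symbol name the linker looks for in the int64 -> int16 GPU cast case.
inline std::string expected_cast_symbol() {
  return STRINGIFY_EXPANDED(MLIR_FUNCTION(Cast, GPU, DT_INT64, DT_INT16));
}
```

The returned string is _mlir_ciface_Cast_GPU_DT_INT64_DT_INT16, the same name that appears in the first linker error above.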

In particular, the macros GENERATE_UNARY_KERNEL3, GENERATE_BINARY_KERNEL3 and GENERATE_TERNARY_KERNEL3
are more or less similar, so let’s look at just one:

#define GENERATE_UNARY_KERNEL3(tf_op, platform, input_type, output_type, casted_input_type, casted_output_type)

which produces code like (I did some formatting):

extern "C" void MLIR_FUNCTION(tf_op, platform, input_type, output_type)              // <-- Undefined reference.
    (UnrankedMemRef * result, OpKernelContext * ctx, UnrankedMemRef * arg);

namespace {

class MLIR_OP(tf_op, platform, casted_input_type, casted_output_type) :
    public MLIROpKernel<output_type, typename EnumToDataType<output_type>::Type, casted_output_type>
{
 public:
  using MLIROpKernel::MLIROpKernel;

  UnrankedMemRef Invoke(OpKernelContext* ctx, llvm::SmallVectorImpl<UnrankedMemRef>& args) override
  {
    UnrankedMemRef result;
    MLIR_FUNCTION(tf_op, platform, input_type, output_type)(&result, ctx, &args[0]);   // <-- Undefined reference.
    return result;
  }
};

} // namespace
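
The declare-then-call pattern can be reduced to a small stand-alone sketch. Everything here is an assumption made for illustration: UnrankedMemRef and OpKernelContext are simplified stand-ins for the TensorFlow types, invoke_cast is a hypothetical free function mirroring Invoke(), and the definition of the _mlir_ciface_ function is a placeholder for whatever object file or archive is supposed to supply it in the real build.

```cpp
#include <vector>

// Simplified stand-ins for the TensorFlow types (assumptions, not the real definitions).
struct UnrankedMemRef {
  long rank = 0;
  void* descriptor = nullptr;
};
struct OpKernelContext {};

// Same token pasting as MLIR_FUNCTION in base_op.h.
#define MLIR_FUNCTION(tf_op, platform, input_type, output_type) \
  _mlir_ciface_##tf_op##_##platform##_##input_type##_##output_type

// Declaration only, in the same shape the generated code uses.
extern "C" void MLIR_FUNCTION(Cast, GPU, DT_INT64, DT_INT16)(
    UnrankedMemRef* result, OpKernelContext* ctx, UnrankedMemRef* arg);

// Placeholder definition standing in for the missing object code.
// Without a definition like this somewhere on the link line, any program
// that calls invoke_cast() fails to link with "undefined reference".
extern "C" void _mlir_ciface_Cast_GPU_DT_INT64_DT_INT16(
    UnrankedMemRef* result, OpKernelContext* /*ctx*/, UnrankedMemRef* arg) {
  *result = *arg;  // placeholder behaviour: copy the input descriptor
}

// Mirrors the body of Invoke() above.
inline UnrankedMemRef invoke_cast(OpKernelContext* ctx,
                                  std::vector<UnrankedMemRef>& args) {
  UnrankedMemRef result;
  MLIR_FUNCTION(Cast, GPU, DT_INT64, DT_INT16)(&result, ctx, &args[0]);
  return result;
}
```

The point of the sketch is only that the generated .cc files contain the declaration and the call, never the definition, so the definition has to arrive from a separately built file.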

Where should these symbols have been defined? For example, which bazel target (some .lo or .a file) should have _mlir_ciface_Cast_GPU_DT_INT64_DT_UINT8 defined (to pick a random one)?

I checked ALL object files and archives that are being linked, and in my case only the following mention _mlir_ciface symbols:

bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_nextafter_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_relu_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_softplus_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_softsign_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_constant_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cast_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cwise_unary_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cwise_binary_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tools/kernel_gen/libtf_framework_c_interface.pic.a
bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tools/kernel_gen/libtf_gpu_runtime_wrappers.pic.a
bazel-out/k8-opt/bin/external/llvm-project/mlir/lib_mlir_runner_utils.pic.a
bazel-out/k8-opt/bin/external/llvm-project/mlir/lib_mlir_c_runner_utils.pic.a

In the bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_*.pic.lo files, the missing symbols all show up as UND (undefined). The other files define some _mlir_ciface symbols, but not the ones that are missing.

Please help.

EDIT: I managed to compile and link 2.13.0, and it turns out that 2.14.0 isn’t linking with any of the bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/lib*_kernel_generator.pic.a files. Those archives have been created, but they aren’t linked, hence the undefined references.

I am totally new to bazel, so any help figuring out what the problem is would be appreciated.
I’d like to add the [build] tag to this post, but I can’t figure out where/how I can do that.

It seems to work with bazel version 6.1.0, but not with 6.4.0, which I was using (the current bazel version on Arch).

However, now I have the problem that the target //tensorflow/tools/pip_package:build_pip_package doesn’t work; it gives me:

ERROR: /opt/home_carlo/dot_cache/bazel/_bazel_carlo/1b82455ffb4023ee1d91ebc8e01e1cda/external/pybind11/BUILD.bazel: no such target '@pybind11//:osx': target 'osx' not declared in package '' defined by /opt/home_carlo/dot_cache/bazel/_bazel_carlo/1b82455ffb4023ee1d91ebc8e01e1cda/external/pybind11/BUILD.bazel (Tip: use `query "@pybind11//:*"` to see all the targets in that package)
ERROR: /opt/home_carlo/dot_cache/bazel/_bazel_carlo/1b82455ffb4023ee1d91ebc8e01e1cda/external/ml_dtypes/BUILD.bazel:22:17: errors encountered resolving select() keys for @ml_dtypes//:_custom_floats.so
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted

which is weird, if only because I’m on Linux, not OSX.

I solved all problems. But this forum seems pretty dead, so yeah. In fact it seems that tensorflow itself is pretty much dead unfortunately.

Hi @Carlo_Wood, I’m facing the same issues you described above. Could you share what you ended up doing?

Thanks in advance.