.so footprint reduction not working (2.6.0)

Hi. (I already posted a similar question on Stack Overflow, but I am reposting here for visibility.)

I am following the instructions here: Reduce TensorFlow Lite binary size.

TensorFlow version: 2.6.0, and I am running everything in a devel-cpu Docker container as found in tensorflow/tools/dockerfiles (which is, worth noting, an Ubuntu 18.04 image). Here is my BUILD file:

load(
    "//tensorflow/lite:build_def.bzl",
    "tflite_custom_cc_library",
    "tflite_cc_shared_object",
)

load(
    "//tensorflow/lite/delegates/flex:build_def.bzl",
    "tflite_flex_cc_library",
)

tflite_flex_cc_library(
    name = "selective_cc_flex",
    models = [
        ":saved_model.tflite",
    ],
)

tflite_custom_cc_library(
    name = "selective_cc_builtin",
    models = [
        ":saved_model.tflite",
    ],
)

tflite_cc_shared_object(
    name = "tensorflowlite",
    # Until we have more granular symbol export for the C++ API on Windows,
    # export all symbols.
    features = ["windows_export_all_symbols"],

    linkopts = select({
        "//tensorflow:macos": [
            "-Wl,-exported_symbols_list,$(location //tensorflow/lite:tflite_exported_symbols.lds)",
        ],
        "//tensorflow:windows": [],
        "//conditions:default": [
        "//conditions:default": [
            "-Wl,-z,defs",
            "-Wl,--version-script,$(location //tensorflow/lite:tflite_version_script.lds)",
        ],
    }),
    per_os_targets = True,
    deps = [
        ":selective_cc_builtin",
        "//tensorflow/lite:tflite_exported_symbols.lds",
        "//tensorflow/lite:tflite_version_script.lds",
   #        ":selective_cc_flex",
   #     "//tensorflow/lite/delegates/flex:exported_symbols.lds",
   #     "//tensorflow/lite/delegates/flex:version_script.lds",

    ],
)

For convenience this includes both the built-in and the flex selective builds in one file, but it is otherwise more or less identical to what the instructions show. I just change the final dependency if I want to try one or the other.

I tried both, with exactly the same outcome: no matter which model I copy in as saved_model.tflite, I get exactly the same tensorflowlite.so binary size.

For example, if I compile with --config=elinux_armhf for built-in operations, I always get a binary of 3,198,770 bytes, no matter which model I use as saved_model.tflite. If I use, for example, the SPICE model from the same page, I still get this 3 MB+ size, whereas the same instruction page currently advertises a 376 KB AAR for Android. It is not quite the same artifact, but the AAR would contain the JNI version of the same .so, which, if I read the TF Bazel build correctly, more or less follows the same operator-filtering path.
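
For reference, the build command is essentially the one from the guide; a sketch of what I run, where //tmp is just a placeholder for the package holding the BUILD file above:

# Built-in ops only, cross-compiled for 32-bit ARM; //tmp is a placeholder package path.
bazel build -c opt --config=elinux_armhf //tmp:tensorflowlite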

Similarly, if I build for flex (a.k.a. Select TF) ops, I get north of 200 MB(!) binaries on x86, whereas the advertised size for Android is ~1.7 MB, give or take. I get similarly gargantuan footprints for the elinux_armhf and elinux_aarch64 builds.

What am I doing wrong? Or perhaps my expectations are wrong? If so, then I don't see the difference between a selective build and a regular build.

Thank you.

I am currently doing a similar thing; I might try to compile tomorrow. I tried to build natively on Windows a week ago with no success, then put it off to do other stuff.

Also, please look into tensorflow/build_android.md at master · tensorflow/tensorflow (github.com).

Will let you know with my results soon.

Maybe it is a matter of linker options, --as-needed etc.?
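
For example (untested guess), section garbage collection could be forced from the command line:

# Untested: compile with per-function/per-data sections and let the linker drop unreferenced ones.
bazel build -c opt --config=elinux_armhf \
  --copt=-ffunction-sections --copt=-fdata-sections \
  --linkopt=-Wl,--gc-sections --linkopt=-Wl,--as-needed \
  //tmp:tensorflowlite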

@D_L Sorry, I forgot to follow up, but I did try to compile; it errored out after 7.5 hours for a totally unrelated reason, and I didn't try again after that. I was using the :devel-tagged Docker image. Were you using the same one?

Also, I was just using the build_aar.sh script for the build. I have no idea whether a 7-hour build time is normal or not for building specific TF ops for just one architecture (arm64).
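
For reference, the invocation was roughly the one from the guide (the model path is a placeholder):

# Placeholder model path; single target architecture.
sh tensorflow/lite/tools/build_aar.sh \
  --input_models=/path/to/saved_model.tflite \
  --target_archs=arm64-v8a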

It should take something like 20 minutes on 16 VM cores (or 8 real cores), and it is almost instantaneous after that if the dev image mounts the Bazel repository and disk caches and the corresponding options are used in the bazel build command.
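
By the corresponding options I mean something like this (the cache paths are whatever directories you mount into the container):

# Hypothetical cache paths mounted from the host.
bazel build -c opt --config=elinux_armhf \
  --disk_cache=/cache/bazel-disk \
  --repository_cache=/cache/bazel-repo \
  //tmp:tensorflowlite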

Yes, I use the devel-cpu image from 2.6.0 for the build, of course. I used to do it without Docker, but the interference from the host environment is too great; I have been doing it exclusively in the Docker container, with mounted volumes for sources and caches, for some time now.
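
Concretely, I start the container with something like this (image name and host paths are just examples; the image is built from the devel-cpu Dockerfile in tensorflow/tools/dockerfiles):

# Example only: mount the sources and the Bazel caches from the host.
docker run -it \
  -v $HOME/tensorflow_src:/tensorflow_src \
  -v $HOME/bazel-cache:/cache \
  my-tf-2.6.0-devel-cpu bash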


Yeah, I was running it on a 4-core, 8-thread VM, but 7 hours is way too much even for that; I absolutely have no idea why it took that long. I will try again with the image you were talking about. I think I just used the most recent devel-tagged image that time; it would be nice if you could link the image.