Load AND save model+weights TF2 C++ API

Hi everyone,

I am building a C++ application that includes TensorFlow 2.6, with the aim of doing classification or detection.

I managed to install the C++ API from source using Bazel.

I then started with classification. I managed to do training and prediction: it can tell whether an image is a dog or a cat most of the time, which was already a great achievement starting from zero.

My problem now is to save and load this model.
I have tried to use WriteBinaryProto and ReadBinaryProto on a .pb file, but from my understanding that only saves the "architecture" of the model, i.e. its graph structure, not the trained weights?
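For reference, what I am doing is roughly this (a minimal sketch; "graph.pb" is a placeholder path):

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/platform/env.h"

void SaveAndReloadGraphOnly(const tensorflow::GraphDef& graph_def) {
  // This serializes only the GraphDef, i.e. the node/edge structure.
  // Variable values are not stored in the GraphDef, so the weights are lost.
  TF_CHECK_OK(tensorflow::WriteBinaryProto(tensorflow::Env::Default(),
                                           "graph.pb", graph_def));

  tensorflow::GraphDef reloaded;
  TF_CHECK_OK(tensorflow::ReadBinaryProto(tensorflow::Env::Default(),
                                          "graph.pb", &reloaded));
}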
I have read about freezing a model to save the weights, or the trained part, I assume; I don't know precisely what it covers. (If someone can clarify, that would be appreciated.)
But freezing models seems to be a thing of the past, at least for TF2 with Python.

So I am not sure: is freezing a model still the way to go with the TF2 C++ API? If not, can someone describe what I should do, or at least give me ideas/explanations on how it works now? I am a bit dry on this one.
I also read about checkpoints, but I did not manage to grasp them or use them.
To conclude, I also saw mentions of tensorflow::ops::Restore and tensorflow::ops::Save, but again no example, and I am having trouble making them work.
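For what it's worth, the direction I have been trying with those two ops looks roughly like this (a sketch only; the file name "weights.ckpt" and the tensor name "w" are placeholders):

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;

  Scope root = Scope::NewRootScope();

  // A tensor standing in for a trained weight.
  auto w = Const(root, {{1.f, 2.f}, {3.f, 4.f}});

  // ops::Save writes the listed tensors under the given names into a file.
  auto save = Save(root, std::string("weights.ckpt"),
                   Const(root, {std::string("w")}), InputList{w});

  // ops::Restore reads the tensor saved under "w" back from that file.
  auto restored = Restore(root, std::string("weights.ckpt"),
                          std::string("w"), DT_FLOAT);

  ClientSession session(root);
  std::vector<Tensor> outputs;
  TF_CHECK_OK(session.Run({}, {}, {save.operation}, &outputs));  // write the file
  TF_CHECK_OK(session.Run({restored}, &outputs));                // read it back
  return 0;
}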
In the end, I find myself with three ideas but nothing that I have managed to use, haha.

Thank you for your help and ideas.

Why do you need to train the model in C++?

If I remember correctly, we only have the SavedModel loader API in C++:
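Loading with it looks roughly like this (a minimal sketch; "exported_model" is a placeholder for a directory written by tf.saved_model.save in Python):

#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"

int main() {
  tensorflow::SavedModelBundle bundle;
  tensorflow::SessionOptions session_options;
  tensorflow::RunOptions run_options;
  TF_CHECK_OK(tensorflow::LoadSavedModel(session_options, run_options,
                                         "exported_model",
                                         {tensorflow::kSavedModelTagServe},
                                         &bundle));
  // bundle.session is ready to Run(); bundle.meta_graph_def lists the signatures.
  return 0;
}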

I need to do as much as possible in C++ because we need a compiled solution, so the end user can't play with the code.

I have found an example of a CNN trained and saved with the C++ API in TF 1.x using a frozen graph and a checkpoint. From my understanding, frozen graphs are not the way in TF 2.x, but checkpoints might be. Anyway, I don't see why the ability to save a model and its parameters would have been totally removed. There has to be a solution.

It was already suggested that TF is not a training-ready C++ library:

https://tensorflow-prod.ospodiscourse.com/t/discuss-pros-cons-between-tensorflow-core-and-tensorflow-js-please/8138/30?u=bhack

What is your real problem here? Do you need to obfuscate your training code from the customer?

If you need on-device finetuning instead, you could use:

See also our thread at:

Yes, I know that! Except it is… I managed this morning by cheating a bit, heh.

As I said, I read about freezing models in earlier TF 1.x versions. This does not work anymore, as it appears the code is not even included when building from source. BUT I tweaked the header and the corresponding source file I found in the git repository (it seems they are no longer built but are still present in the TF 2.x repo) and, instead of trying to build them from source… I added those two files directly to my C++ project.

Surprise surprise, it works. I can train, save, load and run inference without any trouble using only C++.

Still, as a developer, it feels a lot like cheating, and that can't be good practice… I can't believe they removed such an important feature that was working, and certainly not without providing another way to do it… That would be very odd.

Of course, I am still listening to any proposition/solution that could load and save using the TF 2.x C++ library without tricks!

Anyway, thanks for your help, Bhack. It was not what I was looking for in this case, but it is very nice of you to suggest other possibilities. Oh, and… yes, obfuscation is a possibility, but we are not very confident in the security it provides…

Ah, that makes more sense.

If you can load a SavedModel and run a particular signature, that should be all you need to make this work in a less hacky way. Follow that "On-Device Training" tutorial and just skip the "convert to TensorFlow Lite" part. In Python you build a model with signatures like "initialize", "train_step", "save", "load" and "inference", and then in your target environment you call those as needed.
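On the C++ side, once the SavedModel is loaded with LoadSavedModel, calling one of those signatures looks roughly like this. This is only a sketch: the signature name "train" and the tensor keys "x" and "loss" are assumptions about how the model was exported.

#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/core/framework/tensor.h"

tensorflow::Status RunTrainSignature(tensorflow::SavedModelBundle& bundle,
                                     const tensorflow::Tensor& x,
                                     tensorflow::Tensor* loss) {
  // Each exported tf.function appears as a SignatureDef that maps logical
  // argument names ("x", "loss") to the underlying graph tensor names.
  const auto& sig = bundle.meta_graph_def.signature_def().at("train");
  const std::string input_name = sig.inputs().at("x").name();
  const std::string output_name = sig.outputs().at("loss").name();

  std::vector<tensorflow::Tensor> outputs;
  tensorflow::Status s =
      bundle.session->Run({{input_name, x}}, {output_name}, {}, &outputs);
  if (!s.ok()) return s;
  *loss = outputs[0];
  return tensorflow::Status::OK();
}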

From what I am seeing in the tutorial, it is using checkpoints. That is a solution I tried without success… I did not manage to make it work in C++.
I found something like the following during my research, which seems pretty close to the tutorial…

// graph_def here is a tensorflow::MetaGraphDef (e.g. loaded from a .meta file
// with ReadBinaryProto), and tensor_dict is
// std::vector<std::pair<std::string, tensorflow::Tensor>>.

// save
tensorflow::Tensor checkpointPathTensor(tensorflow::DT_STRING, tensorflow::TensorShape());
checkpointPathTensor.scalar<tensorflow::tstring>()() = "some/path";  // tstring, not std::string, in TF 2.x
tensor_dict feed_dict = {{graph_def.saver_def().filename_tensor_name(), checkpointPathTensor}};
status = sess->Run(feed_dict, {}, {graph_def.saver_def().save_tensor_name()}, nullptr);

// restore
tensorflow::Tensor checkpointPathTensor(tensorflow::DT_STRING, tensorflow::TensorShape());
checkpointPathTensor.scalar<tensorflow::tstring>()() = "some/path";
tensor_dict feed_dict = {{graph_def.saver_def().filename_tensor_name(), checkpointPathTensor}};
status = sess->Run(feed_dict, {}, {graph_def.saver_def().restore_op_name()}, nullptr);

saver_def.filename_tensor_name is supposed to be the name of the tensor you must feed with a filename when saving/restoring.
saver_def.restore_op_name is supposed to be the name of the target operation you must run when restoring.
saver_def.save_tensor_name is supposed to be the name of the target operation you must run when saving.
But something was not working: no .ckpt files were created. Maybe I should try to replace the op names with the signatures you suggested… I don't know, because I did not find any tips on this matter. I will try, but with little hope, haha.

We use the C API to save models in TF-Java, but it's quite involved. You can trace things through from here, which is our top-level save method: java/SavedModelBundle.java at master · tensorflow/java · GitHub.


Thanks, I suppose we are close to @markdaoust's advice :) :

Dear markdaoust,
I am trying to implement on-device training by invoking my train.tflite using tensorflowlite_jni.so.
I added these two libraries to the CMakeLists.txt file:
add_library( tensorflowlite_jni SHARED IMPORTED )
set_target_properties( tensorflowlite_jni PROPERTIES IMPORTED_LOCATION ${JNI_DIR}/${ANDROID_ABI}/libtensorflowlite_jni.so )

add_library( tensorflowlite_flex_jni SHARED IMPORTED )
set_target_properties( tensorflowlite_flex_jni PROPERTIES IMPORTED_LOCATION ${JNI_DIR}/${ANDROID_ABI}/libtensorflowlite_flex_jni.so )

I used the following command to invoke the train signature in my train.tflite file:
TfLiteSignatureRunnerInvoke(train_model_info.signature_info[2].runner);
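For context, the sequence I follow up to that call is roughly the following (a sketch; the signature key "train" and the input name "x" depend on how my train.tflite was exported, and my real code keeps the runners inside my own train_model_info struct):

#include "tensorflow/lite/c/c_api.h"
#include "tensorflow/lite/c/c_api_experimental.h"  // signature runner API, depending on the TF Lite version

// model_path, input_data and input_bytes are placeholders.
void RunTrainSignature(const char* model_path, const float* input_data, size_t input_bytes) {
  TfLiteModel* model = TfLiteModelCreateFromFile(model_path);
  TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
  TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

  // Look up the "train" signature exported from Python and allocate its tensors.
  TfLiteSignatureRunner* runner = TfLiteInterpreterGetSignatureRunner(interpreter, "train");
  TfLiteSignatureRunnerAllocateTensors(runner);

  // Feed the input named "x", then invoke; this invoke is the call that fails
  // for me with the Flex error when the flex delegate is not applied.
  TfLiteTensor* input = TfLiteSignatureRunnerGetInputTensor(runner, "x");
  TfLiteTensorCopyFromBuffer(input, input_data, input_bytes);
  TfLiteSignatureRunnerInvoke(runner);

  TfLiteSignatureRunnerDelete(runner);
  TfLiteInterpreterDelete(interpreter);
  TfLiteInterpreterOptionsDelete(options);
  TfLiteModelDelete(model);
}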

However, I encountered the following error:
Select TensorFlow op(s), included in the given model, is(are) not supported by this interpreter.
Make sure you apply/link the Flex delegate before inference.
For the Android, it can be resolved by adding “org.tensorflow:tensorflow-lite-select-tf-ops” dependency.
Node number 1409 (FlexBroadcastGradientArgs) failed to prepare.

I am unable to use libtensorflowlite_flex_jni.so to support this operation.
If I directly remove libtensorflowlite_jni.so from my project, I encounter the following error:
undefined reference to `TfLiteSignatureRunnerInvoke'

I would like to know how to use these two libraries together when doing on-device training with the C API. How can I make libtensorflowlite_flex_jni.so support the operations that libtensorflowlite_jni.so does not support?

Thank you.