Training a SpaghettiNet model

Hi!

The SpaghettiNet-EdgeTPU model is a new object detection model that has been mentioned in a few different places, including the Google AI Blog and the TensorFlow Object Detection API GitHub repo.

I downloaded the SpaghettiNet model, ran it on a Pixel 6 with the NNAPI delegate, and holy hell, this thing is fast!

I’m not sure how to go about training a model like this, so I am looking for some guidance.

If anybody has done this before I would greatly appreciate some insight.

The blog article you linked mentions that the SpaghettiNet model was derived using neural architecture search. There are a few resources covering that topic, like this list of tools.

For a quick and easy first try, a commercial product like Google Cloud’s AutoML might help, though I’m not sure it gives you the kind of controls described in the article (e.g. moving the compute budget to different parts of the network) or lets you optimise for latency. Some of these solutions do allow export to “mobile optimised” models, though.

1 Like

I was hoping I could just use TensorFlow’s Object Detection API along with this SpaghettiNet config file. I tried this using TF2, but it says that this model is not compatible with TF2. I’m now trying to use TF1… bad idea?

Isaac, do you want to train the model from scratch, or do transfer learning/fine-tuning on your own data?

I want to do transfer learning on my own data. However, I just finished training this model from the pipeline.config and have a saved_model.pb file. I’m pretty sure this is just training from scratch with no transfer learning involved (correct me if I’m wrong). It was quite annoying to get everything properly configured, but I only needed to get this configuration working once.

Configuration:
Python: 3.7.0
TensorFlow: 1.15
pycocotools: 2.0.4
NumPy: 1.18.5
Using model_main.py from the Object Detection API
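
For anyone following along, the usual TF1 Object Detection API training invocation looks something like this (the paths here are placeholders for your own setup):

python object_detection/model_main.py \
  --pipeline_config_path=path/to/pipeline.config \
  --model_dir=path/to/train_dir \
  --alsologtostderr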

I’m working on converting this to a TFLite model now.

1 Like

TF 1.x? :frowning:

There are some new SpaghettiNet models on TF Hub (e.g. TensorFlow Hub). I don’t know if those are TF 1.x… I’ll try to find out.

1 Like

Could you get it to run with TF 2.x?

It won’t run with TF 2.x, unfortunately. I wrote up a little walkthrough on GitHub for how to make it work on TF 1.15.

1 Like

@Isaac_Padberg excellent write-up!

The model can be trained using the Object Detection API, as @Isaac_Padberg has demonstrated. Although the architecture was derived using AutoML, you can train it just like any other model using TensorFlow + the Object Detection API, with some changes to the config file according to your use case. It trains from scratch.

To get it to run with TF 2.x, the feature extractor code would need to be rewritten for TF2, but then only floating-point inference will work. We use TF1 because the model relies on quantization-aware training features that have not yet been ported to TF2. If you would like to train without quantization, simply remove the graph_rewriter section from the pipeline.config file.

If training with quantization, the rule of thumb is to set the quantization delay parameter in the graph_rewriter config to ~10% of the total training steps (num_steps in the config). We include this delay to stabilize training, but the model should spend most of its time training with quantization so that it can adjust to the reduced precision.
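
For example, if num_steps were 400000, a graph_rewriter section following that rule of thumb would look something like this (the numbers here are illustrative, not from the released config):

graph_rewriter {
  quantization {
    delay: 40000  # ~10% of num_steps; train in float until this step
    weight_bits: 8
    activation_bits: 8
  }
}

Deleting this whole block trains the model in floating point.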

When converting to TFLite, use mean and std values of 128, since the input is UINT8. The model is quantized up to the very last operation, where a TFLite_Detection_PostProcess custom op takes in the UINT8 model output, dequantizes it, runs NMS, etc., and outputs results in floating point.
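
A quick way to sanity-check the converted model is to load it with the TFLite interpreter and confirm that the input is UINT8 while the post-processing outputs are float (the model path and dummy input below are placeholders):

import numpy as np
import tensorflow as tf  # TF 1.15

interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
print(inp['dtype'])  # expect uint8 for the quantized model

# mean/std of 128 tell the converter how [0, 255] maps to the float range.
interpreter.set_tensor(inp['index'], np.zeros(inp['shape'], dtype=np.uint8))
interpreter.invoke()

# TFLite_Detection_PostProcess emits float32 boxes, classes, scores, count.
for out in interpreter.get_output_details():
    print(out['name'], out['dtype'])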

2 Likes

Thanks for sharing that, Marie.

When converting to TFLite, I ran into an issue with the Relu6 operation. It does not have a specified range, so you must provide one via the --default_ranges_min/--default_ranges_max flags. I’ve set mine to 0 and 6 respectively, but I’m wondering if this will hurt accuracy and whether there is something else I should be doing instead. Thanks!

1 Like

A range of 0 to 6 makes sense for Relu6, but it may incur some precision loss. Exactly how much will depend on the model training, dataset, domain, location of the Relu6, etc. I suggest running the model end-to-end on a large-scale dataset and comparing the results to the floating-point version of the model.

Ideally, that Relu6 should have min/max ranges attached to it. We strategically place fake_quant_with_min_max_vars ops in the model to keep track of these ranges. The code that does this is here. What you want to do is get the name of the Relu6 op that’s missing ranges and call InsertQuantOp on it, similar to line 124 of quantize.py.
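
As a toy illustration of what those fake_quant ops record (the real code in quantize.py also reroutes the downstream consumers to read from the fake_quant output, which this sketch omits):

import tensorflow as tf  # TF 1.15

x = tf.placeholder(tf.float32, [1, 320, 320, 3])
act = tf.nn.relu6(x)

# The fake_quant op carries the min/max range the converter needs.
# Relu6 bounds its output to [0, 6]; during quantization-aware training
# these are variables updated from the observed activations.
act_min = tf.Variable(0.0, trainable=False)
act_max = tf.Variable(6.0, trainable=False)
act_q = tf.quantization.fake_quant_with_min_max_vars(
    act, min=act_min, max=act_max, num_bits=8)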

2 Likes

Thanks for the explanation. I’m having the same issue, and it seems others are as well: export_spaghettinet_to_tflite_edgetpu.ipynb · GitHub

What would be the simplest solution without changing tensorflow.contrib itself? I guess the issue is that the node FeatureExtractor/spaghettinet_edgetpu_l/Relu6 is not matched by _FindLayersToQuantize.

Command:

tflite_convert \
  --output_file="$OUTPUT_DIR/model.tflite" \
  --graph_def_file="$OUTPUT_DIR/tflite_graph.pb" \
  --inference_type=QUANTIZED_UINT8 \
  --input_arrays="normalized_input_image_tensor" \
  --output_arrays="TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3" \
  --mean_values=128 \
  --std_dev_values=128 \
  --input_shapes=1,320,320,3 \
  --allow_custom_ops

Output: 2022-05-02 13:52:04.556555: F tensorflow/lite/toco/tooling_util.cc:1728] Array FeatureExtractor/spaghettinet_edgetpu_l/Relu6, which is an input to the Conv operator producing the output array FeatureExtractor/spaghettinet_edgetpu_l/spaghetti_net/c0n0_0/expansion/Relu6, is lacking min/max data, which is necessary for quantization. If accuracy matters, either target a non-quantized output format, or run quantized training with your model from a floating point checkpoint to change the input graph to contain min/max information. If you don't care about accuracy, you can pass --default_ranges_min= and --default_ranges_max= for easy experimentation.
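
One way to find every activation that is missing range information, instead of hitting them one at a time in the converter, is to scan the frozen graph for Relu6 nodes whose output never feeds a FakeQuant op (a rough sketch; the file name matches the export step above):

import tensorflow as tf  # TF 1.15

graph_def = tf.GraphDef()
with tf.gfile.GFile('tflite_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Map each node name to the op types that consume its output.
consumers = {}
for node in graph_def.node:
    for inp in node.input:
        name = inp.lstrip('^').split(':')[0]
        consumers.setdefault(name, []).append(node.op)

# Relu6 outputs that never reach a FakeQuant op lack min/max ranges.
for node in graph_def.node:
    if node.op == 'Relu6' and 'FakeQuantWithMinMaxVars' not in consumers.get(node.name, []):
        print(node.name)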

1 Like

tflite_convert \
  --graph_def_file=$OUTPUT_DIR/tflite_graph.pb \
  --output_file=$OUTPUT_DIR/spaghetti.tflite \
  --input_shapes=1,320,320,3 \
  --input_arrays=normalized_input_image_tensor \
  --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
  --inference_type=QUANTIZED_UINT8 \
  --mean_values=128 \
  --std_dev_values=128 \
  --change_concat_input_ranges=false \
  --allow_custom_ops \
  --default_ranges_min=0 \
  --default_ranges_max=6

(The last two flags are the lines to add.)
1 Like

Thanks, I tried this before and it worked. But that probably causes a loss in accuracy, since it assumes the activations span the full range from 0 to 6, as @Marie_White mentioned.

I think there is an underlying issue in the quantization: this layer (FeatureExtractor/spaghettinet_edgetpu_l/Relu6) is not matched, and therefore no quantization node is added.

With my previous post I wanted to show exactly which layer is not quantized. One would probably have to adapt the SpaghettiNet code or the quantization code in tensorflow.contrib. Unfortunately, I don’t have the time to investigate further at the moment.

1 Like

Ahh yes, I thought you might have already worked out the solution. I’ve trained a model and got it running with pretty good results. Of course, the results are probably a bit worse than what a proper fix would offer.

Please take a look at the pull request: Adding quantization rewrite rules for spaghettinet by hermitman · Pull Request #55919 · tensorflow/tensorflow · GitHub

SpaghettiNet has a unique structure with a standalone activation function in the branch-merging part of the network, which needs to be specially handled.

I cannot unit test it, so it would be great if someone here could help confirm that this fixes it :slight_smile: Thanks for discovering all the issues in the open-sourced models; hopefully we can help fix them.

2 Likes

After digging further, I found that the behavior we encountered seems to be intended:

In quantize.py (line 294), quantization is skipped if an activation function immediately follows an addition or multiplication. When skipping an op for this reason during quantization, TensorFlow outputs this:
INFO:tensorflow:Skipping quant after FeatureExtractor/spaghettinet_edgetpu_l/add_13

In our example, the activation that is missing range information (FeatureExtractor/spaghettinet_edgetpu_l/Relu6) immediately follows an addition.

I’m not quite sure why it is beneficial to skip quantization in these cases but I hope there is a good reason for doing so. If anyone (like @Marie_White) can explain that, I’d appreciate it.

The code in quantize.py (line 294) was written to add FakeQuant ops after Mul and Add operations, rather than after Relu6.

It is intended to rewrite the pattern:

        / - Mul - \                
Relu6 -|           |- Add 
        \ - Mul - / 

To:

        / - Mul - FakeQuant - \                
Relu6 -|                       |- Add - FakeQuant
        \ - Mul - FakeQuant - / 

The check on line 294 was written to avoid scenarios like this:

Mul|Add - FakeQuant - Relu6

where the Mul-Relu6 or Add-Relu6 pair would likely be fused together in TFLite.

For SpaghettiNet, the patch that @Hao_Xu added addresses this pattern:

Conv -> Relu6 -\
                | - Add - Relu6 - Conv
Conv -> Relu6 -/

And rewrites it to:

Conv -> Relu6 - FakeQuant -\
                            | - Add - Relu6 - FakeQuant - Conv
Conv -> Relu6 - FakeQuant -/
1 Like

Thanks for the patch!

Your patch adds the missing quantization nodes, but it also adds quite a few additional quantization nodes to the training graph (only the training graph, not the evaluation graph), which I think are not required (see my comment on your patch).

I changed the code to only include the required quantization nodes, but I’m not 100% sure how generalizable my fix is.

Hi @Marie_White and @Isaac_Padberg. Thanks for the walkthrough you wrote. Have you been able to do transfer learning on SpaghettiNet? I am trying to do transfer learning on the model with my own data, but the model format is “tflite” rather than “h5”, which means I cannot access the weights and layers. Any solutions or suggestions for this?

1 Like