Object Detection - Tutorial Example using Model Garden

Want to know more about how object detection works with Model Garden? This tutorial fine-tunes a RetinaNet model with a ResNet-50 backbone from the TensorFlow Model Garden package (tensorflow-models) to detect three different types of blood cells in the BCCD dataset. The RetinaNet is pretrained on COCO train2017 and evaluated on COCO val2017.

Check it out and share your feedback.


Hi, thanks for the tutorial. How do I export a trained model with a different input signature, one that takes float32 instead of uint8? I also tried to convert the model with the latest TensorRT (8.4), but it didn’t work. Is there a working example of a TensorRT export?

Hi @alekseisolovev,

Please first go through this tutorial and run it. Once the model is trained and exported, save it locally or, preferably, to Google Drive. Then go through this gist, which uses INT8 precision to optimize the model with TensorRT. The gist uses the same dataset as the tutorial above; you can try it with different batch sizes and image sizes. Make sure the model is exported with the proper batch_size. The model in the tutorial is trained from TFRecords, so I presume the input data type during training is tf.uint8, which is why I used INT8 as my precision.
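A minimal sketch of the TF-TRT INT8 conversion step, assuming a recent TF 2.x release with TF-TRT enabled (older releases pass these options through `conversion_params` instead) and an exported SavedModel whose serving signature takes a single image tensor. The directory names and `calibration_batches` (a few representative input batches shaped like the exported signature) are placeholders, not taken from the gist.

```python
def make_calibration_input_fn(calibration_batches):
    """Wraps representative batches in the generator format TF-TRT expects.

    TF-TRT calls this function during INT8 calibration; each yielded item is a
    tuple of positional inputs for the SavedModel's serving signature.
    """
    def calibration_input_fn():
        for batch in calibration_batches:
            yield (batch,)
    return calibration_input_fn


def convert_to_tftrt_int8(saved_model_dir, output_dir, calibration_batches):
    """Converts a SavedModel to a TF-TRT SavedModel with INT8 precision (sketch)."""
    # Imported lazily so this module can be loaded without TensorFlow installed.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        precision_mode=trt.TrtPrecisionMode.INT8,
        use_calibration=True,
    )
    # INT8 needs calibration data to choose quantization ranges.
    converter.convert(
        calibration_input_fn=make_calibration_input_fn(calibration_batches))
    converter.save(output_dir)
```

Note that the output is still a TensorFlow SavedModel with TensorRT segments embedded in it, so it needs TensorFlow at inference time.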

TensorFlow nightly builds currently include TF-TRT by default, so you don’t need to install TF-TRT separately. You can pull the latest TF containers from Docker Hub or install the latest TF pip package to get the latest TF-TRT.

Please see the tensorflow/tensorrt repo for more details.

NVIDIA provides several ways to install TensorRT. I used the Python installation, which doesn’t require any extra installation steps. Please make sure your TensorRT and CUDA versions match those specified in the documentation.
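To see which versions are actually installed before comparing them with NVIDIA’s documentation, something like the following may help. It is a sketch that assumes only that the packages, if present, are importable as `tensorflow` and `tensorrt`; `tf.sysconfig.get_build_info()` reports the CUDA/cuDNN versions TensorFlow was built against, which can differ from the driver installed on the machine.

```python
import importlib


def installed_versions():
    """Returns TensorFlow/TensorRT versions, or None for missing packages."""
    versions = {}
    for name in ("tensorflow", "tensorrt"):
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None

    # CUDA/cuDNN versions that the installed TensorFlow was *built* against.
    versions["tf_cuda"] = None
    versions["tf_cudnn"] = None
    if versions["tensorflow"] is not None:
        import tensorflow as tf
        build_info = tf.sysconfig.get_build_info()
        versions["tf_cuda"] = build_info.get("cuda_version")
        versions["tf_cudnn"] = build_info.get("cudnn_version")
    return versions


if __name__ == "__main__":
    for key, value in installed_versions().items():
        print(f"{key}: {value}")
```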

Colab currently comes with CUDA 11.8.


I hope the explanation is clear and helps you resolve the issue.


Hi! Thanks for the info! The problem with TF-TRT is that it produces a TF graph, which requires installing TF on the Jetson device, and I want to avoid that. In other words, I’m looking for a way to generate a TRT engine, not a TF graph.
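For reference, a TensorRT engine can also be built from an ONNX file with TensorRT’s own Python API, with no TensorFlow needed at runtime. This is a minimal sketch assuming the TensorRT 8.x Python bindings (`tensorrt`) are installed on the target device; file paths are placeholders, and it does not by itself fix unsupported layers in the ONNX graph.

```python
def build_engine_from_onnx(onnx_path, engine_path, fp16=False):
    """Parses an ONNX model and writes a serialized TensorRT engine (sketch)."""
    # Imported lazily so the module loads even where TensorRT is absent.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # ONNX models need an explicit-batch network definition in TensorRT 8.x.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            # Surface parser errors, e.g. a TopK layer TensorRT cannot build.
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parsing failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("TensorRT engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
```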

@alekseisolovev did you find a way to convert the saved_model to a proper TRT engine yet?

I’m facing the same issue. I’m attempting to convert saved_model → ONNX → TRT, but I get:

[03/20/2023-09:42:32] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/20/2023-09:42:32] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/20/2023-09:42:33] [E] Error[3]: [topKLayer.h::setK::22] Error Code 3: API Usage Error (Parameter check failed at: /_src/build/aarch64-gnu/release/optimizer/api/layers/topKLayer.h::setK::22, condition: k > 0 && k <= kMAX_TOPK_K
[03/20/2023-09:42:33] [E] Error[2]: [topKLayer.cpp::TopKLayer::20] Error Code 2: Internal Error (Assertion ThreadContext::getThreadResources().getErrorRecorder().getNbErrors() == prevNbErrors failed. )
[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:726: While parsing node number 310 [TopK -> "StatefulPartitionedCall/generate_detections/TopKV2:0"]:
[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:727: --- Begin node ---
[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:728: input: "StatefulPartitionedCall/generate_detections/Reshape:0"
input: "const_fold_opt__1545"
output: "StatefulPartitionedCall/generate_detections/TopKV2:0"
output: "StatefulPartitionedCall/generate_detections/TopKV2:1"
name: "StatefulPartitionedCall/generate_detections/TopKV2"
op_type: "TopK"
attribute {
  name: "sorted"
  i: 1
  type: INT

[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:729: --- End node ---
[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:731: ERROR: builtin_op_importers.cpp:4931 In function importTopK:
[8] Assertion failed: layer && "Failed to add TopK layer."
[03/20/2023-09:42:33] [E] Failed to parse onnx file
[03/20/2023-09:42:33] [E] Parsing model failed
[03/20/2023-09:42:33] [E] Failed to create engine from model or file.
[03/20/2023-09:42:33] [E] Engine set up failed

Hi, @Gling_K. With the latest TensorRT 8.5 I’m having the same TopKLayer issue.

Hi @Gling_K, @alekseisolovev,

Could you provide a code snippet that reproduces the issue, so that I can take a look and, if possible, work on resolving it?


Hi @Siva_Sravana_Kumar_N ,
Here are the steps I did:

  1. I used the object detection Colab, ran it through the "Saving and exporting the trained model" section, and then zipped and downloaded the exported model to my local machine.

  2. On my local machine, which has tf2onnx installed, I ran the following command in a terminal:

python -m tf2onnx.convert --saved-model tensorflow-model-path --output test.onnx

  3. On my local machine (an Orin AGX), inside a container from the nvcr.io/nvidia/l4t-tensorrt:r8.5.2.2-devel Docker image, I ran the following in bash:

/usr/src/tensorrt/bin/trtexec --onnx='/workspaces/scratch_ai/onnx/test.onnx' --saveEngine='/workspaces/scratch_ai/trt/test.engine' --exportProfile='/workspaces/scratch_ai/trt/test.json' --allowGPUFallback --useSpinWait --separateProfileRun > '/workspaces/scratch_ai/trt/test.log'

which led me to the error reported.

It would be nice if there were a way to make TF Model Garden models work with TensorRT without needing TF-TRT, since in my tests plain TensorRT engines run much faster than TF-TRT models.

Any help is appreciated

I’ve tried running that example on Google Colab and it fails: specifically, the statement

import tensorflow_models as tfm

produces the error, “ImportError: numpy.core.multiarray failed to import”. Any possible reason why it fails on Colab?

Hi @Timothy_Tuti,

Once the pip install cells have finished, please restart the runtime. The install upgrades NumPy, and the new version is only picked up after a restart. I hope this helps.

It did help, thanks. I decided to run the same example on my laptop with the following specs:

OS: Ubuntu 22.04
CUDA Version: 11.8
cuDNN: 8.6
GPU: GeForce RTX 3050 Ti Mobile
RAM: 32 GB
CPU: 12th Gen Intel(R) Core™ i7-12700H
Cores: 20

But I keep getting the error

Profiling failure on CUDNN engine eng0{}: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 16777216 bytes.

Is there any way to tweak the code from the example here, Object detection with Model Garden  |  TensorFlow Core, so that it doesn’t run out of memory on my local machine?


You can try reducing the batch size, or try a different variant of the model such as retinanet_mobile_coco, which is lighter weight than the one used in the tutorial.
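A sketch of the batch-size suggestion, assuming `exp_config` is the tutorial’s experiment config with the `task.train_data.global_batch_size` and `task.validation_data.global_batch_size` fields. `shrink_batch_sizes` is a hypothetical helper, and other settings the tutorial derives from the batch size (such as step counts) may need adjusting as well.

```python
def shrink_batch_sizes(exp_config, train_batch_size=2, eval_batch_size=2):
    """Lowers train/eval batch sizes in a Model Garden experiment config (sketch)."""
    if train_batch_size < 1 or eval_batch_size < 1:
        raise ValueError("batch sizes must be positive")
    exp_config.task.train_data.global_batch_size = train_batch_size
    exp_config.task.validation_data.global_batch_size = eval_batch_size
    return exp_config

# Usage with tensorflow-models installed, e.g. on the lighter variant:
#   import tensorflow_models as tfm
#   exp_config = tfm.core.exp_factory.get_exp_config('retinanet_mobile_coco')
#   exp_config = shrink_batch_sizes(exp_config, train_batch_size=2, eval_batch_size=2)
```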

I’m using the Model Garden object detection tutorial with my custom images, all of which have a height of 1024 and a width of 800. The code snippet below prints images.shape: (4, 1024, 896, 3)  images.dtype: tf.float32

for images, labels in task.build_inputs(exp_config.task.train_data).take(1):
    print(f'images.shape: {str(images.shape):16}  images.dtype: {images.dtype!r}')
    print(f'labels.keys: {labels.keys()}')

When I run the model, I get the following error message:

ValueError: Input 0 of layer "res_net" is incompatible with the layer: expected shape=(None, 1024, 800, 3), found shape=(4, 1024, 896, 3)

Why does task.build_inputs(exp_config.task.train_data) reshape the images even though I explicitly specified exp_config.task.model.input_size = [1024, 800, 3]? How can I ensure that the dimensions of the images from exp_config.task.train_data match exp_config.task.model.input_size?

If you specify exp_config.task.model.input_size = [1024, 800, 3], build_inputs will perform all the required preprocessing steps, including adjusting the image size.

Also make sure to provide the correct batch size in exp_config.


Thanks for the response. That is exactly my problem: I have specified exp_config.task.model.input_size = [1024,800,3] but the output from build_inputs keeps generating training images whose dimensions are [1024,896,3] despite all the raw images being [1024,800,3]. How can that be? Is there an image processing step that causes this? When I view the resultant images, they have black padding added to the right.

Can you check the parser configuration of the training data in exp_config? It has predefined augmentation techniques that are used to make the data more robust for training.
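The (1024, 896) shape is consistent with the RetinaNet input pipeline padding each image so that both sides are multiples of 2**max_level. Assuming the default max_level = 7, that stride is 128: 1024 is already a multiple of 128, but 800 is not, so it would be padded up to 7 × 128 = 896, which would also explain the black band on the right. The hypothetical helper below only reproduces that arithmetic; it is not the Model Garden code. If this is the cause, choosing an input_size whose sides are already multiples of 128 (for example [1024, 896, 3]) should make the two shapes agree.

```python
def padded_size(height, width, max_level=7):
    """Rounds each side up to the next multiple of 2**max_level (integer math)."""
    stride = 2 ** max_level
    return ((height + stride - 1) // stride * stride,
            (width + stride - 1) // stride * stride)


print(padded_size(1024, 800))  # (1024, 896)
```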

I faced a similar issue. The TensorRT documentation states that the TopK layer needs k <= 1024, whereas the default for the object detection models is 5000.

Exporting the model with pre_nms_top_k = 1000 seemed to solve the complaint about TopKLayer for me.
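A sketch of that change, assuming the tutorial’s `exp_config` exposes the setting as `exp_config.task.model.detection_generator.pre_nms_top_k` (as in `official/vision/configs/retinanet.py`) and taking the 1024 limit quoted above as the TensorRT bound; check the documentation for your TensorRT version. The value has to be changed before exporting, because it ends up as a constant `k` in the exported TopK node. `cap_pre_nms_top_k` is a hypothetical helper.

```python
def cap_pre_nms_top_k(exp_config, top_k=1000, trt_max_k=1024):
    """Limits the pre-NMS TopK in a RetinaNet config so TensorRT can build it."""
    if not 0 < top_k <= trt_max_k:
        raise ValueError(f"top_k must be in (0, {trt_max_k}], got {top_k}")
    exp_config.task.model.detection_generator.pre_nms_top_k = top_k
    return exp_config

# Usage with the tutorial's config, before exporting the SavedModel:
#   exp_config = cap_pre_nms_top_k(exp_config, top_k=1000)
```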


I checked the retinanet_resnetfpn_coco default configuration defined in official/vision/configs/retinanet.py and I noticed that it uses transfer learning, loading the backbone from a certain checkpoint:


Training and inference run without errors for several different values of exp_config.task.model.input_size, but I’m not sure which input size to choose. Should I always set exp_config.task.model.input_size to match the input size the pretrained backbone was trained on?

The input size can be anything, as long as the width and height are both divisible by the corresponding powers of 2 specified by min_level and max_level. The backbone does not require a specific input size and will be fine-tuned to your input images during training in any case.
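A small check along those lines, as a sketch: it assumes the binding constraint is divisibility by 2**max_level (max_level = 7 by default for these RetinaNet configs), since any multiple of 2**max_level is also a multiple of 2**min_level. `is_valid_input_size` and `nearest_valid_sides` are hypothetical helpers, not Model Garden functions.

```python
def is_valid_input_size(input_size, max_level=7):
    """True if height and width are both multiples of 2**max_level."""
    stride = 2 ** max_level
    height, width = input_size[0], input_size[1]
    return height % stride == 0 and width % stride == 0


def nearest_valid_sides(side, max_level=7):
    """Returns the valid side lengths just below and just above `side`."""
    stride = 2 ** max_level
    lower = (side // stride) * stride
    upper = lower if lower == side else lower + stride
    return lower, upper


print(is_valid_input_size([1024, 800, 3]))  # False
print(nearest_valid_sides(800))             # (768, 896)
```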


Hi, I have tried the tutorial, but my TensorBoard won’t load. The error says: “Google 403. That’s an error. That’s all we know.” How can I solve this? Thanks