TensorFlow 2 models on Jetson Nano

Hi,

I’m currently working on an AI project using an NVIDIA Jetson Nano (4GB) and TensorFlow 2, where we were planning to use a Faster R-CNN Inception ResNet V2 640x640 model. We tried using TF-TRT to shrink the network, but it seems to be too big to fit: the available GPU memory is not large enough, and using swap doesn’t solve the issue.

We have done several tests and, for the moment, the heaviest network from the TensorFlow Model Zoo we managed to get working is the SSD MobileNet V2 FPNLite 640x640.

I’ve been searching for a list of networks that have been tested on this device with TF2, but I can’t seem to find one. I know of the existence of this list, but it is for TF1 and doesn’t cover the TF2 Model Zoo models.

Has anyone tried to get something bigger running on the Jetson? Which network, and how? Is there any official documentation on which models are feasible with only 4GB?

Also, I’d like to understand why I can execute these models on my CPU but not on the GPU, as the system runs out of (RAM/vRAM) memory. I’m no expert and it seems a bit weird.

Thank you in advance.

The Nano is not designed to run large neural networks; it is meant only to execute (run inference on) models, not to train them. The heavy lifting must be done upstream, so think “lite” when targeting this device.

You can estimate a model’s memory footprint with:

https://tensorflow-prod.ospodiscourse.com/t/how-to-find-out-keras-model-memory-size/5249
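
As a rough rule of thumb from that thread, the weights alone take (number of parameters) * (bytes per element); activations and the TensorFlow runtime add more on top. A minimal sketch of that estimate for any Keras model:

import tensorflow as tf

def approx_weight_memory_mb(model):
    # Weight memory only: elements per variable times bytes per element.
    total_bytes = sum(w.shape.num_elements() * w.dtype.size for w in model.weights)
    return total_bytes / (1024 ** 2)

# Example with a stock Keras model; any tf.keras.Model works.
model = tf.keras.applications.MobileNetV2(weights=None)
print("~%.1f MB of weights" % approx_weight_memory_mb(model))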

You can try a smaller model and explore some model optimizations:
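
If a pure-TensorFlow route is acceptable, one such optimization (a sketch, assuming you already have an exported SavedModel; Object Detection API models usually have to be re-exported for TFLite first) is post-training FP16 quantization with the TFLite converter, which roughly halves the weight size:

import tensorflow as tf

# Hypothetical path; point it at your exported SavedModel directory.
converter = tf.lite.TFLiteConverter.from_saved_model("./trained_model/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # keep weights in FP16
tflite_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_model)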

You can also play with the TF-TRT memory management params:
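
For instance (a minimal sketch; the 1024 MB cap and 256 MB workspace are guesses for a 4GB Nano, where CPU and GPU share the same physical memory), you can cap TensorFlow’s GPU memory pool and shrink the TensorRT build workspace:

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Leave headroom for TensorRT by limiting what TensorFlow itself can grab.
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    max_workspace_size_bytes=(1 << 28),  # 256 MB workspace for engine building
    maximum_cached_engines=1,            # keep only one TRT engine cached
    precision_mode="FP16")
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="./trained_model/saved_model",  # hypothetical path
    conversion_params=params)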

Hi @Francisco_Ferraz

Could you please share the steps to reproduce your SSD MobileNet V2 FPNLite 640x640 execution on the Jetson Nano? Have you been able to run it with TensorRT successfully?

Thank you.

Hi, I’ve run it with TF-TRT, but not with pure TensorRT. This is the script I use for conversion:

import os
import numpy as np
import tensorflow as tf
from google import protobuf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

print("Tensorflow version: ", tf.version.VERSION)
print("Protobuf version:", protobuf.__version__)
print("TensorRT version: ")
print(os.system("dpkg -l | grep TensorRT"))

gpu_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpu_devices[0], True)
tf.config.experimental.set_virtual_device_configuration(
            gpu_devices[0],
            [tf.config.experimental.VirtualDeviceConfiguration(
               memory_limit=1024)]) ## Crucial value: set it lower than the available GPU memory (the Jetson shares memory between CPU and GPU); 2048 may also work on a 4GB Nano

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
conversion_params = conversion_params._replace(max_workspace_size_bytes=(1<<30)) 
conversion_params = conversion_params._replace(precision_mode="FP16")
conversion_params = conversion_params._replace(maximum_cached_engines=10)
conversion_params = conversion_params._replace(use_calibration=True)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="./trained_model/saved_model",
    conversion_params=conversion_params)
converter.convert() 

batch_size = 1
def input_fn():
    # Substitute with your input size
    Inp1 = np.random.normal(size=(batch_size, 1024, 1024, 3)).astype(np.uint8) 
    yield (Inp1, )
converter.build(input_fn=input_fn)

converter.save("./trained_model/saved_model_compressed_int8")
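
To run inference afterwards, something like this should work (a sketch; the "serving_default" signature and the input_tensor name are what the TF2 Object Detection API usually exports, so check your own model’s signature):

import numpy as np
import tensorflow as tf

# Load the TF-TRT converted SavedModel and grab its serving signature.
saved = tf.saved_model.load("./trained_model/saved_model_compressed_int8")
infer = saved.signatures["serving_default"]

# Dummy 640x640 frame; replace with a real image batch.
image = np.random.randint(0, 256, size=(1, 640, 640, 3), dtype=np.uint8)
outputs = infer(input_tensor=tf.constant(image))
print({name: tensor.shape for name, tensor in outputs.items()})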