Object detection using TensorFlow.js - Model.Predict() output interpretation

Hello guys,

I'm trying to implement a web application for object detection on images using TensorFlow.js. I've trained my model and loaded it with "tf.loadLayersModel()", then made predictions with "model.predict()". The problem is that I don't understand how to interpret "predict", the output of "model.predict()".
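Roughly, the loading and prediction code looks like this (the model path, input size, and preprocessing below are simplified placeholders, not my exact values):

```js
import * as tf from '@tensorflow/tfjs';

// Placeholder path – the real model.json lives elsewhere.
const model = await tf.loadLayersModel('model/model.json');

// imageElement is an <img> or <canvas> holding the picture to run detection on.
const input = tf.browser.fromPixels(imageElement)
  .resizeBilinear([416, 416]) // placeholder size – whatever the model was trained with
  .toFloat()
  .div(255)
  .expandDims(0);             // add the batch dimension -> [1, 416, 416, 3]

const predict = model.predict(input); // in my case: an array of 3 output tensors
console.log(predict);
```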

The output from "console.log(predict); " is :

Array(3) [ {…}, {…}, {…} ]

0: Object { kept: false, isDisposedInternal: false, dtype: "float32", … }

1: Object { kept: false, isDisposedInternal: false, dtype: "float32", … }

2: Object { kept: false, isDisposedInternal: false, dtype: "float32", … }

length: 3

The output from "console.log(predict[0].dataSync())" is:

Float32Array(18) [ -0.933699369430542, 2.0597338676452637, -8.093562126159668, -1.653637170791626, 29.91468048095703, -2.5104284286499023, -1.6961907148361206, 5.431970596313477, -8.678093910217285, 1.1465117931365967, … ]

The output from "console.log(predict[1].dataSync())" is:

Float32Array(72) [ -3.040828227996826, 19.110309600830078, -0.057641081511974335, -1.872375249862671, -124.70093536376953, 2.4474217891693115, 11.702408790588379, -9.773043632507324, -3.7172935009002686, 5.327491283416748, … ]

The output from "console.log(predict[2].dataSync())" is:

Float32Array(288) [ -36.274513244628906, -19.568172454833984, 12.820204734802246, -21.152021408081055, -225.57720947265625, 1.2267011404037476, 6.923470497131348, 26.43770408630371, 2.0712578296661377, -11.071444511413574, … ]

The aim is to get the bounding box locations.

Thanks!


Welcome to the forum. Which model are you loading, exactly? Typically whoever wrote the model architecture you are using will be able to tell you how to interpret its tensor output, and if you are retraining an existing model there is usually some documentation describing what the inputs and outputs should be. As you correctly discovered, you need to read the data out of the returned Tensor (e.g. with dataSync()), else you will just get Object printed as you saw above.
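For example, something along these lines (assuming predict is the array of three tensors you logged; the async data() is preferred over dataSync() so the UI thread is not blocked):

```js
const outputs = model.predict(input);    // in your case: an array of 3 tensors
console.log(outputs[0].shape);           // the shape tells you how to index the values
const values = await outputs[0].data();  // Float32Array, like dataSync() but asynchronous

// ...interpret the values according to the model's output format...

outputs.forEach(t => t.dispose());       // free the tensor memory when you are done
```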

Typically for multibox detection the output of the model is a prediction map. There will be tonnes of predictions, many of them overlapping, and you will need to do some post-processing (e.g. score thresholding and non-max suppression) to filter them down to what is actually worth drawing. This article has a nice write-up:
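In TensorFlow.js the filtering step itself can be done with tf.image.nonMaxSuppressionAsync(). This is only a sketch of that last step: it assumes you have already decoded your model's raw output into actual box coordinates and per-box scores, which depends entirely on the architecture you trained.

```js
// Assumes `boxes` is a [numBoxes, 4] tensor of [y1, x1, y2, x2] coordinates and
// `scores` is a [numBoxes] tensor of confidences, both decoded from the model's
// raw output according to its documentation.
const maxBoxes = 20;         // keep at most this many boxes
const iouThreshold = 0.5;    // how much overlap counts as "the same object"
const scoreThreshold = 0.3;  // drop low-confidence predictions

const keepIndices = await tf.image.nonMaxSuppressionAsync(
  boxes, scores, maxBoxes, iouThreshold, scoreThreshold);

const boxesToDraw = await tf.gather(boxes, keepIndices).array();
console.log(boxesToDraw); // final box coordinates you can draw on a canvas
```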