Deploy Subclassed Model in TensorFlow

Hi all!
I’ve followed this tutorial given on TensorFlow’s website about image captioning, and I’ve extended the model to Transformers. Now I want to deploy the model on a web application, but I’m running into difficulties. Can anyone guide me on how to do this? It’ll be a great help. Thanks.

Do you want to run the model inside the browser (TF.js) or with a client/server approach?

adding @Jason to help

For conversion, to run the model client side in the browser, you can check out this codelab:

If you want to run inference server side and just expose the answer via the web, you can use TensorFlow.js in Node.js, which can take a SavedModel directly, WITHOUT conversion. This will make your life much easier when integrating with a traditional web stack, as you can use WebSockets and Express very easily. Not to mention your pre/post processing will benefit from the JavaScript JIT compiler, which means you can potentially get around a 2x speed improvement over Python in end-to-end inference times (including the pre/post processing), depending on how much pre/post processing you have, which may be useful in your use case. See this case study:

And we have TF serving:

Yes, I want to make it a client–server approach. I want to build an image captioning application with a UI.

Question: Does TF Serving work with TensorFlow.js flows? @Robert_Crowe ?

In that case, if you want to use TensorFlow.js, you can use our Node.js implementation on the server side and use any Python SavedModel, without conversion, in your regular Node.js flow! :slight_smile:

You can have a REST API that you can consume with JS:
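As a sketch: TF Serving’s REST API accepts a POST to `/v1/models/<name>:predict` with a JSON body containing an `instances` list, one entry per batch element. The model name, host, and port below are placeholder assumptions.

```javascript
// Build a TF Serving REST predict request. "captioner" and the input values
// are made-up placeholders; 8501 is TF Serving's default REST port.
function buildPredictRequest(host, modelName, instances) {
  return {
    url: `http://${host}:8501/v1/models/${modelName}:predict`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ instances }),
    },
  };
}

// Usage with fetch (browsers, or Node 18+):
// const { url, options } = buildPredictRequest('localhost', 'captioner', [[0.1, 0.2]]);
// const { predictions } = await (await fetch(url, options)).json();
```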

Or did you mean something different?

So I was aiming for more of a Node.js pipeline integration - a non-Python stack - does TFX support Node deployments?

Also, many Node folk use WebSockets, so support for WebSockets for bidirectional communication would be useful to JS developers too: you could stream sensor data from the client close to real time and get a stream of classifications back, also close to real time, which could be very interesting. Does that exist on TFX?

For others reading who are new to WebSockets, these folk have a good primer on WebSockets vs REST:

TFX is really about:

  • Training models, including TFJS and TF Lite
  • Running batch inference

TFX doesn’t serve models or results. TF Serving serves inference in real time, over either gRPC or REST. AFAIK there is no Node.js or WebSockets integration, but that would be cool!

Thanks Robert! Do you have any links to TFX + TFJS examples for training etc.? That would be useful, as the TFJS community is growing fast and I am pretty sure more folk would be interested in an example of defining a pipeline for training.

I think it could confuse some users that, even though TF Serving is in its own GitHub repository, we have some official docs on the website under the TFX hierarchy:

Any idea how to handle text pre/post processing in the case of custom models?
With input signatures, we have to define the input and output shapes and dtypes for a tensor to be processed by a function/method. But how do I preprocess the text from strings to indices so that it can be fed into the model?
Any help would be highly appreciated : )
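For illustration, here is a minimal sketch of that string-to-indices step in JavaScript (so it could run in the browser or on a Node.js server before calling the model). The whitespace tokenizer, the pad/unknown index conventions, and post-padding are assumptions - they must match whatever your training pipeline actually did.

```javascript
// Sketch: tokenize a string, map tokens to vocabulary indices, and pad to a
// fixed length so the result can be fed to a model expecting integer tensors.
// padIdx/unkIdx defaults and the whitespace tokenizer are assumptions.
function textToPaddedIndices(text, itemToIdx, maxLen, padIdx = 0, unkIdx = 1) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const indices = tokens
    .slice(0, maxLen) // truncate long inputs
    .map((t) => (t in itemToIdx ? itemToIdx[t] : unkIdx));
  while (indices.length < maxLen) indices.push(padIdx); // post-padding
  return indices;
}
```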

So I wrote a codelab for comment spam detection here that touches on some related areas - tokenization etc. - which may be of use to you:

Sir, the tutorial that you shared is great, but I am looking for something different. Since my original dataset is huge, I am carrying out the preprocessing step using PySpark, where I assign indices to different items/tokens, then make padded sequences and write them to a CSV to be further consumed by my model.

I am building an item2idx and an idx2item dictionary before training the model.
What exactly I am looking for here is a way to use those dictionaries during inference, after exporting the model (along with the dictionaries).

To make your life easier in JS, you may want to export them to JSON, as you can consume JSON very easily in JS - it is literally JavaScript Object Notation. You could then just load in the JSON file and use it right away in your program.

The question then becomes: how big are those files? You can certainly download large files to the browser, but on a mobile phone with a bad wireless connection it could take a long time. If they are over, say, 10 MB in size, then maybe you want to store them server side and just use WebSockets to talk to a Node.js server: you send it a sentence and get back an encoding, with the dictionary stored in memory for a super fast response time. Something like that.

Many ways to do it depending on your needs - if you don’t need a fast loading time, then sending the JSON to the client is completely fine too.


Here’s an example of using TFX with TFJS: