What are the limits of client-side model loading with TensorFlow.js (tfjs)?

I have a web app client on a Nuxt/Vue/Vuetify/Firestore/GCP stack. We are porting our traditional deep-net NLP predictions from offline Python/TensorFlow/Keras to tfjs in the web client.

We already plan to retrain the models in tfjs on a back-end Node.js server. The open question is where to load the models for prediction. I see my options as:

  1. a) serve the models from somewhere so they can be loaded by URL
    b) then use the tfjs load call in the browser to load one or more of the 25 models
    c) make one-to-many predictions on the client side
    Q?) Is this too much load on the client? How much can it handle (in general terms)?

  2. a) build a simple Node.js back end
    b) house the models locally in this stack
    c) use a REST API to load a model on the back-end server
    d) make one-to-many predictions
    Q?) This seems more feasible, but how do we manage the listener, uptime, etc., and support long-running calls?
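
To make option 1 more concrete, here is a rough sketch of what I have in mind: load models lazily, share one in-flight download between concurrent requests, and cap how many of the 25 stay in memory at once. Everything named here (`ModelCache`, `Loader`, `DisposableModel`, `maxLoaded`) is hypothetical and not part of tfjs. The loader is passed in, so the sketch itself does not depend on the library. It assumes `maxLoaded` is at least 1 and that an evicted model is no longer in the middle of a prediction.

```typescript
// Hypothetical sketch: a small least-recently-used cache for lazily loaded models.
// None of these names come from tfjs; the loader function is supplied by the caller.

interface DisposableModel {
  // Releases the memory held by the model's weights.
  dispose(): void;
}

type Loader<M extends DisposableModel> = (name: string) => Promise<M>;

class ModelCache<M extends DisposableModel> {
  // A Map iterates in insertion order, so its first key is the least recently used.
  private readonly loaded = new Map<string, Promise<M>>();
  private readonly load: Loader<M>;
  private readonly maxLoaded: number;

  constructor(load: Loader<M>, maxLoaded: number) {
    this.load = load;
    this.maxLoaded = maxLoaded;
  }

  get size(): number {
    return this.loaded.size;
  }

  async get(name: string): Promise<M> {
    const hit = this.loaded.get(name);
    if (hit !== undefined) {
      // Re-insert the entry to mark it as most recently used.
      this.loaded.delete(name);
      this.loaded.set(name, hit);
      return hit;
    }
    // Cache the promise (not the model) so concurrent callers share one download.
    const pending = this.load(name);
    this.loaded.set(name, pending);
    // Forget failed loads so a later call can retry.
    pending.catch(() => {
      if (this.loaded.get(name) === pending) this.loaded.delete(name);
    });
    this.evictOldest();
    return pending;
  }

  private evictOldest(): void {
    while (this.loaded.size > this.maxLoaded) {
      const oldest = this.loaded.keys().next().value as string;
      const victim = this.loaded.get(oldest);
      this.loaded.delete(oldest);
      // Dispose once loading has finished; ignore models that failed to load.
      if (victim !== undefined) {
        victim.then((model) => model.dispose()).catch(() => {});
      }
    }
  }
}
```

With tfjs in the browser, I assume the loader would be something like `(name) => tf.loadLayersModel(baseUrl + "/" + name + "/model.json")`, where `baseUrl` is a placeholder for wherever the models end up being served. Whether a cap of 2 or 3 models is realistic on a typical client is exactly the part I cannot judge.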

If anyone has practical experience, suggestions, or benchmarks that would help us find a path, I would really appreciate it. 🙂