Video classification with MoViNet in web browser?

TsuNa · May 18, 2023, 9:17am

Hey there!
I’m looking for advice on a video classification problem with one constraint: it has to run in a web browser. Here is the context:
I have to build an app that identifies a composed move made up of 3 distinct sub-moves. My idea was to train a CNN (MoViNet) to recognize the 3 sub-moves, then decide whether they form the expected composed move. However, there is an imposed constraint: the app has to work in a web browser, which rules out Python. It also has to run entirely on the client side; the server is only there to serve the HTML/CSS/JS (I don’t know why, it’s an imposed constraint). So I’m limited to JS and TF.js.
Unfortunately, the page I linked earlier for downloading the starter video classification model only seems to offer models for TF or TFLite, which are used from Python, and nothing for TF.js.
Is there a way to use a MoViNet model in a web browser environment? Could I train the MoViNet model for my task in a Python environment and then convert it with tfjs-converter? Or is this particular model too complex for a web browser? Might the task itself even require too many resources for a web browser?

Thanks for your help

Jason · May 26, 2023, 4:59pm

You may want to use the output of a model like MoveNet:

You can then record these keypoints over many frames of video and use them as your training data. For example, you might capture a couple of seconds of data (or whatever time frame is needed to perform the moves) and run a classification network on it. Depending on how you encode the data, that could be a multi-layer perceptron, an LSTM, or even a CNN, which I have seen people use when the captured data is represented as an image.
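
As a rough sketch of the recording step described above, something like the following could buffer keypoints over a sliding window of frames and flatten them into one feature vector for a classifier. The class name KeypointWindow, the 30-frame window, and the choice to keep only normalised x/y values are assumptions made for illustration; the only thing taken from MoveNet is that each frame yields 17 keypoints shaped like {x, y, score}. The synthetic frames at the bottom stand in for real detector output.

```javascript
// Hypothetical helper: buffers pose keypoints over a sliding window of frames.
// Assumes each frame is an array of {x, y, score} objects, as MoveNet reports.
const NUM_KEYPOINTS = 17;  // MoveNet reports 17 body keypoints
const WINDOW_FRAMES = 30;  // assumption: roughly 1 s of video at 30 fps

class KeypointWindow {
  constructor(windowFrames = WINDOW_FRAMES) {
    this.windowFrames = windowFrames;
    this.frames = [];
  }

  // Store one frame as [x0, y0, x1, y1, ...], scaled to [0, 1] by video size.
  push(keypoints, videoWidth, videoHeight) {
    const row = [];
    for (const kp of keypoints) {
      row.push(kp.x / videoWidth, kp.y / videoHeight);
    }
    this.frames.push(row);
    if (this.frames.length > this.windowFrames) {
      this.frames.shift(); // drop the oldest frame
    }
  }

  // True once enough frames have been collected to classify.
  isFull() {
    return this.frames.length === this.windowFrames;
  }

  // Flatten to one vector of length windowFrames * keypoints * 2.
  toFeatureVector() {
    return this.frames.flat();
  }
}

// Example with synthetic frames standing in for detector output.
const win = new KeypointWindow(3);
for (let f = 0; f < 5; f++) {
  const fakeKeypoints = Array.from({ length: NUM_KEYPOINTS }, (_, i) => ({
    x: i + f,
    y: i,
    score: 1,
  }));
  win.push(fakeKeypoints, 100, 100);
}
const features = win.toFeatureVector();
```

In a real page, push would be called once per video frame with the keypoints of the detected pose, and once isFull() returns true the vector could be wrapped with tf.tensor2d([features]) and passed to a small TF.js model trained on windows recorded the same way. A flat vector suits a multi-layer perceptron; for an LSTM you would keep the frames as separate rows instead of flattening them.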