Browser WebGPU vs tfjs-node-gpu: questions, assumptions, and performance

I have been using tfjs in the browser with the WebGPU backend to train my model and have had some success. However, I am trying to increase the size of my model, which understandably has caused the training time to increase dramatically.

To address this I am attempting to move training over to Node (specifically the tfjs-node-gpu backend), and my initial results have been confusing. It appears to be much slower than what I was doing before in the browser.

I assumed the tfjs-node-gpu backend using CUDA would be faster than WebGPU in the browser. Is this a correct assumption?

It looks like WebGPU can be used in Node as well. Should this be faster than the tfjs-node-gpu backend?

Is it possible that tfjs-node-gpu is not properly using my GPU and is just putting everything on my CPU? In Task Manager it does not appear that the GPU is being stressed nearly as much as it was in the browser.

How do I verify that the tfjs-node-gpu backend is using my GPU?

Are there any settings I need to set to ensure GPU usage is prioritized in Node?

If I have newer versions of the CUDA Toolkit or cuDNN SDK installed than are recommended, could that cause performance issues that will not generate an error?

Your assumption is correct: CUDA on Node should be faster for training than front-end JS, as the front end is optimized for inference.

AFAIK WebGPU is not available in TFJS Node, as it is not a web browser environment, and back when we were working on the Node stuff I am fairly sure WebGPU was not even available for it. I would stick with CUDA for server-side stuff.
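For contrast, this is roughly how the WebGPU backend gets selected on the browser side (a minimal sketch assuming the published @tensorflow/tfjs and @tensorflow/tfjs-backend-webgpu packages); nothing equivalent is registered by the Node packages:

```js
import * as tf from '@tensorflow/tfjs';
// Side-effect import that registers the 'webgpu' backend.
import '@tensorflow/tfjs-backend-webgpu';

await tf.setBackend('webgpu'); // only succeeds in a WebGPU-capable browser
await tf.ready();
console.log(tf.getBackend()); // 'webgpu'
```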

The Node backend implementation is a wrapper around the C++ TensorFlow library (just as the Python API is), so there is no WebGPU / WebGL / browser tech being used as far as I am aware.
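As a minimal sketch of what that means in practice (assuming the published @tensorflow/tfjs-node-gpu npm package): simply requiring the package registers the native backend, which is named 'tensorflow' for both the CPU and GPU packages:

```js
// Requiring the GPU package loads the native TensorFlow C library built
// with CUDA support; no browser graphics stack is involved.
const tf = require('@tensorflow/tfjs-node-gpu');

// Both tfjs-node and tfjs-node-gpu report 'tensorflow' here, so the
// backend name alone cannot tell you whether CUDA was actually picked up.
console.log(tf.getBackend()); // 'tensorflow'
```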

It sounds like maybe CUDA is not set up right and it is falling back to the CPU for some reason; I know setting up CUDA on a server can be tricky.
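Two rough checks you can do (a hedged sketch, not a definitive diagnostic): first, watch the startup output, since the native library logs which CUDA libraries it managed to open, and errors there usually mean a silent fallback to CPU; second, time a GPU-heavy op and compare against the same script run with the CPU-only @tensorflow/tfjs-node package. The tensor sizes and iteration count below are arbitrary:

```js
const tf = require('@tensorflow/tfjs-node-gpu');

async function bench() {
  const a = tf.randomNormal([2048, 2048]);
  const b = tf.randomNormal([2048, 2048]);

  // Warm-up run so one-time initialization cost is not measured.
  tf.matMul(a, b).dispose();

  const t0 = Date.now();
  for (let i = 0; i < 50; i++) {
    const c = tf.matMul(a, b);
    await c.data(); // forces the computation to complete
    c.dispose();
  }
  console.log(`50 matmuls took ${Date.now() - t0} ms`);

  a.dispose();
  b.dispose();
}

bench();
```

If the timing is in the same ballpark as the CPU-only package, CUDA almost certainly was not picked up; a large gap means the GPU is doing the work.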

@pyu would have the definitive answer here, though.