How to make the tf.js CPU backend faster at computing gradients of small functions (<100 params)? A benchmark comparison with numerical differentiation in pure JS

Hi there,

For fun, I’ve been trying to reimplement G9.js interactive graphics using TF.js automatic differentiation instead of finite differences (which is what G9 uses).

I’ve managed to do it, but the use case is very different from machine learning: I need the gradient of very small functions (typically under 10 parameters), and I need it very fast, so that the optimization can converge within a single browser frame (16 ms).
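
For context, the finite-difference approach amounts to something like the following central-difference sketch in plain JS (G9’s actual gradient() implementation may differ; this is just to illustrate the technique):

```js
// Central-difference gradient of a scalar function f: R^n -> R at point x.
// Each call costs 2n evaluations of f, which is cheap when n is small.
function numericalGradient(f, x, eps = 1e-6) {
  const grad = new Array(x.length);
  const probe = x.slice();
  for (let i = 0; i < x.length; i++) {
    probe[i] = x[i] + eps;
    const fPlus = f(probe);
    probe[i] = x[i] - eps;
    const fMinus = f(probe);
    probe[i] = x[i]; // restore before moving to the next coordinate
    grad[i] = (fPlus - fMinus) / (2 * eps);
  }
  return grad;
}
```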

G9 uses numerical differentiation for this, so I’ve written a quick benchmark comparing TF’s tf.grads() vs G9’s gradient() for a model function that’s representative of interactive graphics, with a configurable number of parameters.
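
Roughly, the TF side of the benchmark looks like the sketch below. Note this is simplified: the objective here is an illustrative stand-in for the real model function, and I pack all parameters into a single 1-D tensor and use tf.grad() (tf.grads() works the same way for a list of input tensors):

```js
import * as tf from '@tensorflow/tfjs';

// Illustrative objective: a scalar function of a 1-D parameter tensor.
// (Stand-in: the real benchmark uses a model function representative of
// interactive graphics.)
const f = (x) => tf.sin(x).square().sum();

// tf.grad(f) returns a new function that computes df/dx.
const gradF = tf.grad(f);

function benchTf(numParams, runs = 50) {
  const x = tf.randomUniform([numParams]);
  const times = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    const g = tf.tidy(() => gradF(x)); // tidy disposes intermediate tensors
    g.dataSync(); // force execution to complete and read the values back
    times.push(performance.now() - t0);
    g.dispose();
  }
  x.dispose();
  const avg = times.reduce((a, b) => a + b, 0) / times.length;
  return { min: Math.min(...times), avg, max: Math.max(...times) };
}
```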

Here are the results over 50 gradient computations, for different numbers of parameters. You can see that TF’s CPU backend overtakes numerical differentiation somewhere between 100 and 200 parameters.

| Parameters | g9 min (ms) | g9 avg (ms) | g9 max (ms) | tf min (ms) | tf avg (ms) | tf max (ms) |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 0.000 | 0.002 | 0.100 | 0.200 | 0.426 | 1.500 |
| 10 | 0.000 | 0.002 | 0.100 | 0.200 | 0.406 | 1.500 |
| 50 | 0.000 | 0.052 | 0.200 | 0.100 | 0.356 | 1.400 |
| 100 | 0.100 | 0.210 | 0.500 | 0.100 | 0.358 | 1.300 |
| 200 | 0.500 | 0.674 | 1.800 | 0.100 | 0.330 | 0.900 |
| 500 | 3.500 | 3.730 | 7.600 | 0.300 | 0.498 | 1.000 |
| 1000 | 14.400 | 14.606 | 20.100 | 0.600 | 0.708 | 0.900 |

You can try the WebGL backend with the same benchmark, but then the GPU-to-CPU transfer time dominates, so it’s not super relevant here.
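
For reference, switching backends for the same benchmark is just the standard call; a minimal sketch:

```js
// Run the same benchmark on a different backend.
await tf.setBackend('webgl'); // or 'cpu' / 'wasm'
await tf.ready();
// On webgl, g.dataSync() in the timing loop includes the GPU-to-CPU readback,
// which is why the transfer time dominates for tensors this small.
```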

My question is: how can I make the TF.js CPU backend faster at computing the gradient of small functions (<100 parameters)?

Why target the JS CPU backend, which is our slowest form of execution? Why not WASM instead, which also runs on the CPU but is much faster?

Thanks for your answer! For this particular use case, the WASM backend is about the same speed as the CPU backend, at least in my little toy benchmark.
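
In case anyone wants to reproduce the WASM numbers, the setup is just the standard backend package (a sketch, assuming a bundler):

```js
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm'; // registers the 'wasm' backend

await tf.setBackend('wasm');
await tf.ready();
```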

Interesting. I guess it’s due to the very low number of parameters then. I’ll ask the team and see if anyone has thoughts on this one, but short of optimizing the actual CPU code in our repo (we are open source), I’m unsure if there’s any other way to make that go faster. The kernels should be pretty well optimized after all these years, though we’ve focused mostly on WASM/WebGL, since 99% of our users use those for AI tasks and very few fall back to plain CPU, so it’s possible there is something we missed there.

Do you have example code for the TF.js usage that we can use to replicate this, or a working CodePen or Glitch.com example? It could be useful for other folks reading this who want to help out too.