How to do atomic add on CPU with multithreading

I’m writing a TF C++ kernel with gradient estimation.
It needs to estimate gradients with respect to multiple input variables of different sizes.

Consequently, I need to update some gradient values multiple times (each update adds to the existing gradient value), and these updates can come from different threads.
For the GPU kernel I use the GpuAtomicAdd function, but how can I do the same on the CPU?

It seems that there is no such function in the TF codebase, and I don’t want to use large arrays of mutexes because of the overhead and code inconsistency.
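
For reference, the operation being asked about can be sketched in standard C++ with a compare-and-swap loop on std::atomic<float>. This is only a minimal sketch, not a TF API: the names AtomicAddFloat and AccumulateDemo are hypothetical, and it assumes the gradient slot can be stored as a std::atomic<float> rather than a plain float.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TensorFlow): atomically adds `value`
// to `*target` using a compare-and-swap retry loop.
inline void AtomicAddFloat(std::atomic<float>* target, float value) {
  float expected = target->load(std::memory_order_relaxed);
  // If another thread changed the slot, compare_exchange_weak fails and
  // refreshes `expected` with the current value, so the loop just retries.
  while (!target->compare_exchange_weak(expected, expected + value,
                                        std::memory_order_relaxed)) {
  }
}

// Hypothetical demo: several threads accumulate into one gradient slot
// and the final value is returned after all threads have joined.
inline float AccumulateDemo(int num_threads, int adds_per_thread) {
  assert(num_threads > 0 && adds_per_thread > 0);
  std::atomic<float> grad(0.0f);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&grad, adds_per_thread] {
      for (int i = 0; i < adds_per_thread; ++i) {
        AtomicAddFloat(&grad, 1.0f);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return grad.load();
}
```

With 4 threads each adding 1.0f a thousand times, AccumulateDemo(4, 1000) should return 4000.0f, since no update is lost. An output tensor in a kernel is a plain float buffer, so this sketch does not apply to it directly; with C++20, std::atomic_ref<float> can wrap an existing float in place and provides fetch_add, which may be a closer fit if the toolchain supports it.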