I have MNIST neural-network code that I run with tensorflow-gpu v2 and Keras. I have run it alternately on a CPU, on a single GPU, and on a board with two GPUs. Each shows improvement, but not as much as I hoped. I changed the two-GPU code to take advantage of TensorFlow's distributed MirroredStrategy, and using nvidia-smi I can see that in the latter case the two GPUs share the load. My question is two-fold:

1) When running on the CPU, do I have to do anything different to take advantage of the multiple cores, and how do I know it is utilizing all the cores?

2) When running on a single GPU, do I have to do anything different in the TensorFlow code to make sure it is taking advantage of all the GPU's CUDA cores, and how do I know if it is utilizing all of those cores?

There are a number of examples on tensorflow.org; have you tried them? See the single-worker and multi-worker tutorials for MirroredStrategy.

To your two questions: by default TensorFlow parallelizes individual ops across all available CPU cores (intra-op parallelism), so you normally don't need to change anything, and you can watch per-core load with a tool like htop. On a single GPU, each kernel launch already spreads its work across the CUDA cores automatically; the GPU-Util column in nvidia-smi tells you how busy the device is overall.
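A minimal sketch of how these pieces fit together (this is not your exact model; it uses random stand-in data instead of the real MNIST download so it is self-contained): it prints the CPU thread settings and visible GPUs via `tf.config`, then wraps a small Keras model in `tf.distribute.MirroredStrategy`, which falls back gracefully to one device when only a CPU or a single GPU is present.

```python
import tensorflow as tf

# 1) CPU: TF parallelizes ops across cores by default; 0 means "use all cores".
print("intra-op threads:", tf.config.threading.get_intra_op_parallelism_threads())
print("inter-op threads:", tf.config.threading.get_inter_op_parallelism_threads())

# 2) GPU: list the devices TF can see; per-device utilization comes from nvidia-smi.
gpus = tf.config.list_physical_devices("GPU")
print("visible GPUs:", gpus)

# 3) Multi-GPU: MirroredStrategy replicates the model on every visible GPU
#    and splits each batch across the replicas.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

# Random stand-in for MNIST (28x28 grayscale images, 10 classes), so the
# sketch runs without downloading the dataset.
x_train = tf.random.uniform((256, 28, 28, 1))
y_train = tf.random.uniform((256,), maxval=10, dtype=tf.int32)

with strategy.scope():  # variables created here are mirrored on every device
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# Scale the global batch size with the replica count so each GPU still
# receives a full per-device batch.
model.fit(x_train, y_train, epochs=1,
          batch_size=64 * strategy.num_replicas_in_sync, verbose=0)
```

On a CPU-only machine `num_replicas_in_sync` is 1 and the same code still runs, so you can keep one script for all three hardware setups.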