Training with multiple GPUs does not speed up training

When I define and train the model with Keras in TensorFlow, multiple GPUs speed up training as expected. However, when I train the model with a custom training loop, I keep batch_size the same as on a single GPU (a larger value causes an out-of-memory error in the multi-GPU setup), and training is slower than on a single GPU. I could not find a solution. Can anyone help? Thanks.


Please refer to the distributed training guide and tutorials.

They explain how to split your batch (or your model, using DTensor) across multiple devices.
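
To make the batch-splitting point concrete, here is a minimal sketch of the arithmetic in plain Python. It is not TensorFlow code, and the helper names are made up for illustration. It assumes synchronous data parallelism where one global batch is divided evenly across the GPUs, and the numbers (10000 examples, batch size 64, 4 GPUs) are only examples.

```python
import math


def per_replica_batch_size(global_batch_size: int, num_replicas: int) -> int:
    """Examples each GPU processes per step when the global batch is split evenly."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    if global_batch_size % num_replicas != 0:
        raise ValueError("global_batch_size must be divisible by num_replicas")
    return global_batch_size // num_replicas


def steps_per_epoch(num_examples: int, global_batch_size: int) -> int:
    """Training steps needed to see every example once."""
    return math.ceil(num_examples / global_batch_size)


# Hypothetical setup: 10000 examples, batch size 64 on one GPU, 4 GPUs available.
num_examples, single_gpu_batch, num_gpus = 10000, 64, 4

# Case 1: batch size left unchanged. Each GPU only gets 16 examples per step,
# and the number of steps per epoch is the same as on a single GPU.
unchanged_per_gpu = per_replica_batch_size(single_gpu_batch, num_gpus)
unchanged_steps = steps_per_epoch(num_examples, single_gpu_batch)

# Case 2: global batch scaled by the number of GPUs. Each GPU keeps its
# single-GPU workload of 64 examples, and the epoch needs fewer steps.
scaled_global_batch = single_gpu_batch * num_gpus
scaled_per_gpu = per_replica_batch_size(scaled_global_batch, num_gpus)
scaled_steps = steps_per_epoch(num_examples, scaled_global_batch)

print(unchanged_per_gpu, unchanged_steps)  # 16 157
print(scaled_per_gpu, scaled_steps)        # 64 40
```

Under that assumption, keeping the single-GPU batch size means the step count does not drop while every step adds gradient synchronization between GPUs, which would be consistent with the slowdown described in the question. Scaling the global batch keeps per-GPU memory use at the single-GPU level, so if that still runs out of memory, it is worth checking whether the larger value was being applied per GPU rather than globally.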