When I train a model defined with Keras in TensorFlow, multiple GPUs accelerate training as expected. However, when I train the same model with a custom training loop, the usable batch size is the same as on a single GPU (memory overflows if I set it larger for multi-GPU), and training is actually slower than on a single GPU. I could not find a solution; can anyone help? Thanks.
Welcome to the TensorFlow Forum!
Please share a minimal reproducible example so we can investigate your problem.
Please refer to the distributed training guide and tutorials on tensorflow.org.
They explain how to split your batch (or your model, using DTensor) across multiple devices.
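As a starting point, here is a minimal sketch of the pattern from those tutorials: a custom training loop under `tf.distribute.MirroredStrategy`, with the global batch size scaled by the number of replicas and the loss averaged over the global batch. The toy model and random data below are placeholders for illustration, not your actual setup.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Scale the batch size: each replica (GPU) processes PER_REPLICA_BATCH_SIZE
# examples, so the global batch grows with the number of devices.
PER_REPLICA_BATCH_SIZE = 64
GLOBAL_BATCH_SIZE = PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync

# Toy dataset for illustration only.
x = tf.random.normal([1024, 10])
y = tf.random.normal([1024, 1])
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(GLOBAL_BATCH_SIZE)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

with strategy.scope():
    # Variables must be created inside the strategy scope.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.SGD()
    # Use reduction="none" so we can average over the GLOBAL batch size,
    # not the per-replica batch size.
    loss_fn = tf.keras.losses.MeanSquaredError(reduction="none")

def compute_loss(labels, preds):
    per_example_loss = loss_fn(labels, preds)
    return tf.nn.compute_average_loss(
        per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)

def train_step(inputs):
    features, labels = inputs
    with tf.GradientTape() as tape:
        preds = model(features, training=True)
        loss = compute_loss(labels, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function
def distributed_train_step(inputs):
    # Run the step on every replica and sum the per-replica losses.
    per_replica_losses = strategy.run(train_step, args=(inputs,))
    return strategy.reduce(tf.distribute.ReduceOp.SUM,
                           per_replica_losses, axis=None)

for step, batch in enumerate(dist_dataset):
    loss = distributed_train_step(batch)
    if step >= 2:  # a few steps are enough for this demo
        break
```

A common cause of the slowdown you describe is keeping the single-GPU batch size as the global batch size (so each GPU only gets a fraction of it) or running the per-step work eagerly instead of inside a `tf.function`, which adds cross-device synchronization overhead on every step.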