We are using XLNet models.
Our data is sequential: each batch continues directly from the previous batch.
We use ‘dataset.batch(2)’ and ‘tf.distribute.MirroredStrategy()’. For example, with two GPUs:
gpu:0 → 1,3,5,7
gpu:1 → 2,4,6,8
I think the data will be distributed as in the example above, so each GPU loses continuity between its batches. Is there a solution?
Hi @wonjun_choi, this is how distributed training works. Computing the batches across several GPUs instead of one reduces the compute time. Even though the batches are split across different GPUs, the results are merged after each batch is computed. If you want all batches to be computed continuously on one GPU, you can train without a distribution strategy. Thank you.
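To make the split concrete, here is a minimal pure-Python sketch (no TensorFlow calls). It assumes each global batch is divided in order across the replicas, as in the example in the question, and that the batch size equals the number of GPUs. The helper names `per_replica_streams` and `arrange_for_continuity` are hypothetical, not part of any library. The second helper shows one possible workaround: pre-arranging the samples so that the in-order split hands each replica a contiguous stream.

```python
from typing import List, Sequence


def per_replica_streams(samples: Sequence[int],
                        global_batch_size: int,
                        num_replicas: int) -> List[List[int]]:
    """Simulate splitting every global batch in order across replicas.

    Returns, for each replica, the samples it sees over time.
    Assumes global_batch_size is divisible by num_replicas.
    """
    per_replica = global_batch_size // num_replicas
    streams: List[List[int]] = [[] for _ in range(num_replicas)]
    # Walk over full global batches only (a trailing partial batch is dropped).
    for start in range(0, len(samples) - global_batch_size + 1, global_batch_size):
        batch = samples[start:start + global_batch_size]
        for r in range(num_replicas):
            streams[r].extend(batch[r * per_replica:(r + 1) * per_replica])
    return streams


def arrange_for_continuity(samples: Sequence[int], num_replicas: int) -> List[int]:
    """Reorder samples so an in-order split gives each replica a contiguous chunk.

    Assumes one sample per replica per step (batch size == num_replicas).
    Samples that do not fill an equal chunk for every replica are dropped.
    """
    chunk = len(samples) // num_replicas
    chunks = [samples[r * chunk:(r + 1) * chunk] for r in range(num_replicas)]
    # Take the i-th sample of every chunk, one chunk per replica, step by step.
    return [chunks[r][i] for i in range(chunk) for r in range(num_replicas)]


samples = list(range(1, 9))
print(per_replica_streams(samples, 2, 2))    # [[1, 3, 5, 7], [2, 4, 6, 8]]
reordered = arrange_for_continuity(samples, 2)
print(reordered)                             # [1, 5, 2, 6, 3, 7, 4, 8]
print(per_replica_streams(reordered, 2, 2))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Whether this holds in a real pipeline depends on the strategy really splitting each batch in order, so it is worth checking which samples each replica receives before relying on it.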
Thank you for your answer!
In TensorFlow, I couldn’t find a way to run distributed training while controlling exactly which data goes to each GPU, so I used Horovod to distribute it instead.
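For anyone landing here later, a rough sketch of the per-process idea (not the exact code from this thread): with Horovod, each GPU runs its own process, so each process can load only its own contiguous slice of the data. The helper `contiguous_shard` is a hypothetical name. In a real job, `rank` and `size` would come from `hvd.rank()` and `hvd.size()` after `hvd.init()`; here they are passed in by hand so the snippet runs without Horovod installed.

```python
from typing import List, Sequence


def contiguous_shard(samples: Sequence[int], rank: int, size: int) -> List[int]:
    """Return the contiguous slice of `samples` owned by process `rank`.

    Every rank gets the same number of samples, so all processes run the
    same number of steps. Leftover samples at the end are dropped.
    """
    chunk = len(samples) // size
    return list(samples[rank * chunk:(rank + 1) * chunk])


samples = list(range(1, 9))
# Assumption: two processes, one per GPU (rank 0 and rank 1).
print(contiguous_shard(samples, rank=0, size=2))  # [1, 2, 3, 4]
print(contiguous_shard(samples, rank=1, size=2))  # [5, 6, 7, 8]
```

Each process then builds its own dataset from its slice, so consecutive batches on a given GPU stay continuous.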