I found that HierarchicalCopyAllReduce is much slower than NcclAllReduce (related: https://github.com/google/automl/issues/971). Any ideas?