Distributed training with XLA

I have noticed that XLA can now compile models running under MirroredStrategy, so we can use the JIT compiler on multiple GPUs and get better performance. But I'd like to know how XLA optimizes distributed training. Does XLA optimize the communication and synchronization between replicas, or does it only optimize the computation running independently on each GPU?
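To make the question concrete, here is a minimal sketch of the setup I mean (the toy variable `w` and the loss are just placeholders): the per-replica step is compiled with `jit_compile=True`, while the cross-replica reduction is still handled by the strategy outside the XLA-compiled cluster.

```python
import tensorflow as tf

# MirroredStrategy replicates variables across the available devices
# (falls back to a single CPU/GPU device if only one is present).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    w = tf.Variable(2.0)  # toy parameter, for illustration only

# Per-replica computation compiled by XLA.
@tf.function(jit_compile=True)
def replica_step(x):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * x - 1.0) ** 2)
    grad = tape.gradient(loss, w)
    return loss, grad

@tf.function
def train_step(x):
    per_replica = strategy.run(replica_step, args=(x,))
    # Cross-replica reduction is done by the strategy (e.g. NCCL
    # all-reduce on GPUs), outside the XLA-compiled function.
    return strategy.reduce(
        tf.distribute.ReduceOp.MEAN, per_replica[0], axis=None)

print(float(train_step(tf.constant([1.0]))))
```

So the question is whether XLA sees (and can fuse or overlap) the collective ops inserted by the strategy, or whether its cluster boundary stops at the per-replica computation as in this sketch.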