Is there a proper way to perform distributed training on TF Probability with mirrored strategy?
When we sample the weights from the reparameterization layers, the samples differ between devices, and because of that the loss and gradients get messed up since the model replicas fall out of sync across devices.
Is there a proper way to set the random seed across all GPUs and sync the models across all nodes?
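To make the symptom concrete, here is a toy NumPy sketch (not actual TF/TFP code) of the reparameterization trick, showing that two "replicas" only produce the same sampled weights when they draw noise from the same seed:

```python
import numpy as np

def sample_weights(mu, sigma, rng):
    # Reparameterization trick: w = mu + sigma * eps, with eps ~ N(0, 1).
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

mu = np.zeros(4)
sigma = np.ones(4)

# Independent RNGs per replica: the sampled weights diverge, so the
# per-replica losses and gradients disagree.
w_a = sample_weights(mu, sigma, np.random.default_rng())
w_b = sample_weights(mu, sigma, np.random.default_rng())
print(np.allclose(w_a, w_b))  # almost surely False

# Same seed on every replica: identical samples, replicas stay in sync.
w_a = sample_weights(mu, sigma, np.random.default_rng(42))
w_b = sample_weights(mu, sigma, np.random.default_rng(42))
print(np.allclose(w_a, w_b))
```

This is just the behavior I am after: each replica drawing the same epsilon per step, so the forward passes and gradients match before the all-reduce.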
Thanks.