Parallelising training of a model with multiple inputs and outputs

I have a Keras model with 5 inputs and 5 outputs. Each output has its own loss function, but Keras minimises the sum of the individual losses (I believe this is the default behaviour).
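
Roughly, the model looks like the sketch below (the layer sizes, names and loss choices are just placeholders for illustration):

```python
# Minimal sketch of the kind of model I mean (placeholder shapes, names and losses)
from tensorflow import keras
from tensorflow.keras import layers

# 5 independent inputs, each with its own branch and its own output
inputs = [keras.Input(shape=(16,), name=f"in_{i}") for i in range(5)]
outputs = []
for i, inp in enumerate(inputs):
    x = layers.Dense(32, activation="relu")(inp)
    outputs.append(layers.Dense(1, name=f"out_{i}")(x))

model = keras.Model(inputs=inputs, outputs=outputs)

# One loss per output; by default Keras minimises the (weighted) sum of all of them
model.compile(
    optimizer="adam",
    loss={f"out_{i}": "mse" for i in range(5)},
)
```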

What’s the best way to parallelise training here? I believe Keras trains each part of the model sequentially by default, and I’d like to know how to train the various branches of the model: a) across multiple processes on a single GPU, and b) across multiple GPUs. For (b), a sketch of what I’ve been considering is below.
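
The only multi-GPU approach I’ve come across so far is tf.distribute.MirroredStrategy (sketch below, again with placeholder shapes and losses). As I understand it, though, this is data parallelism: it replicates the whole model on every GPU and splits each batch across the replicas, rather than training the individual branches in parallel, so I’m not sure it’s what I want.

```python
# Sketch of option (b) using tf.distribute.MirroredStrategy (placeholder shapes/losses).
# This replicates the full model on each GPU and splits batches across replicas;
# it does not place different branches on different devices.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    inputs = [keras.Input(shape=(16,), name=f"in_{i}") for i in range(5)]
    outputs = [
        layers.Dense(1, name=f"out_{i}")(layers.Dense(32, activation="relu")(inp))
        for i, inp in enumerate(inputs)
    ]
    model = keras.Model(inputs=inputs, outputs=outputs)
    # One loss per output; Keras minimises their (weighted) sum
    model.compile(optimizer="adam", loss={f"out_{i}": "mse" for i in range(5)})

# model.fit(x=..., y=..., batch_size=...) then distributes each batch over the GPUs
```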