Model parallelism does not seem to be possible in Keras, because layers cannot be assigned to devices. When working with large data grids and large, complex models (e.g. ones built with the Keras functional API), we run out of GPU memory very quickly, so some approach to model parallelism is essential, particularly when applying AI to science problems. It would be very helpful to understand what is happening here.
If you are interested, here is a paper that discusses these issues in the context of weather forecasting:
I first opened this as a bug under TensorFlow but have not had a response…