OOM while training LLM

I'm trying to fine-tune a 7-billion-parameter model in TF Keras, but I keep running out of GPU memory because I can't get the model distributed across multiple GPUs.

I've read a bit about the parameter server strategy, but I can't get it to work the way I want.
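For context, this is roughly the pattern I've been working from (simplified from the tf.distribute guide; `build_model` and the dataset are placeholders, and the cluster is configured via TF_CONFIG on each task):

```python
import tensorflow as tf

# Cluster spec (chief, workers, parameter servers) comes from the
# TF_CONFIG environment variable set on each task.
cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()

strategy = tf.distribute.ParameterServerStrategy(
    cluster_resolver,
    variable_partitioner=tf.distribute.experimental.partitioners.MinSizePartitioner(
        min_shard_bytes=256 << 10,  # shard variables >= 256 KB across the PS tasks
        max_shards=2,
    ),
)

with strategy.scope():
    model = build_model()  # placeholder: build and compile the Keras model in scope
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# fit() with ParameterServerStrategy requires steps_per_epoch.
model.fit(train_dataset, epochs=3, steps_per_epoch=100)
```

As far as I understand, this only shards the variables onto the parameter servers for data-parallel training, so each worker still has to hold the full model, which is where I hit OOM.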

If anyone has any insights, they would be greatly appreciated.

Hi @Leonhard_Piff, please refer to this document for running Gemma 7B on multiple GPUs. Thank you.
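In case it helps while reading that guide, the core idea is Keras 3's distribution API (JAX backend) with KerasNLP, which shards the 7B weights across devices via a DeviceMesh and LayoutMap. A minimal sketch is below; the layout regexes follow the pattern in the distributed Gemma guide, but the exact variable paths and shard axes are illustrative and may need adjusting to your Keras/KerasNLP versions.

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # the distribution API currently targets the JAX backend

import keras
import keras_nlp

# Build a 1 x N mesh: 1-way data parallel, N-way model parallel across all local GPUs.
devices = keras.distribution.list_devices()
device_mesh = keras.distribution.DeviceMesh(
    shape=(1, len(devices)),
    axis_names=("batch", "model"),
    devices=devices,
)

# Map variable-path regexes to shard layouts along the "model" axis
# (illustrative keys; check them against gemma_lm.weights names).
layout_map = keras.distribution.LayoutMap(device_mesh)
layout_map["token_embedding/embeddings"] = ("model", None)
layout_map["decoder_block.*attention.*(query|key|value).kernel"] = ("model", None, None)
layout_map["decoder_block.*attention_output.kernel"] = ("model", None, None)
layout_map["decoder_block.*ffw_gating.kernel"] = (None, "model")
layout_map["decoder_block.*ffw_linear.kernel"] = ("model", None)

# Newer Keras versions take ModelParallel(layout_map=..., batch_dim_name=...) only.
keras.distribution.set_distribution(
    keras.distribution.ModelParallel(device_mesh, layout_map, batch_dim_name="batch")
)

# Load the preset after setting the distribution so the weights are sharded on load.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_7b_en")
gemma_lm.summary()
```

After this, `gemma_lm.fit(...)` runs with the weights spread across the GPUs instead of replicated on each one, which is what avoids the OOM for a model this size.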