Error when launching multi-GPU training

I get this error when I run training on multiple GPUs:
InvalidArgumentError: 2 root error(s) found.
** (0) Invalid argument: Cannot batch tensors with different shapes in component 0. First element had shape [60,400,400,3] and element 7 had shape [32,400,400,3].**

Any help would be appreciated.
Thanks.

Hi @Awatef_Edhib. Welcome to the forum.
Did you manage to get your code working on a CPU?
I ask because the error message doesn’t suggest this is related to GPUs or multi-GPU training. It looks like a “regular” op issue caused by elements with different batch sizes. I may be wrong though.

Hi @Awatef_Edhib, the error occurs because tensors with two different batch sizes are being combined into one batch. You can try unbatching them first. Also, as @tagoma suggested, you can run your code on a single CPU to find out whether the problem is specific to multi-GPU training or lies in the code itself. Thank you.
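
To illustrate the unbatching suggestion, here is a minimal sketch of one way this error can arise. It is only an assumption about the pipeline, since the original code was not posted: a dataset that is already batched by 60 ends with a smaller final batch (32 here), and batching it a second time tries to stack the two shapes. The array `images` and its tiny 4x4 size are hypothetical stand-ins for the real 400x400x3 data.

```python
import tensorflow as tf

# Hypothetical stand-in data: 152 tiny "images" (the real ones are 400x400x3).
images = tf.zeros([152, 4, 4, 3])

# Batching by 60 leaves a partial final batch: leading dims are 60, 60, 32.
batched = tf.data.Dataset.from_tensor_slices(images).batch(60)

# Batching the already-batched dataset again must stack [60,...] with [32,...],
# which is not possible and raises InvalidArgumentError.
rebatch_failed = False
try:
    for _ in batched.batch(8):
        pass
except tf.errors.InvalidArgumentError as err:
    rebatch_failed = True
    print(err.message)

# Possible fix: flatten back to single examples, then batch exactly once.
# drop_remainder=True keeps every batch the same size.
fixed = batched.unbatch().batch(8, drop_remainder=True)
fixed_shapes = [tuple(b.shape) for b in fixed]
print(fixed_shapes[0], len(fixed_shapes))
```

If the dataset elements are meant to stay as groups of 60, another option under the same assumption is to pass `drop_remainder=True` to the first `batch(60)` call so that the partial group of 32 is never produced.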

Hello @tagoma @Kiran_Sai_Ramineni, when I run it on a single GPU with batch size = 1, it works without any error. But when I run it on multiple GPUs, or when I change the batch size, I get the error.

Hey @Awatef_Edhib!

@Ekaterina_Dranitsyna posted it yesterday

E.g. the section Creating a data parallel mesh

By chance, could this be of help to you?