How big is too big for a tensor?

So I’m totally new to this — right now I’m basically just messing around with models I found online, using data I’ve created myself. I had a model that worked pretty well and gave relatively decent results with tensor shapes of around 576x90. I’ve since made the data bigger, around 3000x90, and now I’m getting out-of-memory errors.

My original tensors ran in batches of about 256; the new ones can’t even run with a batch size of 1. I’m now on a more powerful machine with 45GB of RAM and 80GB of VRAM, and it still runs out of memory. I would have thought much bigger tensors than this were in common use.
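For scale, here’s a quick back-of-the-envelope check of the raw input size (assuming float32 — adjust if your dtype differs). The input tensor itself is tiny, which suggests the memory is going to intermediate activations inside the model rather than the data; in attention-style models, for example, activation memory grows quadratically with sequence length, so 576 → 3000 would be roughly a 27x jump there:

```python
# Raw size of one 3000x90 float32 input tensor.
rows, cols = 3000, 90
bytes_per_float32 = 4
size_mb = rows * cols * bytes_per_float32 / 1e6
print(f"input tensor: {size_mb:.2f} MB")  # → input tensor: 1.08 MB

# If any layer builds a (seq_len x seq_len) intermediate (e.g. attention
# scores), memory there scales with the square of the sequence length.
growth = (3000 / 576) ** 2
print(f"quadratic growth factor: {growth:.1f}x")  # → quadratic growth factor: 27.1x
```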

You might find this section useful.

Thanks, I’m going to look into DeepSpeed ZeRO.
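In case it helps anyone landing here later, a minimal sketch of what enabling ZeRO looks like — the config keys are real DeepSpeed options, but the `model` variable and the exact stage/batch settings are placeholders you’d swap for your own setup:

```python
# Minimal DeepSpeed config dict enabling ZeRO stage 2 (partitions
# optimizer state and gradients across devices to cut per-GPU memory).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 2},
    "fp16": {"enabled": True},
}

# Hypothetical wiring — requires `deepspeed` installed and a `model` defined:
# import deepspeed
# model_engine, optimizer, _, _ = deepspeed.initialize(
#     model=model,
#     model_parameters=model.parameters(),
#     config=ds_config,
# )

print(ds_config["zero_optimization"]["stage"])  # → 2
```

Note this mainly helps when the model/optimizer state is what’s eating memory; if a single forward pass at batch size 1 already OOMs, activation checkpointing or a smaller sequence length may matter more.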