Perform TimeDistributed step by step to save VRAM

We are applying TensorFlow/Keras to analyse data obtained in an astroparticle physics experiment that consists of a grid of detectors. The input of our network has shape (n, n, 360, embed_dim), where n*n is the size of our detector grid.
In the first step, we use two nested TimeDistributed layers to analyse the signals individually on the detector level, followed by a convolution over all detectors to combine the results.
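Our setup looks roughly like this; a Dense layer stands in for our actual transformer block, and the concrete layer sizes are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

n, t, embed_dim = 4, 360, 16  # grid size and per-detector signal shape

inp = layers.Input(shape=(n, n, t, embed_dim))
# Per-detector analysis: nested TimeDistributed maps the inner layer
# over both grid axes (a Dense layer stands in for our transformer here).
x = layers.TimeDistributed(
    layers.TimeDistributed(layers.Dense(32, activation="relu"))
)(inp)
x = layers.TimeDistributed(
    layers.TimeDistributed(layers.GlobalAveragePooling1D())
)(x)
# Combine the per-detector results with a convolution over the grid.
out = layers.Conv2D(64, 3, padding="same")(x)
model = tf.keras.Model(inp, out)  # output shape: (None, n, n, 64)
```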
When we increase n, we find that with transformers the required VRAM exceeds the 24 GB our GPU cluster can provide: TensorFlow seems to perform all n*n individual detector analyses at once, which produces huge intermediate tensors for large values of n.
As these operations are independent of each other, is it possible to force TensorFlow to perform them one after another instead?
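For reference, a minimal sketch of the kind of step-by-step evaluation we have in mind: a hypothetical custom layer that maps a per-detector model over the flattened grid with tf.map_fn (all names here are illustrative, not our actual code):

```python
import tensorflow as tf
from tensorflow.keras import layers

class StepwiseDetectorAnalysis(layers.Layer):
    """Hypothetical wrapper: applies a per-detector model one grid
    cell at a time via tf.map_fn instead of TimeDistributed."""

    def __init__(self, detector_model, **kwargs):
        super().__init__(**kwargs)
        self.detector_model = detector_model

    def call(self, x):
        # x: (batch, n, n, t, embed_dim)
        s = tf.shape(x)
        b, n1, n2 = s[0], s[1], s[2]
        # Flatten the grid and move the detector axis to the front,
        # so map_fn iterates over the n*n detectors.
        flat = tf.reshape(x, (b, n1 * n2, s[3], s[4]))
        flat = tf.transpose(flat, (1, 0, 2, 3))  # (n*n, batch, t, embed_dim)
        # parallel_iterations=1 asks the runtime to step through the
        # detectors sequentially rather than materialising all of them.
        out = tf.map_fn(self.detector_model, flat, parallel_iterations=1)
        out = tf.transpose(out, (1, 0, 2))  # (batch, n*n, features)
        return tf.reshape(out, (b, n1, n2, -1))


# Usage with a small stand-in per-detector model:
detector = tf.keras.Sequential(
    [layers.Dense(8, activation="relu"), layers.GlobalAveragePooling1D()]
)
layer = StepwiseDetectorAnalysis(detector)
y = layer(tf.random.normal((2, 3, 3, 10, 5)))  # (batch=2, n=3, t=10, embed=5)
print(y.shape)  # (2, 3, 3, 8)
```

Inside a compiled model the map becomes a tf.while_loop, so whether this actually caps peak VRAM at one detector's worth of intermediates presumably depends on the runtime's scheduling; it is the behaviour we are after, not something we have verified.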

What TF version are you using?

We are using version 2.4.1.

Can you test with the latest TF version?

Thank you for your answer. The behavior is the same with TensorFlow 2.7.0. We assumed the behavior was intended, to speed up the calculations.