Performance Analysis of the TPU Memory Architecture

I am trying to analyze the performance of basic operations on TPUs and to benchmark the different levels of the memory hierarchy. I am trying to run the code below on Cloud TPUs.

I am wondering whether TPUs have a classification of memory types like the one on GPUs, such as local memory, global memory, texture memory, or registers.

If there is, what kind of HLO representation do I need to use to target each memory type?
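
To make the HLO part of the question concrete, here is a minimal JAX sketch of how one might dump the compiler IR for a basic operation and time it. It is only an illustration under stated assumptions: `basic_op`, the array shape, and the repeat count are hypothetical choices, and it runs on whichever backend JAX finds (CPU if no TPU is attached). Whether the dumped HLO shows anything about memory placement on a TPU is exactly what I have not been able to confirm.

```python
import time

import jax
import jax.numpy as jnp


def basic_op(x):
    # Hypothetical "basic operation": an elementwise multiply-add.
    return x * 2.0 + 1.0


# Hypothetical input size; adjust it to probe different working-set sizes.
x = jnp.ones((1024, 1024), dtype=jnp.float32)

# Lower and compile ahead of time so the IR can be inspected.
lowered = jax.jit(basic_op).lower(x)
unoptimized_text = lowered.as_text()  # compiler IR before backend optimization
compiled = lowered.compile()
optimized_hlo = compiled.as_text()    # backend-optimized HLO text, if available

# Warm up once, then time several runs; block_until_ready() waits for the
# asynchronous dispatch to finish so the measurement covers the real work.
result = compiled(x).block_until_ready()
num_runs = 10
start = time.perf_counter()
for _ in range(num_runs):
    result = compiled(x).block_until_ready()
elapsed = (time.perf_counter() - start) / num_runs

print("backend:", jax.devices()[0].platform)
print("mean seconds per run:", elapsed)
print(optimized_hlo if optimized_hlo else unoptimized_text)
```

Printing the optimized text on a TPU backend would at least show which HLO instructions and layouts the compiler produced for the operation.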
#help_request #help_research #tpu #xla