Performance Analysis of the TPU Memory Architecture

fyaman · March 27, 2023, 12:43am

I am trying to analyze the performance of basic operations on TPUs and to benchmark the different levels of the memory hierarchy. I am trying to use the code below on Cloud TPUs.

github.com

tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/python/tpu_driver/client/libtpu_client.c

// Before you start, make sure libtpu.so, libtpu.h and libtpu_client.c are in
// the same working directory.
//
// To compile: gcc -o libtpu_client libtpu_client.c -ldl
// To run: sudo ./libtpu_client


I am wondering whether TPUs have a memory type classification like the one on GPUs, such as local memory, global memory, texture memory, or registers.

If there is, what kind of HLO representation do I need to use?
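
A minimal sketch of this kind of benchmark, assuming JAX rather than the C driver client linked above. It times a memory-bound elementwise operation over growing working-set sizes and prints the lowered program text that XLA receives. The names `scaled_add`, `time_op`, and `sweep` are hypothetical, and the bandwidth figure is only a rough estimate based on the bytes read and written.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def scaled_add(x, y):
    # Elementwise op with little arithmetic per byte, so runtime is
    # expected to be dominated by memory traffic rather than compute.
    return 2.0 * x + y


def time_op(n, repeats=10):
    """Return mean seconds per call of scaled_add on float32 vectors of length n."""
    x = jnp.ones((n,), dtype=jnp.float32)
    y = jnp.ones((n,), dtype=jnp.float32)
    scaled_add(x, y).block_until_ready()  # warm-up: compile and run once
    start = time.perf_counter()
    for _ in range(repeats):
        out = scaled_add(x, y)
    out.block_until_ready()  # dispatch is asynchronous, so wait before stopping the clock
    return (time.perf_counter() - start) / repeats


def sweep(sizes):
    """Map each vector length to an approximate effective bandwidth in GB/s."""
    results = {}
    for n in sizes:
        seconds = time_op(n)
        bytes_moved = 3 * 4 * n  # two float32 reads plus one float32 write
        results[n] = bytes_moved / seconds / 1e9
    return results


if __name__ == "__main__":
    print("backend:", jax.default_backend())
    for n, gbps in sweep([2**k for k in range(10, 25, 2)]).items():
        print(f"n={n:>10d}  ~{gbps:8.2f} GB/s")
    # Text of the lowered program (the exact dialect depends on the JAX version).
    print(scaled_add.lower(jnp.ones(8), jnp.ones(8)).as_text())
```

A change in effective bandwidth as the working set grows would only hint at a boundary between memory levels. It does not by itself identify distinct memory types, and the same script falls back to the CPU backend when no TPU is attached.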
#help_request #help_research #tpu #xla