When I use numpy to process the tensor array generated by tensorflow, does it generate a new numpy array in memory or directly use the passed in tensorflow

jun_yin · June 28, 2022, 10:32am

While using TensorFlow, I found that tensors can apparently be passed directly to NumPy functions. What are the implications of doing so? In memory, is the tensor automatically converted to a NumPy array before the operation, or does NumPy operate on the tensor directly?

Atia · June 28, 2022, 12:48pm

This is an excerpt from the official TensorFlow website about NumPy compatibility with tensors:

Tensors are explicitly converted to NumPy ndarrays using their .numpy() method. These conversions are typically cheap since the array and tf.Tensor share the underlying memory representation, if possible.

What this means is that, when the tensor lives in CPU memory, the conversion can reuse the tensor's existing buffer instead of copying the data, so the array and the tensor may end up backed by the same memory. This only applies on the CPU, since NumPy has no accelerator backing. If the tensor is hosted in GPU/TPU memory, the conversion to NumPy includes a copy to the CPU.

Example attached:
[screenshot, 2022-06-28 12-44-04]
[screenshot, 2022-06-28 12-45-13]
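
Since the screenshots are not readable as text, here is a minimal sketch of the conversions described in the quoted docs. It only shows which type you get back in each direction. Whether `.numpy()` hands you a view of the tensor's buffer or a fresh copy can depend on the TensorFlow version and the device, so the sketch deliberately does not rely on in-place changes being visible on the other side. The variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

t = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# Explicit conversion: .numpy() returns a NumPy ndarray.
n = t.numpy()

# Implicit conversion: NumPy functions accept tensors and return ndarrays.
plus_one = np.add(t, 1)

# The other direction: TensorFlow ops accept ndarrays and return tensors.
back = tf.add(n, 1)

print(type(n).__name__, type(plus_one).__name__, isinstance(back, tf.Tensor))
```

Treat the tensor as immutable either way. If you need to modify the values, work on your own copy of the array (for example `n.copy()`) and build a new tensor from it.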

jun_yin · June 29, 2022, 12:54am

If I need to run such operations on the GPU at high frequency, is there a good way to optimize this? For example, does NumPy's flip operation have a corresponding API in TensorFlow? Thank you for your reply.

jun_yin · June 29, 2022, 1:03am

I just tried it. It seems that even on the GPU the two are still linked to the same memory address, rather than a new NumPy array being copied?

[screenshot attached]

Atia · June 29, 2022, 9:45am

Here is a link from the API docs that shows how to use the TensorFlow equivalent of flip. Under the hood, TensorFlow tries to optimize your workflow when it can, but you as the engineer/programmer can also adopt some good practices for better optimization. For starters, you can use a well-structured input pipeline, i.e. make sure the flow from data extraction to data transformation to model training is as seamless as possible. You could also fuse multiple smaller kernels into one larger kernel (this can be achieved with tf.function). You could also enable mixed precision and XLA. This can help.
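
The link itself is not preserved in this thread, so the following is only a sketch under the assumption that the equivalent in question is `tf.reverse`, which reverses a tensor along the listed axes much like `np.flip`. It also wraps the flip and a follow-up operation in `tf.function`, so they are traced into a single graph and the data stays in TensorFlow instead of going through NumPy. The function name `flip_and_scale` is made up for illustration.

```python
import numpy as np
import tensorflow as tf

x = tf.constant([[1, 2, 3],
                 [4, 5, 6]])

# TensorFlow-side flip along axis 1 (axis is given as a list).
flipped_tf = tf.reverse(x, axis=[1])

# NumPy-side flip of the same data, for comparison only.
flipped_np = np.flip(x.numpy(), axis=1)

@tf.function  # traces the Python function into a TensorFlow graph
def flip_and_scale(t):
    # Both ops run inside TensorFlow; no round trip through NumPy.
    return tf.reverse(t, axis=[1]) * 2

result = flip_and_scale(x)
print(flipped_tf.numpy())
print(result.numpy())
```

Keeping the whole computation in TensorFlow ops avoids a device-to-host copy on every step when the tensor lives on the GPU.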

Atia · June 29, 2022, 10:01am

Try printing out

t.device
n.device

This may help you find out which device each variable is on.
PS: n.device will give you an AttributeError. I guess this shows that it really is a NumPy array and hence lives on the CPU.
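
A runnable version of that check might look like the sketch below. The names `t` and `n` follow the snippet above, and the tensor is pinned to the CPU here only so the result is predictable. The lookup on the array uses `getattr` with a fallback because, as far as I know, newer NumPy releases (2.0 and later) do expose a `device` attribute on arrays, so the AttributeError should not be relied on.

```python
import numpy as np
import tensorflow as tf

# Pin the tensor to the CPU so the printed device is predictable.
with tf.device("/CPU:0"):
    t = tf.constant([1.0, 2.0, 3.0])

n = t.numpy()

# Full device string of the tensor; it ends in "CPU:0" here.
print(t.device)

# Older NumPy arrays have no .device attribute, so fall back to a message.
print(getattr(n, "device", "no .device attribute"))

# Lists the GPUs TensorFlow can see (an empty list on a CPU-only machine).
print(tf.config.list_physical_devices("GPU"))
```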