I’m trying to implement the masking generation function for BEiT:

```
"""
Originally inspired by impl at https://github.com/zhunzhong07/Random-Erasing, Apache 2.0
Copyright Zhun Zhong & Liang Zheng
Hacked together by / Copyright 2020 Ross Wightman
Modified by Hangbo Bao, for generating the masked position for visual image transformer
"""
# --------------------------------------------------------
# BEIT: BERT Pre-Training of Image Transformers (https://arxiv.org/abs/2106.08254)
# Github source: https://github.com/microsoft/unilm/tree/master/beit
# Copyright (c) 2021 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# By Hangbo Bao
# Based on timm, DINO and DeiT code bases
# https://github.com/rwightman/pytorch-image-models/tree/master/timm
# Originally inspired by impl at https://github.com/zhunzhong07/Random-Erasing, Apache 2.0
# Copyright Zhun Zhong & Liang Zheng
#
# Hacked together by / Copyright 2020 Ross Wightman
```

This file has been truncated.

The part I am struggling with is item assignment on EagerTensors.

I have consulted references that show how to approach such assignments, but this one does not seem to fit them.

Any particular approaches I should try out or look into for this case?
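For reference, a minimal sketch (assuming TF 2.x eager mode) of why item assignment fails on an EagerTensor and the two usual workarounds: a functional `tf.tensor_scatter_nd_update`, or keeping the mask in a `tf.Variable`, whose slices support `.assign()`. The 4x4 shape and the indices are arbitrary illustration values.

```python
import tensorflow as tf

mask = tf.zeros([4, 4], dtype=tf.int32)

# EagerTensors are immutable, so NumPy-style item assignment fails.
try:
    mask[0, 0] = 1
except TypeError as err:
    print("TypeError:", err)

# Option 1 (functional): build a NEW tensor with the selected entries set to 1.
indices = tf.constant([[0, 0], [1, 2]])      # (row, col) of each entry to set
updates = tf.ones([2], dtype=tf.int32)       # one value per index row
mask_updated = tf.tensor_scatter_nd_update(mask, indices, updates)

# Option 2 (stateful): keep the mask in a tf.Variable, whose slices support .assign().
mask_var = tf.Variable(tf.zeros([4, 4], dtype=tf.int32))
mask_var[1:3, 1:3].assign(tf.ones([2, 2], dtype=tf.int32))
```

Option 1 fits naturally into `tf.data` pipelines, since nothing is mutated; option 2 is closer to the NumPy original.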

Bhack
March 14, 2022, 11:26am
#3
Is every masked patch placed randomly within each individual image there?

A single mask can be applied to a batch too.
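To illustrate the batch point: a sketch with made-up sizes (196 patches, i.e. a 14x14 grid, and a 40% masking rate chosen only for illustration), where one boolean mask over the patch sequence is shared by every image in the batch through broadcasting. Zeros stand in for whatever replacement value the model actually uses.

```python
import tensorflow as tf

batch_size, num_patches, hidden = 8, 196, 16
tokens = tf.random.normal([batch_size, num_patches, hidden])

# One boolean mask over the 196 patch positions, shared by the whole batch.
shared_mask = tf.random.uniform([num_patches]) < 0.4          # shape (196,)

# Add batch and feature axes: (1, 196, 1) broadcasts against (8, 196, 16).
masked_tokens = tf.where(shared_mask[tf.newaxis, :, tf.newaxis],
                         tf.zeros_like(tokens), tokens)
```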

Thanks for sharing this. Will take a look.

@Bhack there’s actually no masking involved in the link you sent.

So, the question is pretty much still open.

Bhack
March 15, 2022, 10:20am
#9
Yes, as they are just reloading Microsoft’s weights, so there is no training protocol there.

What is your specific issue? Isn’t it just the standard image tokenization used in many vision transformers, where some tokens are masked?


My issue is with the block-wise masking strategy, where tensor assignment is apparently needed (refer to my initial post). Had the masking been random, it would have been easier; we implemented that a while back (here).
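For contrast, the random (non-block) variant needs no assignment at all. A minimal sketch, assuming a fixed number of masked patches per image; `random_patch_mask` is a hypothetical helper name, not the implementation linked above:

```python
import tensorflow as tf


def random_patch_mask(batch_size, num_patches, num_masked):
    """Boolean mask of shape (batch_size, num_patches) with exactly
    `num_masked` True entries per row, placed uniformly at random."""
    noise = tf.random.uniform([batch_size, num_patches])
    # argsort of argsort gives each patch's rank (0..num_patches-1) in its row.
    ranks = tf.argsort(tf.argsort(noise, axis=-1), axis=-1)
    return ranks < num_masked


mask = random_patch_mask(batch_size=4, num_patches=196, num_masked=75)
```

The block-wise strategy is harder precisely because each new block depends on what is already masked, which is where the assignment comes in.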

Bhack
March 15, 2022, 12:21pm
#11
To exactly mimic that implementation, are you looking for slice assignment?

Yes. Please take note of this part before sharing existing references:

I have consulted references that show how to approach such assignments, but this one does not seem to fit them.

If there’s no way other than doing something like this, then it’s a different choice.

Bhack
March 15, 2022, 1:05pm
#13
Oh, in that case, historically there have been plenty of slice assignment tickets. Just to mention a few that are still open:

opened 09:28AM - 19 Jun 20 UTC

stat:awaiting tensorflower
type:feature
comp:ops

**System information**
- TensorFlow version (you are using): 2.2
- Are you willing to contribute it (Yes/No): Yes
**Describe the feature and the current behavior/state.**
I would like to have slice assignment for Tensor objects in TensorFlow.
The code I would like to write is:
```python
import tensorflow as tf
a = tf.constant([1, 2, 4, 5, 7, 3, 2, 6,])
indices = tf.constant([3, 4], dtype=tf.int32)
a[indices] += 1
```
Of course it's a simplistic example and doesn't cover everything I want to do (I would use it in more complex functions not with constants), and I am happy to make it more complex if necessary.
Currently this code gives the error:
```
TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>
```
**Will this change the current api? How?**
I guess this is a change of API since it introduces a new functionality.
**Who will benefit with this feature?**
A lot of people have been asking for this feature for example in this GitHub issues:
- https://github.com/tensorflow/tensorflow/issues/14132#issuecomment-483002522
- https://github.com/tensorflow/tensorflow/issues/33131
These issues have unfortunately been closed because some workarounds for specific use-cases have been found (ones where the slicing is fixed and you can use [masking](https://github.com/tensorflow/tensorflow/issues/14132#issuecomment-483002522) or [TensorArrays](https://github.com/tensorflow/tensorflow/issues/14132#issuecomment-487643287)).
Some other issues deal with `Variable`s which is not what I am talking about here. [Some workarounds do exist](https://stackoverflow.com/a/62202181/4332585) involving `Variable` but they seem hacky.
I will personally benefit from it in the multiple places where I now use `tensor_scatter_nd_add` or `tensor_scatter_nd_update`, which is a solution that always works but is very difficult to write and very slow:
- [for a wavelet-based neural network, called MWCNN](https://github.com/zaccharieramzi/tf-mwcnn/blob/master/mwcnn.py#L106-L110);
- [for non-uniform fast fourier transform](https://github.com/zaccharieramzi/tfkbnufft/blob/master/tfkbnufft/nufft/interp_functions.py#L151);
- [for sensitivity map extraction when doing MRI reconstruction with TensorFlow neural networks](https://github.com/zaccharieramzi/fastmri-reproducible-benchmark/blob/master/fastmri_recon/data/utils/multicoil/smap_extract.py#L27-L35).
**Any Other info.**
The `tensor_scatter_nd_*` alternative might seem like a viable solution, but it suffers from 2 drawbacks that I consider huge:
- It is very difficult to write. It is actually so difficult, I decided to make a package that would alleviate this difficulty by having the different slicing possibilities unit tested: [tf-slice-assign](https://github.com/zaccharieramzi/tf-slice-assign).
- It is very slow. I made a [benchmark notebook](https://colab.research.google.com/drive/1gEjha7h1mhQkFwULS9MAU0bWQfzfEALY?usp=sharing) vs `pytorch` for slice assignment add. You can see that on GPU, using `tensor_scatter_nd_add` is 10 times slower than slice assignment in `pytorch` and 20 times slower on CPU. For a practical example, it means that my `tfkbnufft` (for non-uniform fast fourier transform) package is 30 times slower than its [torch counterpart](https://github.com/mmuckley/torchkbnufft#computation-speed) which I translated. This currently removes the possibility of training neural networks using the non-uniform fourier transform in TensorFlow.
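A sketch of the `tensor_scatter_nd_add` workaround the issue mentions, applied to the issue's own `a[indices] += 1` example. The main pitfall is that the indices must have shape `(num_updates, rank)`, so 1-D positions need an extra axis:

```python
import tensorflow as tf

a = tf.constant([1, 2, 4, 5, 7, 3, 2, 6])
indices = tf.constant([3, 4], dtype=tf.int32)

# Each position becomes a length-1 index vector: shape (2,) -> (2, 1).
b = tf.tensor_scatter_nd_add(a, indices[:, tf.newaxis], tf.ones_like(indices))
```

`b` is a new tensor; `a` itself is unchanged.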

opened 03:54AM - 08 Oct 19 UTC

type:bug
comp:ops
TF 2.7

As in NumPy or PyTorch, we can do something like this, but how do we do it with TF 2.0?
The following code raises the exception `'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment`:
`prediction[:,:,0]=tf.math.sigmoid(prediction[:,:,0])`
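One common workaround for this particular case, sketched with an arbitrary `(2, 5, 3)` shape: rebuild the tensor from its slices with `tf.concat` instead of writing into it.

```python
import tensorflow as tf

original = tf.random.normal([2, 5, 3])

# Rebuild instead of writing in place: transform channel 0, keep the rest,
# and concatenate along the last axis. `:1` keeps the axis so shapes line up.
prediction = tf.concat(
    [tf.math.sigmoid(original[:, :, :1]), original[:, :, 1:]], axis=-1)
```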

opened 04:19PM - 18 Mar 19 UTC

closed 11:59AM - 24 Mar 22 UTC

stat:contributions welcome
stat:awaiting response
type:feature
stalled
comp:ops

I was wondering if it is possible to implement NumPy-like slicing and updating … `a[1:10,2:20....]` in TensorFlow. It would make life much easier. Right now the code just gets bigger, uglier and more bug-prone.
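A sketch of a functional equivalent of `a[r0:r1, c0:c1] = value` for 2-D tensors, assuming non-negative, in-range bounds; `assign_block` is a hypothetical helper. It builds a boolean mask from `tf.range` index grids and selects with `tf.where`, so no scatter indices are needed:

```python
import tensorflow as tf


def assign_block(x, row_start, row_end, col_start, col_end, value):
    """Functional version of x[row_start:row_end, col_start:col_end] = value
    for a 2-D tensor. Returns a new tensor; `x` is left unchanged."""
    rows = tf.range(tf.shape(x)[0])[:, tf.newaxis]   # (H, 1)
    cols = tf.range(tf.shape(x)[1])[tf.newaxis, :]   # (1, W)
    # Broadcasting the four comparisons yields an (H, W) boolean block mask.
    in_block = ((rows >= row_start) & (rows < row_end) &
                (cols >= col_start) & (cols < col_end))
    return tf.where(in_block, tf.cast(value, x.dtype), x)


a = tf.zeros([12, 24])
b = assign_block(a, 1, 10, 2, 20, 5.0)
```

Negative indices and steps are not handled; supporting them would need extra normalisation of the bounds.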

I haven’t checked the paper in detail to see which indices are selected for masking. Couldn’t this be covered by `tf.tensor_scatter_nd_update` after populating those indices?

The indexing conditions are in the source code I provided.

If you know a workaround using scatter, would you mind providing a minimal working example?
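A minimal scatter-based sketch of the block step, assuming a `(height, width)` integer mask and that the caller has already sampled `top`, `left`, `h`, `w` and will apply BEiT’s `0 < new <= max_mask_patches` check using the returned count. `mask_block_scatter` is a hypothetical name. Setting an already-masked cell to 1 again is harmless, which is why the count can be taken as a difference of sums:

```python
import tensorflow as tf


def mask_block_scatter(mask, top, left, h, w):
    """Set mask[top:top+h, left:left+w] = 1 via tf.tensor_scatter_nd_update.
    Returns (new_mask, delta), delta = number of newly masked patches."""
    rr, cc = tf.meshgrid(tf.range(top, top + h), tf.range(left, left + w),
                         indexing="ij")                     # both (h, w)
    indices = tf.stack([tf.reshape(rr, [-1]), tf.reshape(cc, [-1])], axis=1)  # (h*w, 2)
    new_mask = tf.tensor_scatter_nd_update(
        mask, indices, tf.ones([tf.shape(indices)[0]], dtype=mask.dtype))
    delta = tf.reduce_sum(new_mask) - tf.reduce_sum(mask)
    return new_mask, delta


mask = tf.zeros([14, 14], dtype=tf.int32)
mask, delta1 = mask_block_scatter(mask, top=2, left=3, h=4, w=5)   # 20 new patches
mask, delta2 = mask_block_scatter(mask, top=4, left=6, h=4, w=4)   # overlaps 4 of them
```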

Bhack
March 15, 2022, 1:27pm
#15
For example, I think the embedding in the Hugging Face Transformers library, even though it uses PyTorch ops, does not require or use slice assignment:

```
# Based on timm implementation, which can be found here:
# https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py
class BeitEmbeddings(nn.Module):
    """
    Construct the CLS token, position and patch embeddings. Optionally, also the mask token.
    """
    def __init__(self, config: BeitConfig) -> None:
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, config.hidden_size))
        if config.use_mask_token:
            self.mask_token = nn.Parameter(torch.zeros(1, 1, config.hidden_size))
        else:
            self.mask_token = None
        self.patch_embeddings = PatchEmbeddings(
            image_size=config.image_size,
            patch_size=config.patch_size,
            num_channels=config.num_channels,
```

This file has been truncated.

I think you’re mistaken then. `bool_masked_pos` in `forward()` is nothing but the mask yielded by the class I showed in my initial post.
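For completeness, a TF sketch of how a boolean `bool_masked_pos` of shape `(batch, seq_len)` can be applied once the mask exists: blend each token embedding with a learned mask token using a 0/1 weight. This is assumed to mirror the blending in the Hugging Face `BeitEmbeddings.forward()`; the sizes and the randomly initialised `mask_token` are illustration-only choices.

```python
import tensorflow as tf

batch_size, seq_len, hidden = 2, 196, 32
embeddings = tf.random.normal([batch_size, seq_len, hidden])
mask_token = tf.Variable(tf.random.normal([1, 1, hidden]))   # stands in for a learned [MASK] embedding
bool_masked_pos = tf.random.uniform([batch_size, seq_len]) < 0.4

# w is 1.0 at masked positions and 0.0 elsewhere, shape (batch, seq_len, 1).
w = tf.cast(bool_masked_pos[..., tf.newaxis], embeddings.dtype)
mask_tokens = tf.broadcast_to(mask_token, tf.shape(embeddings))
embeddings_masked = embeddings * (1.0 - w) + mask_tokens * w
```

Producing `bool_masked_pos` itself is still the caller's job, which is where the block-wise generator comes in.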

Bhack
March 15, 2022, 9:09pm
#17

Bhack: `tensor_scatter_nd_update`

It is true that `bool_masked_pos` is only the “application” of the masking; preparing the mask is still the responsibility of the external caller.

I don’t see all the details of the reference implementation in the paper, but given the concrete reference implementation you shared, with all its attempts, conditional loops, etc., you could try using a `tf.Variable` to mimic that implementation, though you will probably need to refactor it further for graph mode/`tf.function`:

Absolutely. And when no reference implementation is available, I guess the implementation by the actual author comes to the rescue.

There isn’t much about it in the paper apart from the figure on block-wise masking, which is why the original implementation is an important reference point.

Thanks for sharing your implementation. Will check it out.
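The `tf.Variable` route suggested above can be sketched as follows, assuming a 14x14 patch grid held in a module-level `tf.Variable`; `mask_block` is a hypothetical name. The point is that sliced `.assign()` also accepts tensor-valued bounds inside a `tf.function`:

```python
import tensorflow as tf

mask = tf.Variable(tf.zeros([14, 14], dtype=tf.int32))


@tf.function
def mask_block(top, left, h, w):
    """Mask mask[top:top+h, left:left+w] in place and return how many
    patches were newly masked."""
    before = tf.reduce_sum(mask)
    # Sliced assignment works on tf.Variable (not on tf.Tensor), here with
    # scalar-tensor slice bounds; reads and writes on the same Variable are
    # kept in program order by tf.function's automatic control dependencies.
    mask[top:top + h, left:left + w].assign(tf.ones([h, w], dtype=tf.int32))
    return tf.reduce_sum(mask) - before


delta = mask_block(tf.constant(2), tf.constant(3), tf.constant(4), tf.constant(5))
```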

Bhack
March 17, 2022, 1:58pm
#19
Having a `tf.function`/graph version is quite trivial, with a few changes/substitutions using TF ops.

But a `jit_compile=True` version will require a new design and probably some compromises.

Let me know if you have a `jit_compile=True` version.
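For the `jit_compile=True` case mentioned above, one possible compromise, sketched under the assumption that the patch grid size is a static Python constant (14x14 here): instead of slicing with data-dependent sizes, compare `tf.range` grids against the block bounds so every intermediate keeps the static `(HEIGHT, WIDTH)` shape that XLA needs. `add_block_xla` is a hypothetical name, and sampling the bounds is left to the caller.

```python
import tensorflow as tf

HEIGHT, WIDTH = 14, 14   # patch-grid size; must be static for XLA


@tf.function(jit_compile=True)
def add_block_xla(mask, top, left, h, w):
    """Return (mask OR block, delta) where block covers [top:top+h, left:left+w].
    All intermediates have the static shape (HEIGHT, WIDTH)."""
    rows = tf.range(HEIGHT)[:, tf.newaxis]
    cols = tf.range(WIDTH)[tf.newaxis, :]
    block = (rows >= top) & (rows < top + h) & (cols >= left) & (cols < left + w)
    new_mask = mask | block
    delta = (tf.reduce_sum(tf.cast(new_mask, tf.int32))
             - tf.reduce_sum(tf.cast(mask, tf.int32)))
    return new_mask, delta


mask = tf.zeros([HEIGHT, WIDTH], dtype=tf.bool)
mask, delta = add_block_xla(mask, tf.constant(2), tf.constant(3), tf.constant(4), tf.constant(5))
```

The cost is O(HEIGHT x WIDTH) work per block instead of O(h x w), which is negligible for a 14x14 grid.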

What’s trivial for you may not be trivial for someone else.

Bhack
March 17, 2022, 2:25pm
#21
Let me know when you have the same code I’ve posted, but with TF ops instead of NumPy ops.
I will help you make the required changes for `tf.function`.

Bhack
March 17, 2022, 10:23pm
#22
This is already working with `tf.function` with minimal changes:

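```
import math
import random

import tensorflow as tf


# Method of the BEiT MaskingGenerator class from the file in the first post.
# `mask` must be a tf.Variable of shape (height, width): only Variables
# support the `mask[i, j].assign(...)` call used below.
@tf.function
def _mask(self, mask, max_mask_patches):
    delta = 0
    for attempt in tf.range(10):
        # Caveat: `random` and `math` are plain Python, so these values are
        # computed once while tracing and baked into the graph as constants:
        # every attempt, and every reuse of the same trace, sees the same block.
        target_area = random.uniform(self.min_num_patches, max_mask_patches)
        aspect_ratio = math.exp(random.uniform(*self.log_aspect_ratio))
        h = int(round(math.sqrt(target_area * aspect_ratio)))
        w = int(round(math.sqrt(target_area / aspect_ratio)))
        if w < self.width and h < self.height:
            top = random.randint(0, int(self.height - h))
            left = random.randint(0, int(self.width - w))
            num_masked = tf.math.count_nonzero(mask[top: top + h, left: left + w])
            # Overlap
            if 0 < h * w - num_masked and h * w - num_masked <= max_mask_patches:
                for i in range(top, top + h):
                    for j in range(left, left + w):
                        if mask[i, j] == 0:
                            mask[i, j].assign(1)
                            delta += 1
        if delta > 0:
            break
    return delta
```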
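Because Python `random` calls inside a `tf.function` run only at trace time, a variant that samples with TF random ops draws a fresh block on every call. The following is a hypothetical port (`sample_block` is my own name, not the BEiT authors’ code); it works on a boolean mask and returns a new tensor instead of mutating a Variable. The defaults (`min_num_patches=4`, aspect ratio in `[0.3, 1/0.3]`) are assumed to match the BEiT generator’s defaults.

```python
import math

import tensorflow as tf


@tf.function
def sample_block(mask, max_mask_patches, min_num_patches=4,
                 min_aspect=0.3, max_aspect=1 / 0.3, attempts=10):
    """Hypothetical TF-ops port of one block-masking step.

    `mask` is a boolean (height, width) tensor with a static shape. Returns
    (new_mask, delta), where delta is the number of newly masked patches.
    """
    height, width = mask.shape                      # static Python ints
    log_min, log_max = math.log(min_aspect), math.log(max_aspect)
    rows = tf.range(height)[:, tf.newaxis]          # (height, 1)
    cols = tf.range(width)[tf.newaxis, :]           # (1, width)
    delta = tf.constant(0)
    for _ in tf.range(attempts):                    # becomes a tf.while_loop
        # tf.random ops draw fresh values on every call and every iteration.
        target_area = tf.random.uniform(
            [], float(min_num_patches), float(max_mask_patches))
        aspect = tf.exp(tf.random.uniform([], log_min, log_max))
        h = tf.cast(tf.round(tf.sqrt(target_area * aspect)), tf.int32)
        w = tf.cast(tf.round(tf.sqrt(target_area / aspect)), tf.int32)
        if w < width and h < height:
            # maxval is exclusive, so +1 matches Python's inclusive randint.
            top = tf.random.uniform([], 0, height - h + 1, dtype=tf.int32)
            left = tf.random.uniform([], 0, width - w + 1, dtype=tf.int32)
            block = ((rows >= top) & (rows < top + h) &
                     (cols >= left) & (cols < left + w))
            new_patches = tf.reduce_sum(tf.cast(block & ~mask, tf.int32))
            if new_patches > 0 and new_patches <= max_mask_patches:
                mask = mask | block
                delta = new_patches
        if delta > 0:
            break
    return mask, delta


mask, delta = sample_block(tf.zeros([14, 14], dtype=tf.bool), max_mask_patches=40)
```

A caller would invoke it repeatedly, lowering `max_mask_patches` as patches accumulate, until it returns a zero delta; that outer loop is not shown. Since `max_mask_patches` is a Python number here, each distinct value triggers a retrace.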