Custom sampler inside a `tf.data` pipeline

Sayak_Paul · April 23, 2021, 5:03am

Hi folks.

I have a binary segmentation use case, i.e. every pixel belongs to one of two classes. The two classes are unevenly represented across the training images. This is essentially a class imbalance problem, but in a 3D space, which makes it a bit complicated to handle.

So, instead of setting `sample_weight` (the usual recommendation for this problem), I did some research and found what looks like a pretty elegant alternative: when feeding a batch of samples to the model, always ensure that the fraction of images containing the positive class is at least a predefined ratio.

The ground-truth segmentation masks contain 0s and/or 1s. One way to check that a mask has some presence of the positive class is to look at its mean: a mask containing no positive-class pixels has a mean of 0, and any mask with at least one positive pixel has a mean greater than 0.
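
For concreteness, here is a minimal sketch of that mean check, assuming the masks are tensors holding only 0s and 1s. The name `has_positive` and the toy masks are placeholders I made up for illustration.

```python
import tensorflow as tf


def has_positive(mask):
    """Return a scalar bool tensor: True if the mask has at least one positive pixel."""
    # Cast first so that integer or boolean masks work too.
    # For a 0/1 mask, the mean is > 0 exactly when some pixel equals 1.
    return tf.reduce_mean(tf.cast(mask, tf.float32)) > 0.0


# Toy 2x2 masks (illustrative only).
empty_mask = tf.constant([[0, 0], [0, 0]], dtype=tf.uint8)
mask_with_object = tf.constant([[0, 0], [0, 1]], dtype=tf.uint8)

print(bool(has_positive(empty_mask)), bool(has_positive(mask_with_object)))
```

Because it returns a scalar boolean tensor, a check like this should be usable as a predicate in `Dataset.filter`.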

I am looking for snippets/pointers/approaches on how to realize this inside a tf.data pipeline.
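
To make the question concrete, this is a rough sketch of one way it might be done, not a tested solution: split the dataset into positive and negative streams with `filter`, batch each stream separately, then zip and concatenate them so that every batch holds a fixed number of positive samples. It assumes the dataset yields `(image, mask)` pairs and that both kinds of samples exist. The names `sample_has_positive`, `make_balanced_batches`, and all parameter names are my own placeholders.

```python
import math

import tensorflow as tf


def sample_has_positive(image, mask):
    # A 0/1 mask has mean > 0 exactly when it contains a positive pixel.
    return tf.reduce_mean(tf.cast(mask, tf.float32)) > 0.0


def make_balanced_batches(dataset, batch_size=8, min_positive_ratio=0.5,
                          shuffle_buffer=1024):
    """Build batches in which at least `min_positive_ratio` of masks are positive.

    Assumes `dataset` yields (image, mask) pairs and contains both positive
    and negative samples; otherwise the repeated streams never yield a batch.
    """
    n_pos = math.ceil(batch_size * min_positive_ratio)
    n_neg = batch_size - n_pos
    if not 0 < n_pos < batch_size:
        raise ValueError("ratio must leave room for both kinds of samples")

    positives = dataset.filter(sample_has_positive)
    negatives = dataset.filter(
        lambda image, mask: tf.logical_not(sample_has_positive(image, mask)))

    # Repeat both streams so the rarer one is oversampled as needed.
    pos_batches = (positives.shuffle(shuffle_buffer).repeat()
                   .batch(n_pos, drop_remainder=True))
    neg_batches = (negatives.shuffle(shuffle_buffer).repeat()
                   .batch(n_neg, drop_remainder=True))

    def merge(pos, neg):
        # pos and neg are (images, masks) tuples; stack them along the batch axis.
        images = tf.concat([pos[0], neg[0]], axis=0)
        masks = tf.concat([pos[1], neg[1]], axis=0)
        return images, masks

    return tf.data.Dataset.zip((pos_batches, neg_batches)).map(merge)


# Toy data: 5 samples with an object, 15 without (illustrative only).
masks = tf.concat([tf.ones((5, 4, 4, 1)), tf.zeros((15, 4, 4, 1))], axis=0)
images = tf.random.uniform((20, 4, 4, 1))
toy_dataset = tf.data.Dataset.from_tensor_slices((images, masks))

balanced = make_balanced_batches(toy_dataset, batch_size=8, min_positive_ratio=0.5)
for batch_images, batch_masks in balanced.take(2):
    per_sample_mean = tf.reduce_mean(batch_masks, axis=[1, 2, 3])
    print(batch_images.shape, int(tf.reduce_sum(tf.cast(per_sample_mean > 0, tf.int32))))
```

The resulting dataset is infinite, so it would need `take` or a `steps_per_epoch` setting during training, and the positive samples always sit at the front of each batch. If a ratio that holds only on average is acceptable, sampling from the two streams with `tf.data.Dataset.sample_from_datasets` and a `weights` argument might be a simpler alternative.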

This is a tried and tested method (see here and here).