It is a common belief that if we constrain vision models to perceive things as humans do, their performance will improve. For example, in this work, Geirhos et al. showed that vision models pre-trained on the ImageNet-1k dataset are biased toward texture, whereas humans mostly rely on shape cues to recognize objects. But does this belief always hold, especially when it comes to improving the performance of vision models?
Learn more in this post:
Great share, thanks Sayak!
Interesting!
There's something funny with the visualizations; they shouldn't be that saturated. convert_image_dtype isn't working because your input is already a float.
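Here's a minimal repro of what I mean (assuming TF 2.x; the pixel values are made up just to show the behavior):

```python
import tensorflow as tf

# convert_image_dtype only rescales when it converts *between* dtypes.
# A float -> float call is a no-op, so pixels that are already floats in
# [0, 255] stay in [0, 255] and the plots come out oversaturated.
img_uint8 = tf.fill((2, 2, 3), tf.constant(128, dtype=tf.uint8))
img_float = tf.cast(img_uint8, tf.float32)  # values still in [0, 255]

scaled = tf.image.convert_image_dtype(img_uint8, tf.float32)     # ~0.502
unchanged = tf.image.convert_image_dtype(img_float, tf.float32)  # 128.0

print(scaled[0, 0, 0].numpy(), unchanged[0, 0, 0].numpy())

# Simplest fix for float inputs: divide by 255.0 before plotting.
```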
Fixed: Copy_of_learnable_resizer.ipynb - Google Drive
Also:
- It's clearer if you show the before and after for each image.
- It's easier to follow if you create a get_resizer function that returns a concrete resizer model, then just use that (rough sketch after this list).
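Something along these lines; the body below is a simplified stand-in rather than the actual resizer from your notebook, and it assumes a TF version where keras.layers.Resizing is available:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def get_resizer(target_size=(224, 224), interpolation="bilinear", filters=16):
    """Build and return a concrete resizer model once, then reuse it."""
    inputs = keras.Input(shape=(None, None, 3))
    # Naive resize acts as the skip path.
    naive = layers.Resizing(*target_size, interpolation=interpolation)(inputs)
    # Placeholder learnable path; swap in the residual blocks from the notebook.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(naive)
    outputs = layers.Conv2D(3, 3, padding="same")(x) + naive
    return keras.Model(inputs, outputs, name="learnable_resizer")

resizer = get_resizer()
print(resizer(tf.random.uniform((1, 512, 512, 3))).shape)  # (1, 224, 224, 3)
```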
Thanks, Mark! I can make the changes and create a PR.
There's something funny with the visualizations; they shouldn't be that saturated. convert_image_dtype isn't working because your input is already a float.
Thanks for catching it. I didn't realize that even if my inputs are floats, I can't use convert_image_dtype() to scale the pixels to [0, 1]. Maybe casting the dtype back to int after the resizing step (since tf.image.resize() casts to float) would be easier.
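Something like this is what I have in mind, just a sketch assuming the raw pixels are in [0, 255]:

```python
import tensorflow as tf

def resize_for_plotting(image, size=(224, 224)):
    # tf.image.resize always returns floats, so round and saturate-cast
    # back to uint8 purely for visualization.
    resized = tf.image.resize(image, size)  # float32, still in [0, 255]
    return tf.cast(tf.clip_by_value(tf.round(resized), 0, 255), tf.uint8)
```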