Learning to Resize in Computer Vision

It is a common belief that if we constrain vision models to perceive things as humans do, their performance can be improved. For example, Geirhos et al. showed that vision models pre-trained on the ImageNet-1k dataset are biased toward texture, whereas human beings mostly rely on shape to form a common perception. But does this belief always hold, especially when it comes to improving the performance of vision models?

Learn more in this post:


Great share, thanks Sayak!


Interesting!

There's something funny with the visualizations; they shouldn't be that saturated. `convert_image_dtype` isn't working because your input is already a float.
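To illustrate the point (a minimal sketch, not the notebook's actual code): `tf.image.convert_image_dtype` only rescales when the *input* dtype is an integer type; a float-to-float call is a plain cast, so float pixels in [0, 255] stay in [0, 255] and the plot ends up oversaturated.

```python
import numpy as np
import tensorflow as tf

# A float image whose values are already in [0, 255].
float_img = np.random.uniform(0.0, 255.0, size=(4, 4, 3)).astype("float32")

# float -> float is a plain cast: no rescaling happens,
# so the values stay in [0, 255].
out_float = tf.image.convert_image_dtype(float_img, tf.float32)
print(float(tf.reduce_max(out_float)))  # still well above 1.0

# The same pixels stored as uint8 are rescaled to [0, 1] as expected.
uint8_img = float_img.astype("uint8")
out_uint8 = tf.image.convert_image_dtype(uint8_img, tf.float32)
print(float(tf.reduce_max(out_uint8)))  # <= 1.0
```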

Fixed: Copy_of_learnable_resizer.ipynb - Google Drive

Also:

  • It's clearer if you show the before and after for each image.
  • It's easier to follow if you create a `get_resizer` function that returns a concrete resizer model, then just use that.
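Something along these lines could work for the second suggestion. This is a hypothetical sketch: the `get_resizer` name comes from the review comment, but the architecture here (bilinear resize plus a small convolutional residual) is my guess, not the notebook's actual code.

```python
import tensorflow as tf
from tensorflow import keras

def get_resizer(target_size=(224, 224), filters=16):
    """Build a small learnable-resizer model (illustrative only)."""
    inputs = keras.Input(shape=(None, None, 3))
    # Non-learnable bilinear resize to the target resolution.
    resized = keras.layers.Resizing(*target_size, interpolation="bilinear")(inputs)
    # A small convolutional residual makes the resizer trainable end to end.
    x = keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(resized)
    x = keras.layers.Conv2D(3, 3, padding="same")(x)
    outputs = keras.layers.Add()([resized, x])
    return keras.Model(inputs, outputs, name="resizer")

# Build once, then reuse the same concrete model everywhere.
resizer = get_resizer()
out = resizer(tf.random.uniform((1, 64, 80, 3)))
print(out.shape)  # (1, 224, 224, 3)
```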

Thanks, Mark! I can make the changes and create a PR.

There's something funny with the visualizations; they shouldn't be that saturated. `convert_image_dtype` isn't working because your input is already a float.

Thanks for catching it. I didn't realize that even if my inputs are floats, I can't use `convert_image_dtype()` to scale the pixels to [0, 1]. Maybe casting the dtype back to int after the resizing step (since `tf.image.resize()` casts to float) would be easier.
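A rough sketch of that idea, assuming standard TF APIs rather than the notebook's exact code:

```python
import tensorflow as tf

# A uint8 image in [0, 255].
img_uint8 = tf.cast(
    tf.random.uniform((300, 400, 3), maxval=256, dtype=tf.int32), tf.uint8
)

# tf.image.resize always returns float32, even for uint8 input,
# but it preserves the value range (still roughly 0-255 here).
resized = tf.image.resize(img_uint8, (224, 224))
print(resized.dtype)  # float32

# Cast back to uint8 for visualization; clipping guards against
# interpolation overshoot (e.g. with bicubic).
display_img = tf.cast(tf.clip_by_value(tf.round(resized), 0, 255), tf.uint8)
```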


@markdaoust FYI:
