It is a common belief that if we constrain vision models to perceive things as humans do, their performance will improve. For example, in this work, Geirhos et al. showed that vision models pre-trained on the ImageNet-1k dataset are biased toward texture, whereas humans mostly rely on shape cues to recognize objects. But does this belief always hold, especially when it comes to improving the performance of vision models?
Learn more in this post:
Great share, thanks Sayak!
Interesting!
There's something funny with the visualizations; they shouldn't be that saturated. convert_image_dtype isn't working because your input is already a float.
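Here's a minimal repro of what I mean (assuming TF 2.x; the pixel values are made up just to show the behavior):

```python
import tensorflow as tf

# convert_image_dtype only rescales when it converts *between* dtypes.
# A float -> float call is a no-op, so pixels that are already floats in
# [0, 255] stay in [0, 255] and the plots come out oversaturated.
img_uint8 = tf.fill((2, 2, 3), tf.constant(128, dtype=tf.uint8))
img_float = tf.cast(img_uint8, tf.float32)  # values still in [0, 255]

scaled = tf.image.convert_image_dtype(img_uint8, tf.float32)     # ~0.502
unchanged = tf.image.convert_image_dtype(img_float, tf.float32)  # 128.0

print(scaled[0, 0, 0].numpy(), unchanged[0, 0, 0].numpy())

# Simplest fix for float inputs: divide by 255.0 before plotting.
```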
Fixed: Copy_of_learnable_resizer.ipynb - Google Drive
Also:
- It's clearer if you show the before and after for each image.
- It's easier to follow if you create a get_resizer function that returns a concrete resizer model, then just use that (rough sketch after this list).
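Something along these lines; the body below is a simplified stand-in rather than the actual resizer from your notebook, and it assumes a TF version where keras.layers.Resizing is available:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def get_resizer(target_size=(224, 224), interpolation="bilinear", filters=16):
    """Build and return a concrete resizer model once, then reuse it."""
    inputs = keras.Input(shape=(None, None, 3))
    # Naive resize acts as the skip path.
    naive = layers.Resizing(*target_size, interpolation=interpolation)(inputs)
    # Placeholder learnable path; swap in the residual blocks from the notebook.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(naive)
    outputs = layers.Conv2D(3, 3, padding="same")(x) + naive
    return keras.Model(inputs, outputs, name="learnable_resizer")

resizer = get_resizer()
print(resizer(tf.random.uniform((1, 512, 512, 3))).shape)  # (1, 224, 224, 3)
```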
Thanks, Mark! I can make the changes and create a PR.
There's something funny with the visualizations; they shouldn't be that saturated. convert_image_dtype isn't working because your input is already a float.
Thanks for catching it. I didn't realize that even if my inputs are floats, I can't use convert_image_dtype() to scale the pixels to [0, 1]. Maybe casting the dtype back to int after the resizing step (since tf.image.resize() casts to float) would be easier.
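Something like this is what I have in mind, just a sketch assuming the raw pixels are in [0, 255]:

```python
import tensorflow as tf

def resize_for_plotting(image, size=(224, 224)):
    # tf.image.resize always returns floats, so round and saturate-cast
    # back to uint8 purely for visualization.
    resized = tf.image.resize(image, size)  # float32, still in [0, 255]
    return tf.cast(tf.clip_by_value(tf.round(resized), 0, 255), tf.uint8)
```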