I’m using the EfficientNet model and its 1280-dimensional feature vector output for image similarity. It’s working great, and from my testing it’s the best model I’ve found for the amount of data I’m using.
Sometimes it just does weird things though.
Here are the images in question: Imgur: The magic of the Internet
The first image is the input, the second is the one that should be found, and the third is the one that is actually returned as the closest match.
I’m using a compare script, and these are the results for said images:
input against image that should be found (img1 vs img2)
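For context, a compare script of this kind usually boils down to a distance between two embedding vectors. A minimal sketch, assuming the 1280-dimensional vectors come from EfficientNet’s avg_pool output (the vectors below are random stand-ins, not real features):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance between two embedding vectors (~0 = very similar)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(1.0 - np.dot(a, b))

# Hypothetical 1280-dim feature vectors, standing in for EfficientNet output.
img1_vec = np.random.default_rng(0).normal(size=1280)
img2_vec = img1_vec + 0.01 * np.random.default_rng(1).normal(size=1280)

print(cosine_distance(img1_vec, img1_vec))  # identical vectors -> ~0
print(cosine_distance(img1_vec, img2_vec))  # near-duplicate -> close to 0
```

Euclidean distance on the raw vectors works too, but cosine distance ignores the overall magnitude of the embedding, which tends to be more stable across lighting/exposure differences in camera shots.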
No, I’m using pre-trained weights. I would train it, but I’m not sure what the correct course is or what the benefit would be. The image database contains only one class, namely stamps, so any classification goes out the window.
Thank you very much for your suggestions!
I’d stumbled upon metric learning before, but how to implement it in my case went over my head. I’ve figured it out now; it wasn’t too difficult once it clicked, thanks to the link you provided, and the tests I’ve done show good results. The distances are much, much closer now, sometimes nearly 0, which is a big win!
I’m still struggling to understand what’s really going on and how this works. From my understanding, every layer has weights that can be tuned, and in fine-tuning you freeze most of the pre-trained weights. I’m not freezing any right now, and I wonder if that’s the best thing to do.
My intuition says to freeze every layer except the last one I use, avg_pool, but that layer and the ones just before it don’t have many weights. I fear I’d skew the weights too much with my limited data set.
Any suggestions on this or do you think it’s alright?
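For reference, freezing most of the backbone while leaving the top block trainable is one common middle ground. A minimal Keras sketch (`weights=None` here only so the snippet runs offline; in practice you’d load `weights="imagenet"`, and which blocks to unfreeze is a judgment call, not a rule):

```python
import numpy as np
import tensorflow as tf

# Feature extractor ending in the 1280-dim avg_pool output.
# weights=None avoids a download in this sketch; use weights="imagenet" for real.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights=None, pooling="avg")

# Freeze everything except the final block and the top conv layers, so a
# small dataset only nudges the highest-level features.
base.trainable = True
for layer in base.layers:
    # "block7..." is the last MBConv block in EfficientNetB0; "top_..." are
    # the final conv/BN layers before pooling. Everything earlier stays frozen.
    if not (layer.name.startswith("block7") or layer.name.startswith("top")):
        layer.trainable = False

features = base(np.zeros((1, 224, 224, 3), dtype="float32"))
print(features.shape)  # (1, 1280)
```

With only the top block trainable, the optimizer can adapt the stamp-specific features without destroying the generic low-level filters (edges, textures) that transfer well from ImageNet.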
The goal is to take a camera shot of a real stamp and find the correct match in a database of 350k+ unique stamp images.
Most of the results are pretty accurate, but it could be better; that’s why I’m looking to train the model further.
For a lot of stamps I have real camera photos, 10-20 or more, which I could compile, but:
With metric learning I’m running into the problem that a label is expected, and since I technically have over 350k labels I don’t know how to deal with that. In principle I think metric learning is the right solution, but I’m a little stuck right now.
How should I handle 350k+ classes? Overall I would need at least 1 million, because that’s roughly the number of unique stamps that exist.
If I try with 1 million labels I run into OOM errors. This conceptual problem keeps me from doing any meaningful training.
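One way out of this, assuming the OOM comes from a classification head sized to the label count: losses like triplet or contrastive loss never materialize a dimension per class. The label is only compared *within* a batch to decide which samples are the same stamp, so memory scales with batch size, not with 350k or 1M IDs. A NumPy sketch of batch-hard triplet loss to illustrate (a real setup would use an equivalent library loss, e.g. from a metric-learning package):

```python
import numpy as np

def triplet_loss(embeddings, labels, margin=0.2):
    """Batch-hard triplet loss. Labels are only compared within the batch,
    so the total number of distinct stamp IDs never appears as a tensor
    dimension anywhere -- no giant classification head, no OOM."""
    # Pairwise Euclidean distances, shape (batch, batch).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1) + 1e-12)

    same = labels[:, None] == labels[None, :]  # same stamp ID in this batch?
    losses = []
    for i in range(len(labels)):
        pos = dists[i][same[i] & (np.arange(len(labels)) != i)]
        neg = dists[i][~same[i]]
        if len(pos) and len(neg):
            # Hardest positive (farthest) vs hardest negative (closest).
            losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses))

# Hypothetical batch: 2 photos each of 3 different stamps.
labels = np.array([0, 0, 1, 1, 2, 2])
emb = np.random.default_rng(0).normal(size=(6, 1280))
print(triplet_loss(emb, labels))
```

The batching strategy then becomes the important part: sample a handful of stamp IDs per batch and a few photos of each, so every batch contains positives and negatives. Your 10-20 real camera photos per stamp are exactly what makes those positive pairs.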
I see that you are using ImageNet pre-trained weights of EfficientNet for feature extraction.
From what I see, either or both of the following issues may be contributing to the weird result:
The official Keras implementation of EfficientNet expects un-normalized inputs in the range 0-255. So if you are normalizing the input images before feeding them into the network, that may lead to issues.
Quote from Documentation:
EfficientNet models expect their inputs to be float tensors of pixels with values in the [0-255] range.
Source: Keras EfficientNet Documentation
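In other words, the rescaling happens inside the model, so you should hand it raw pixel values. A small sketch of the pitfall, using a stand-in array instead of a real decoded image:

```python
import numpy as np

# Stand-in for a decoded RGB image; in practice this would come from
# e.g. PIL or tf.image.decode_jpeg.
img = np.full((224, 224, 3), 200, dtype=np.uint8)

# Correct: cast to float but keep the 0-255 range -- the Keras
# EfficientNet models contain their own rescaling.
batch_ok = img.astype("float32")[None, ...]   # shape (1, 224, 224, 3)

# Pitfall: normalizing to 0-1 yourself means the model effectively
# rescales twice and sees a nearly black image.
batch_bad = batch_ok / 255.0

print(batch_ok.max(), batch_bad.max())  # 200.0 vs ~0.784
```

A quick sanity check before inference: if `batch.max() <= 1.0` for a normal photo, something upstream has already normalized it.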
The other (and most likely) issue: since the network was pre-trained on ImageNet, which does not contain examples similar to your query and target images, the feature vectors for these images may come out nearly identical, leading to unreliable distance calculations. The solution in this case would be to train/fine-tune your model on your own dataset to get more relevant feature vectors.