I was following TensorFlow’s Recommending movies: retrieval tutorial, aiming to adapt it to my problem. It is a similar recommender system, except that previously seen samples must always be excluded. The only passage I found that is even slightly related to this issue (in that tutorial, and in the TFRS documentation in general) was the following:
Test set performance is much worse than training performance. This is due to two factors:
- Our model is likely to perform better on the data that it has seen, simply because it can memorize it. This overfitting phenomenon is especially strong when models have many parameters. It can be mediated by model regularization and use of user and movie features that help the model generalize better to unseen data.
- The model is re-recommending some of users’ already watched movies. These known-positive watches can crowd out test movies out of top K recommendations.
The second phenomenon can be tackled by excluding previously seen movies from test recommendations. This approach is relatively common in the recommender systems literature, but we don’t follow it in these tutorials. If not recommending past watches is important, we should expect appropriately specified models to learn this behaviour automatically from past user history and contextual information. Additionally, it is often appropriate to recommend the same item multiple times (say, an evergreen TV series or a regularly purchased item).
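The exclusion approach described in the quoted paragraph can be sketched without any TFRS-specific API: score every candidate, mask out the items the user has already seen, then take the top K. Below is a minimal NumPy sketch. It assumes `scores` holds one model score per candidate item (for example, the dot products of a user embedding with all movie embeddings) and `seen` holds the indices of that user's past interactions. The function and argument names are hypothetical. For batched retrieval, recent TFRS versions appear to offer a `query_with_exclusions` method on the `tfrs.layers.factorized_top_k` layers. Check the API reference for your installed version before relying on it.

```python
import numpy as np


def top_k_excluding_seen(scores, seen, k):
    """Return indices of the k highest-scoring candidates the user has not seen.

    scores: 1-D sequence of model scores, one per candidate item (assumed layout).
    seen:   iterable of candidate indices the user already interacted with.
    k:      number of recommendations wanted.
    """
    masked = np.asarray(scores, dtype=float).copy()
    # Push already-seen items to -inf so they can never enter the top k.
    masked[list(seen)] = -np.inf
    # Sort by descending score; "stable" keeps lower indices first on ties.
    order = np.argsort(-masked, kind="stable")
    # If fewer than k unseen items exist, drop the masked (-inf) entries.
    return [int(i) for i in order[:k] if np.isfinite(masked[i])]


scores = [0.9, 0.1, 0.8, 0.5, 0.7]
print(top_k_excluding_seen(scores, seen=set(), k=2))   # [0, 2]
print(top_k_excluding_seen(scores, seen={0, 2}, k=2))  # [4, 3]
```

In this toy example, items 0 and 2 stand in for known positives that would otherwise occupy the whole top 2. This is the "crowding out" effect from the quoted passage. Once they are masked, unseen items such as 3 can surface.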
This is OK as a theoretical explanation, but now I have the following questions:
- Is excluding previously seen samples currently feasible with TFRS, and if so, how can it be done?
- Is what the article refers to as the test set actually a test set, or is it a cross-validation (CV) set?
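For reference on the terminology in the second question: conventionally, a validation (CV) split is used to choose hyperparameters such as the number of epochs, and the test split is held out for a single final estimate. Below is a minimal, framework-agnostic sketch of such a three-way split. It assumes interactions are plain `(user_id, movie_id)` tuples. The function names, the 80/10/10 fractions, and the seed are illustrative and are not taken from the tutorial. The second helper builds the per-user "already seen" sets that an exclusion step would need at evaluation time.

```python
import random
from collections import defaultdict


def split_interactions(interactions, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle once with a fixed seed, then cut into train / validation / test.

    Validation is for model selection; test is touched only for the final number.
    """
    rows = list(interactions)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


def seen_by_user(rows):
    """Map each user_id to the set of movie_ids they interacted with in `rows`.

    When scoring the test split, build this from train (and validation) rows,
    never from the test rows themselves.
    """
    seen = defaultdict(set)
    for user_id, movie_id in rows:
        seen[user_id].add(movie_id)
    return dict(seen)


data = [(u, m) for u in range(10) for m in range(10)]
train, val, test = split_interactions(data)
print(len(train), len(val), len(test))  # 80 10 10
```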
It would be great if any explanation could use the code of the linked tutorial as its foundation.
Thank you very much!
PS: I tried to include a link to TensorFlow’s Recommending movies: retrieval tutorial, but the automated system said that address could not be used. A quick search should bring it up as one of the top results. Thanks!