How to prepare the dataset for item-to-item recommendation?

Wenhui_Jolie_Zhang · May 23, 2023, 9:00am

Hi, item-to-item recommendation is mentioned in this tutorial, but I’m a bit confused about how to prepare the dataset for both the query and candidate towers.

For example, I want to recommend a list of related movies for each movie, and the dataset comes from users’ view history. Is it correct to prepare a dataset like this?

query_item (the movie_id which a user watched)	candidate_item (the movie_id which a user also watched)
101	301
101	301
101	401
201	301
301	101

Also, if user_A watched movie_101 and movie_301, and user_B watched the same two movies, they have the same view history. In the prepared dataset, should I keep both records, or is it enough to keep only the unique item pairs?

Thanks in advance for your time!

Laxma_Reddy_Patlolla · May 23, 2023, 9:54pm

Hi @Wenhui_Jolie_Zhang ,

If user_A watched movie_101, and also watched movie_301, and user_B watched movie_101 and also watched movie_301, they have the same view history. In this case, you should keep the two records in the prepared dataset. This is because the two records provide more information about the relationship between movie_101 and movie_301.

If you keep only the unique item pairs, you lose information about the strength of the relationship between the two items. For example, if user_A watched movie_101 10 times and user_B watched movie_101 only once, then the relationship between movie_101 and movie_301 is stronger in the case of user_A. With only the unique item pair, you cannot distinguish between these two cases. The same applies across users: a pair that many users co-watched becomes indistinguishable from a pair that only one user co-watched.
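To make the lost information concrete, here is a minimal sketch in plain Python. The rows are copied from the example table in the question; the names `pairs`, `pair_counts`, and `unique_pairs` are illustrative only and do not come from any library.

```python
from collections import Counter

# (query_item, candidate_item) rows copied from the example table in the question.
pairs = [(101, 301), (101, 301), (101, 401), (201, 301), (301, 101)]

# Keeping duplicates preserves how often each pair co-occurs.
pair_counts = Counter(pairs)

# Deduplicating collapses every pair to a single occurrence.
unique_pairs = sorted(set(pairs))

print(pair_counts[(101, 301)])        # 2
print(len(pairs), len(unique_pairs))  # 5 4
```

After deduplication, (101, 301) carries the same weight as (101, 401), even though it was observed twice as often.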

Therefore, it is important to keep all of the records in the prepared dataset, even if they have the same query and candidate items. This will ensure that you have the most accurate information about the relationships between the items.
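One possible way to build such a dataset from raw view histories is sketched below. This is only an illustration under stated assumptions: `view_history` is made-up data, `build_item_pairs` is a hypothetical helper, each user's list is assumed to contain each movie once, and emitting pairs in both directions is a modeling choice that mirrors the question's table rather than something the tutorial prescribes.

```python
from itertools import permutations

# Hypothetical view histories: user -> distinct movie_ids watched (assumed data).
view_history = {
    "user_A": [101, 301],
    "user_B": [101, 301, 401],
}

def build_item_pairs(history):
    """Return one (query_item, candidate_item) row per ordered pair per user.

    Rows are NOT deduplicated across users, so a pair co-watched by
    several users appears several times in the result.
    """
    rows = []
    for movies in history.values():
        # permutations(..., 2) yields every ordered pair of different positions,
        # e.g. both (101, 301) and (301, 101).
        rows.extend(permutations(movies, 2))
    return rows

rows = build_item_pairs(view_history)
print(len(rows))               # 8
print(rows.count((101, 301)))  # 2
```

Here (101, 301) appears twice because both users watched both movies, which is the duplicate record the answer recommends keeping.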

I hope this helps!

Thanks.