Furthest Neighbors Instead of Nearest Neighbors in TFRS

rcauvin · June 24, 2023, 11:17pm

In a recommendation system, we typically use an instance of tfrs.layers.factorized_top_k.ScaNN or tfrs.layers.factorized_top_k.BruteForce to retrieve the nearest neighbors of a query vector. Is there a convenient way to retrieve the furthest neighbors instead?

rcauvin · June 25, 2023, 9:01pm

I looked at the code for tfrs.layers.factorized_top_k.BruteForce and realized I could create a nearly identical class (BruteForceFurthest) that returns the furthest neighbors by changing one line of code. Here is the relevant excerpt from the call() method.

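    scores = self._compute_score(queries, self._candidates)
    # Negate the scores so that top_k picks the k lowest-scoring (furthest) candidates.
    # Note: `values` holds the negated scores; use -values to recover the originals.
    values, indices = tf.math.top_k(-scores, k=k)
    return values, tf.gather(self._identifiers, indices)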

As you can see, I think it is as simple as negating the scores so that the “top K” scores are really the bottom K scores.
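To illustrate the idea outside of TFRS, here is a minimal framework-free sketch. Plain Python lists stand in for the score tensor, and the helper names top_k and furthest_k are hypothetical; they are not part of TFRS or TensorFlow. Unlike the excerpt above, this version flips the sign back before returning, so the caller sees the original scores.

```python
import heapq


def top_k(scores, k):
    """Return (values, indices) of the k largest scores, largest first."""
    indices = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return [scores[i] for i in indices], indices


def furthest_k(scores, k):
    """Return the k lowest-scoring candidates by running top_k on negated scores."""
    negated = [-s for s in scores]
    neg_values, indices = top_k(negated, k)
    # Undo the negation so callers see the original scores.
    return [-v for v in neg_values], indices


scores = [0.9, -0.4, 0.1, 0.7, -0.8]
nearest_values, nearest_indices = top_k(scores, 2)
furthest_values, furthest_indices = furthest_k(scores, 2)
```

With these sample scores, the nearest candidates are the two with the highest scores and the furthest are the two with the lowest, returned lowest first.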

bmaso · July 3, 2023, 4:05am

How about inverting the distance function? Any transformation of the distance calculation that reverses its ordering would do: negation (x * -1), or the reciprocal (x ^ -1) as long as all values are positive, etc.
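A quick check of which transformations actually reverse a ranking may help here. The sketch below uses made-up sample values and a hypothetical helper named ranking. Negation reverses the order for any real-valued scores, while the reciprocal only does so when every value has the same sign (and none is zero), which matters for scores such as dot products that can be negative.

```python
def ranking(values):
    """Indices sorted from largest value to smallest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


positive = [0.5, 2.0, 4.0]  # all values positive
mixed = [-2.0, 0.5, 4.0]    # mixed signs, as dot-product scores can be

# Reversed ranking we hope each transformation will produce.
reversed_positive = ranking(positive)[::-1]
reversed_mixed = ranking(mixed)[::-1]

negated_positive = ranking([-x for x in positive])
reciprocal_positive = ranking([1.0 / x for x in positive])

negated_mixed = ranking([-x for x in mixed])
reciprocal_mixed = ranking([1.0 / x for x in mixed])

# Negation reverses both rankings; the reciprocal fails on the mixed-sign list.
negation_ok = negated_positive == reversed_positive and negated_mixed == reversed_mixed
reciprocal_ok_positive = reciprocal_positive == reversed_positive
reciprocal_ok_mixed = reciprocal_mixed == reversed_mixed
```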

It’s also interesting to consider that, when the “distance” is computed over multiple dimensions, Euclidean distance or RMS is the metric we humans typically gravitate toward. But any Minkowski distance (see “Minkowski distance” on Wikipedia) could be used. Changing the order p can produce very different N-body graphs from the same original coordinates.
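As a small illustration of how the order p can change which point counts as furthest, here is a sketch with two made-up 2-D points and a hypothetical helper named minkowski. It is independent of TFRS, which is not assumed to use this metric.

```python
def minkowski(u, v, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def furthest(query, points, p):
    """Name of the point with the largest Minkowski distance from the query."""
    return max(points, key=lambda name: minkowski(query, points[name], p))


query = (0.0, 0.0)
points = {"A": (3.0, 3.0), "B": (5.0, 0.0)}

furthest_p1 = furthest(query, points, 1)  # Manhattan: A is 6 away, B is 5
furthest_p2 = furthest(query, points, 2)  # Euclidean: A is about 4.24 away, B is 5
```

With p = 1 the diagonal point A is furthest, while with p = 2 the axis-aligned point B is, so the choice of p can change the answer for the same coordinates.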

rcauvin · July 4, 2023, 4:33pm

I figured I could simply use the same distance and score calculation that tfrs.layers.factorized_top_k.BruteForce uses by default but negate the scores, as shown in the code excerpt I provided.

In any event, I ended up implementing another class that combines the “positive” and “negative” retrieval models and averages the respective “nearest neighbors” and “furthest neighbors” scores.
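The thread does not include that class, so the following is only one possible reading of the averaging step, not the actual implementation. It assumes both models score the same candidate list, and that a high score from the “negative” model should count against a candidate, so its score is negated before averaging. The function name combine_scores and the sample numbers are hypothetical.

```python
def combine_scores(positive_scores, negative_scores):
    """Average each candidate's positive score with its negated negative score."""
    return [(p + (-n)) / 2.0 for p, n in zip(positive_scores, negative_scores)]


# Assumed inputs: one score per candidate from each model, in the same order.
positive_scores = [0.8, 0.6, 0.1]   # affinity according to the "positive" model
negative_scores = [0.7, -0.2, 0.0]  # affinity according to the "negative" model

combined = combine_scores(positive_scores, negative_scores)
best = max(range(len(combined)), key=lambda i: combined[i])
```

Under these assumptions, the first candidate scores highest with the positive model but is penalized heavily by the negative model, so the second candidate ends up ranked first.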