Record linking in two tables

I’m interested in learning TensorFlow. I have a real-world problem involving two tables of data, each made up of columns of strings, numbers, and dates.

Each row in table 1 has an equivalent entry in table 2. The data in table 2 will be similar, but not exactly equal, to its equivalent in table 1. Is it possible to use TensorFlow to identify which records are related to each other?

That’s a great question!

I think it depends on how you define that “relation”. You may already be familiar with some of these methods, but please check them out if you’re not! You could train something like an encoder-decoder network (an autoencoder) and then compare the encodings of the records with a similarity measure such as cosine similarity. You could also try the same thing with a simpler method like PCA.

Encoding and PCA, in a simplified sense, take each of your records and convert it to a few numbers, so you can then compare records by applying a distance formula to those numbers. There are some great tutorials on using Keras for encoders, so please check them out!
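To make the "convert to numbers, then compare" idea concrete, here is a minimal sketch in plain Python. It is not an encoder network or PCA: as a stand-in, it uses a hand-written encoding (counts of character trigrams over all the fields of a record) and then links each row of table 1 to the row of table 2 with the highest cosine similarity. The function names, the three-column layout, and the sample rows are all made up for illustration.

```python
import math
from collections import Counter


def encode(record, n=3):
    """Encode a record as counts of character n-grams over all of its fields."""
    # Every field (string, number, or date) is turned into lowercase text first.
    text = " " + " | ".join(str(field).lower() for field in record) + " "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_records(table1, table2):
    """For each row of table1, return the index of the most similar row of table2."""
    encoded2 = [encode(row) for row in table2]
    links = []
    for row in table1:
        vec = encode(row)
        scores = [cosine_similarity(vec, other) for other in encoded2]
        links.append(max(range(len(scores)), key=scores.__getitem__))
    return links


# Hypothetical sample data: (name, amount, date) in both tables.
table1 = [
    ("John Smith", 120.50, "2021-03-04"),
    ("Maria Garcia", 89.99, "2020-11-17"),
    ("Wei Chen", 4300.00, "2019-07-30"),
]
table2 = [
    ("Chen, Wei", 4300.0, "2019-07-30"),
    ("Jon Smith", 120.5, "2021-03-04"),
    ("Maria Garcia-Lopez", 89.99, "2020-11-18"),
]

links = link_records(table1, table2)
for i, j in enumerate(links):
    print(table1[i], "->", table2[j])
```

This encoding is crude: two amounts that are close in value but share few digits (say 99.99 and 100.00) will not look similar to it. A learned encoding from an autoencoder, or PCA on properly scaled numeric features, would replace the `encode` step while the comparison step stays the same.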

Scikit-learn has good documentation on PCA (principal component analysis), and the TensorFlow and Keras sites have good code examples, including some on encoders (usually with images, but the concept is the same).

I hope this helps!

Great, thanks! I’m not familiar with the terms, so you’ve definitely given me lots to read up on!
