How to binary classify mixed structured data?

Hi, I have the following sample data and need to create a neural network to predict if a record will match (MATCH column). I have added the reason for a match in the Description column. I have looked at the structured data and text classification tutorials, but I don’t know how to proceed here. Can you help me? Thanks

id date amount name bic iban subject eref customer_name customer_meta customer_verf customer_amount customer_date MATCH description
8594 05.09.2022 12500 Woody Stanley BYLADEM1001 DE02120300000000202051 monthly rate member 123 Woody Stanley member 123 XO4MFIZ2 12500 12.08.2022 1 metadata match in subject
8661 05.09.2022 5000 Hazel Estrada INGDDEFF DE02500105170137075030 monthly rate member 758 HIYKRHS8 William Estrada member 758 HIYKRHS8 5000 13.08.2022 1 customer_verf equal to eref
8657 05.09.2022 2500 Bill Glover BELADEBE DE02100500000054540402 monthly rate Bill Glover member 122 YQ48CXBT 2500 31.08.2022 1 dates, amounts and names suits
8650 05.09.2022 2500 Rosalind Reynolds CMCIDEDD DE02300209000106531065 monthly rate member 147 Rosalind Reynolds member 147 5AXP4WKC 2500 15.08.2022 1 metadata match in subject
8649 05.09.2022 60000 Isabella Wells HASPDEHH DE02200505501015871393 rate 254 YQ48CXAB Isabella Wells member 254 YQ48CXAB 60000 09.08.2022 1 metadata match in subject
8647 05.09.2022 5000 Sabrina Woodward PBNKDEFF DE02100100100006820101 monthly rate AD7OX0OA Sabrina Woodward member 756 AD7OX0OA 5000 21.08.2022 1 customer_verf equal to eref
8645 05.09.2022 10000 Lulu Moore DAAEDEDD DE02300606010002474689 monthly rate YR3L1C93 Lulu Moore member 635 YR3L1C93 10000 30.08.2022 1 customer_verf equal to eref
8644 05.09.2022 40000 Gilbert Hodgson SOLADEST600 DE02600501010002034304 monthly rate H9219EYX Gilbert Hodgson member 654 H92I9EYX 40000 01.09.2022 1 customer_verf match in subject with typo
8643 05.09.2022 15000 Milton Parker HYVEDEMM DE02700202700010108669 monthly rate SNLF2U1O Milton Parker member 962 SNLF2U1O 15000 20.08.2022 1 customer_verf equal to eref
8641 05.09.2022 30000 Warren Webb PBNKDEFF DE02700100800030876808 monthly rate member 356 Warren Webb member 356 BP0CFA9R 30000 20.08.2022 1 metadata match in subject
8633 05.09.2022 80000 Emmett Todd BEVODEBB DE88100900001234567892 monthly rate 426 RZT9YRV8 Emmett Todd member 426 RZT9YRV8 80000 01.09.2022 1 customer_verf equal to eref
8622 05.09.2022 10000 Noah Wise SSKMDEMM DE02701500000000594937 monthly rate member 444 Noah Wise member 444 LPE7UL1Y 10000 13.08.2022 1 metadata match in subject
8620 05.09.2022 2500 Zoe Malcom OPSKATWW AT026000000001349870 monthly member_number 765 PDXYSV6F Zoe Malcom member 765 PDXYSV6F 2500 15.08.2022 1 customer_verf equal to eref
8794 05.09.2022 12500 Woody Stanley BYLADEM1001 DE02120300000000202051 monthly rate member 123 Woody Stanley member 123 XO4MFIZ2 12500 06.09.2022 0 date earlier than customer_date
8761 05.09.2022 5000 Hazel Estrada INGDDEFF DE02500105170137075030 monthly rate member 758 HIYKRHS8 William Estrada member 758 HIYKRHS8 10000 13.08.2022 0 amount differs
8741 05.09.2022 30000 Warren Webb PBNKDEFF DE02700100800030876808 monthly rate member 536 Barclay Richardson member 356 BP0CFA9R 30000 20.08.2022 0 no match

Hi @zenon,

I can see a clear pattern in the table: wherever the data mismatches, the record is labelled 0. First check whether the problem can be solved with a few if/else statements, because there is a fixed number of columns and MATCH being 0 or 1 depends directly on a mismatch in one column or another. I can see that you already have a predefined condition for the date column, so if you define a similar set of rules for each column, you can solve the problem easily and achieve 100% accuracy. Let me know if this is helpful; if you have any further questions, we can discuss them.
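To make the rule idea concrete, here is a minimal sketch in Python. The rules are only my reading of the Description column in your sample, the function name is_match is made up for illustration, and the way I split subject and eref in the example records is an assumption, since the pasted table does not show empty cells.

```python
from datetime import datetime


def parse_date(text):
    # Dates in the sample look like 05.09.2022 (day.month.year).
    return datetime.strptime(text, "%d.%m.%Y").date()


def is_match(record):
    """Return 1 if the record passes the hand-written rules, else 0."""
    # Rule: the amounts must be identical.
    if record["amount"] != record["customer_amount"]:
        return 0
    # Rule: the booking date must not be earlier than customer_date.
    if parse_date(record["date"]) < parse_date(record["customer_date"]):
        return 0
    # Rule: eref is present and equal to customer_verf.
    if record["eref"] and record["eref"] == record["customer_verf"]:
        return 1
    # Rule: the customer metadata appears in the subject.
    if record["customer_meta"] and record["customer_meta"] in record["subject"]:
        return 1
    # Rule: dates and amounts already fit, and the names are equal.
    if record["name"] == record["customer_name"]:
        return 1
    return 0


# Four rows from the sample table (ids 8594, 8794, 8761, 8741).
# The subject/eref split is assumed, not taken from the source.
records = [
    {"date": "05.09.2022", "amount": 12500, "name": "Woody Stanley",
     "subject": "monthly rate member 123", "eref": "",
     "customer_name": "Woody Stanley", "customer_meta": "member 123",
     "customer_verf": "XO4MFIZ2", "customer_amount": 12500,
     "customer_date": "12.08.2022"},
    {"date": "05.09.2022", "amount": 12500, "name": "Woody Stanley",
     "subject": "monthly rate member 123", "eref": "",
     "customer_name": "Woody Stanley", "customer_meta": "member 123",
     "customer_verf": "XO4MFIZ2", "customer_amount": 12500,
     "customer_date": "06.09.2022"},
    {"date": "05.09.2022", "amount": 5000, "name": "Hazel Estrada",
     "subject": "monthly rate member 758", "eref": "HIYKRHS8",
     "customer_name": "William Estrada", "customer_meta": "member 758",
     "customer_verf": "HIYKRHS8", "customer_amount": 10000,
     "customer_date": "13.08.2022"},
    {"date": "05.09.2022", "amount": 30000, "name": "Warren Webb",
     "subject": "monthly rate member 536", "eref": "",
     "customer_name": "Barclay Richardson", "customer_meta": "member 356",
     "customer_verf": "BP0CFA9R", "customer_amount": 30000,
     "customer_date": "20.08.2022"},
]

results = [is_match(r) for r in records]
print(results)  # [1, 0, 0, 0]
```

The order of the checks matters here: the amount and date rules reject a record before any of the positive rules can accept it, which is how the 0 rows in the sample seem to behave.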

Thanks.

Hi @Siva_Sravana_Kumar_N
thank you for your response. The current implementation uses exactly what you wrote. I check if eref matches customer_verf depending on the amounts and the dates. Unfortunately, some of the data is entered manually and some is converted from handwritten to text. Sometimes eref is missing and users write customer_verf in the subject.
Probably I have not selected the best data as an example. Currently I have a matching rate of about 40%. My goal is to make it a little smarter and thus increase the rate.

My current idea is to turn the strings into tokens and then use them to train an RNN. Do you think this could work?
Thanks!

We noticed that your response to the above query contains an unusual hyperlink. Could you help us understand its purpose? We are here to help you resolve your problem. Thank you.

Hi @zenon,

Even with an RNN, you cannot use text as your only input, because from the data I guess that the date, name and amount features also affect the output. So, when building the model, use TF-IDF to turn the text into numerical features, keep the numerical values as they are, combine all the features, and feed them to a standard ML model such as Logistic Regression or an SVM. If you later want to improve accuracy beyond that, you can use an RNN to convert the text into a fixed-length feature vector, combine it with the numerical features, and then perform the classification. I hope this helps you get started.
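As a rough starting point, the TF-IDF plus Logistic Regression idea could look like the sketch below, using scikit-learn. The four toy rows are only loosely modelled on your table, joining subject, eref, customer_meta and customer_verf into one string is my assumption, and I added a StandardScaler for the amounts, which is an extra step beyond keeping the numbers unchanged.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy rows: (joined text, amount, customer_amount, MATCH).
# The text joins subject, eref, customer_meta and customer_verf (assumed layout).
rows = [
    ("monthly rate member 123 member 123 XO4MFIZ2", 12500, 12500, 1),
    ("monthly rate HIYKRHS8 member 758 HIYKRHS8", 5000, 5000, 1),
    ("monthly rate HIYKRHS8 member 758 HIYKRHS8", 5000, 10000, 0),
    ("monthly rate member 536 member 356 BP0CFA9R", 30000, 30000, 0),
]
texts = [r[0] for r in rows]
numeric = np.array([[r[1], r[2]] for r in rows], dtype=float)
labels = np.array([r[3] for r in rows])

# Text columns -> sparse TF-IDF features.
vectorizer = TfidfVectorizer()
text_features = vectorizer.fit_transform(texts)

# Numeric columns -> scaled so large amounts do not dominate (my addition).
scaler = StandardScaler()
numeric_features = csr_matrix(scaler.fit_transform(numeric))

# Combine both feature groups column-wise into one matrix.
features = hstack([text_features, numeric_features]).tocsr()

# Fit a plain logistic regression on the combined features.
model = LogisticRegression(max_iter=1000)
model.fit(features, labels)
predictions = model.predict(features)
print(predictions)
```

Four rows are far too few to learn anything; this only shows the plumbing. One thing to keep in mind is that a linear model on raw values cannot easily learn relations such as "eref equals customer_verf" or "amount equals customer_amount", so it may help to add such comparisons as explicit numeric features.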

Thanks & Regards,
Sravana Neeli.