Training a model on PDF files and comparing new PDFs against existing ones

Hi everybody!

I have just taken my first steps with TensorFlow and AI.
I have programming experience, so that part is not a concern.
I currently work for a company that carries out investigations and creates documents (PDF files) based on them.

The documents being created mostly share the same layout. I would like to automate the process of checking them by training an AI model (a TensorFlow model if possible) on all of the existing PDF files (big data). The model would then compare a newly created document against what it has learned, to check that the document is good and free of faults (e.g. signatures, spelling mistakes, dates, and the rest of the content).
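To make the comparison idea concrete, here is a rough sketch of a simple baseline. It is not a TensorFlow model: it only counts which text lines recur across the existing documents and reports the ones a new document lacks. It assumes the text has already been extracted from the PDFs, and the function names and the 90% threshold are made up for illustration.

```python
from collections import Counter


def normalize(line: str) -> str:
    """Lower-case a line and collapse whitespace so trivial differences are ignored."""
    return " ".join(line.lower().split())


def learn_common_lines(documents: list[str], min_share: float = 0.9) -> set[str]:
    """Return the lines that occur in at least `min_share` of the existing documents."""
    counts = Counter()
    for text in documents:
        # Use a set so each distinct line is counted once per document.
        lines = {normalize(raw) for raw in text.splitlines()}
        lines.discard("")
        counts.update(lines)
    threshold = min_share * len(documents)
    return {line for line, n in counts.items() if n >= threshold}


def missing_lines(new_text: str, common: set[str]) -> list[str]:
    """List the expected lines that the new document does not contain."""
    present = {normalize(raw) for raw in new_text.splitlines()}
    return sorted(common - present)


if __name__ == "__main__":
    # Toy stand-ins for text extracted from existing PDFs (assumed data).
    existing = [
        "Investigation report\nCase 1\nSigned by:",
        "Investigation report\nCase 2\nSigned by:",
    ]
    common = learn_common_lines(existing)
    print(missing_lines("Investigation report\nCase 3", common))
```

A baseline like this only catches missing boilerplate. Spelling, dates, and signatures would each need their own check.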

It would then either correct the new PDF file or produce a small text file listing the faults.
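The "small text file with faults" could look something like the sketch below. It uses plain rules rather than a trained model, and it assumes the text has already been extracted, that dates are written as DD-MM-YYYY, and that a signature can be approximated by the presence of a phrase such as "Signed by:". All of these are assumptions, as are the function names.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed date layout: DD-MM-YYYY. Adjust to whatever the real documents use.
DATE_PATTERN = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")


def check_dates(text: str) -> list[str]:
    """Return one message per date-like string that is not a real calendar date."""
    faults = []
    for match in DATE_PATTERN.finditer(text):
        try:
            datetime.strptime(match.group(0), "%d-%m-%Y")
        except ValueError:
            faults.append(f"invalid date: {match.group(0)}")
    return faults


def check_required(text: str, required: list[str]) -> list[str]:
    """Return one message per required phrase that is absent (case-insensitive)."""
    lowered = text.lower()
    return [f"missing text: {phrase}" for phrase in required
            if phrase.lower() not in lowered]


def write_report(faults: list[str], path: Path) -> None:
    """Write the faults to a plain text file, one per line."""
    body = "\n".join(faults) if faults else "no faults found"
    path.write_text(body + "\n", encoding="utf-8")


if __name__ == "__main__":
    text = "Report dated 31-02-2023. Signed by: J. Doe"  # toy input
    faults = check_dates(text) + check_required(text, ["Signed by:", "Case number"])
    write_report(faults, Path("faults.txt"))
```

Note that the phrase check only confirms that a signature line exists in the text. Verifying an actual handwritten signature would have to work on the page image instead.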

So far I know that I need to convert the PDF files into images, run OCR on them, and fix up the results.
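The steps above (PDF to images, OCR, clean-up) can be sketched roughly as follows. This assumes the third-party packages pdf2image and pytesseract, which in turn need the Poppler utilities and the Tesseract binary installed. The clean-up rules are only a guess at what "fixing" the OCR output might involve.

```python
import re


def ocr_pdf(pdf_path: str, dpi: int = 300) -> list[str]:
    """Render each page of a PDF to an image and OCR it, one string per page."""
    # Third-party imports stay inside the function so the clean-up helper
    # below can be used even when these packages are not installed.
    from pdf2image import convert_from_path  # requires Poppler
    import pytesseract  # requires the Tesseract binary

    pages = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(page) for page in pages]


def clean_ocr_text(text: str) -> str:
    """Undo common OCR layout noise: hyphenated line breaks and ragged whitespace."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # re-join words split across lines
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)        # keep at most one blank line
    return "\n".join(line.strip() for line in text.splitlines()).strip()
```

Usage would be along the lines of `pages = [clean_ocr_text(p) for p in ocr_pdf("report.pdf")]`, where `report.pdf` is a placeholder file name.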

How would I go about this in broad terms? Or is this too difficult for an AI model to do?

As of now I have 500+ PDF files as my data set.