Determining topics in text documents

I want to determine topics from text data. Is TensorFlow appropriate for ML on text data? I am new to this, but I don't see much modeling of text data compared with image data.

Yes, you can use layers like tf.keras.layers.TextVectorization to “tokenize” text and tf.keras.layers.Embedding to create embeddings from the resulting tokens. You could, for example, train a model with text as an input feature and topics as the target variable to predict.
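To make that concrete, here is a minimal sketch of the supervised setup described above. The four sentences, the two topic labels, and the layer sizes are all made up for illustration, and it assumes a recent TensorFlow 2.x install. Treat it as a starting point rather than a recommended architecture.

```python
import tensorflow as tf

# Toy labeled data, invented for this example: 0 = sports, 1 = finance.
texts = tf.constant([
    "the team won the match",
    "the striker scored a late goal",
    "the bank raised interest rates",
    "inflation fell and markets rallied",
])
labels = tf.constant([0, 0, 1, 1])

# TextVectorization splits each string into tokens and maps them to integer ids.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=1000,
    output_mode="int",
    output_sequence_length=8,  # pad or truncate every text to 8 tokens
)
vectorizer.adapt(texts)  # build the vocabulary from the training texts
token_ids = vectorizer(texts)  # integer tensor of shape (4, 8)

# Embedding turns token ids into dense vectors; the pooled vector feeds a
# small softmax classifier over the two topics.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vectorizer.vocabulary_size(), output_dim=16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(token_ids, labels, epochs=20, verbose=0)

# Predict topic probabilities for a new sentence.
new_ids = vectorizer(tf.constant(["the bank cut rates"]))
probs = model.predict(new_ids, verbose=0)
print(probs)
```

Note that this only works when you already have a topic label for each training text; with four sentences the predictions are not meaningful, the point is just the shape of the pipeline.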

There are many text-processing toolkits that include topic determination tools. One is NLTK (https://www.nltk.org/); another is WEKA.

Usually, determining topics is an “unsupervised” learning task. TensorFlow and Keras examples concentrate on “supervised” learning, where you have labeled data. If you want to analyze data and learn about it, for example when you just have a corpus of newspaper articles and want to find out what each article is about, unsupervised learning techniques are generally used. This is a series of blog posts I wrote about analyzing newspaper articles using Latent Semantic Analysis:

What you are talking about requires designing an all-new deep learning model, which is not a beginner task. That said, it would make a really cool example blog post for someone to write using TensorFlow/Keras.

Thank you for the solid answer to my question. I looked over your blogspot article and I see you are only a decade ahead of me! :blush: Now rolling up my sleeves to learn…

Happy to help! It’s been an odd little hobby for me for years. I think that series helped get me hired at a previous job, so it was worth the effort.
