Data Preprocessing handling

SAHJAD_HOSSAIN · June 24, 2022, 2:31pm

Hello
I am practicing with the following models to solve Natural Language Processing (NLP) and Computer Vision tasks, and I would like to learn data preprocessing for these tasks. Could you share the names of some datasets that would help me gain data preprocessing knowledge? Could you also share your views on how to prepare a dataset so that it can be handled by the models listed below?

1. Conv1D

2. Dropout

3. GlobalMaxPooling1D

4. MaxPooling1D

5. LSTM

6. Bidirectional(LSTM)

Thank you in advance

Atia · June 26, 2022, 10:12pm

Firstly, data preprocessing is a necessary step in any machine learning work. During the collection and sometimes the transfer of data (to and from a storage location such as a server or local disk), discrepancies are introduced into the data. Since machine learning requires your data to be uniform, some form of preprocessing is needed to get it into the right format for your work. There is also no clear-cut formula for it, as the kind of preprocessing depends on the task. Some tasks require heavy preprocessing while others do fine with the bare minimum.
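
To make "uniform" concrete for a text task, here is a minimal sketch in plain Python that turns sentences of different lengths into fixed-length integer sequences. The sample sentences, the helper names (tokenize, build_vocab, encode) and the reserved ids for padding and unknown words are all assumptions made for illustration; a real project would more likely use the tokenizer that ships with its framework.

```python
import re
from collections import Counter

PAD, UNK = 0, 1  # assumed reserved ids: padding and out-of-vocabulary words


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integer ids, starting after PAD and UNK."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {w: i + 2 for i, (w, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, seq_len):
    """Convert text to ids, then truncate or right-pad to exactly seq_len."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:seq_len]
    return ids + [PAD] * (seq_len - len(ids))


# Made-up sample data, only to show the shapes involved.
texts = ["The movie was great!", "Terrible.  I left early", "great great film"]
vocab = build_vocab(texts)
batch = [encode(t, vocab, seq_len=6) for t in texts]
```

Every row of batch now has the same length, which is the kind of uniform input that sequence layers generally expect.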

Secondly, what you have listed are not models but individual layers that make up a model. Depending on your task, the layers are stacked together in a particular order to achieve the performance you want (although other things also come into play here). So, as mentioned in the first point, your choice of data preprocessing still comes down to your task.
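
As a rough illustration of stacking, here is one possible way to combine the listed layers in Keras for a binary text classification task. This is only a sketch: the vocabulary size, sequence length, layer sizes and the choice of a binary output are assumptions, and the input is assumed to be already preprocessed into padded integer sequences.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10000  # assumed vocabulary size
SEQ_LEN = 100       # assumed padded sequence length

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),          # integer ids -> dense vectors
    layers.Conv1D(64, 5, activation="relu"),   # local n-gram style features
    layers.MaxPooling1D(pool_size=2),          # halve the sequence length
    layers.Dropout(0.3),                       # regularization
    layers.Bidirectional(layers.LSTM(32)),     # read the sequence both ways
    layers.Dense(1, activation="sigmoid"),     # assumed binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy batch of two all-padding sequences, only to check the shapes.
dummy = np.zeros((2, SEQ_LEN), dtype="int32")
probs = model(dummy)
```

Replacing the Bidirectional(LSTM) layer with GlobalMaxPooling1D would give a purely convolutional variant with the same input and output shapes. Either way, the preprocessing has to deliver what the first layer expects, here a batch of integer sequences of equal length.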

The tutorials section of the TensorFlow docs has great resources for practicing and learning. You can give that a try first.

ardath_lasell · June 27, 2022, 7:09am

Data Preprocessing
Data preprocessing is the process of transforming raw data into an understandable format. It is also an important step in data mining as we cannot work with raw data. The quality of the data should be checked before applying machine learning or data mining algorithms.
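
A quality check can start very small. The sketch below, in plain Python, counts missing values per column and exact duplicate rows; the function name quality_report, the set of values treated as missing and the sample rows are assumptions made for illustration.

```python
def quality_report(rows, missing=(None, "", "NA")):
    """Count missing values per column and exact duplicate rows."""
    columns = sorted({key for row in rows for key in row})
    missing_counts = {
        col: sum(1 for row in rows if row.get(col) in missing) for col in columns
    }
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1  # this exact row appeared earlier
        seen.add(key)
    return {"rows": len(rows), "missing": missing_counts, "duplicates": duplicates}


# Made-up sample records with one empty text, one missing label, one duplicate.
rows = [
    {"text": "good product", "label": 1},
    {"text": "", "label": 0},
    {"text": "good product", "label": 1},
    {"text": "bad", "label": None},
]
report = quality_report(rows)
```

What to do with the flagged rows (drop, fill in, or keep) depends on the task.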

SAHJAD_HOSSAIN · June 29, 2022, 8:29pm

Hi,
Thank you for your valuable comments.
I am following the TF tutorials and am currently looking at time series classification problems. I have been looking for interesting examples to practice training and prediction on multivariate time series data. If you are familiar with any, please share.

thanks

SAHJAD_HOSSAIN · June 30, 2022, 12:04am

Thank you for the valuable comments.