Majority Filtering

MD_SHARIFUL_ISLAM · March 27, 2024, 4:48am

I have image data for three classes. The dataset may also contain some mislabeled samples, and I want to filter out as many of them as possible. I am thinking about applying the Majority Filtering method. The algorithm goes like this:

I take a subset of the training data.
Train multiple classifiers on that same subset.
Predict on a different subset using all the classifiers.
If the majority of the classifiers predict a label that differs from a sample's given label, I tag that sample as mislabeled and exclude it from the next iteration.
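
The voting step above could be sketched roughly as follows. This is only an illustration in plain Python: the function name `flag_by_majority` and the layout of `predictions` (one list of predicted labels per classifier) are assumptions made for the example, not part of any library. A tie is not treated as a majority here.

```python
def flag_by_majority(given_labels, predictions):
    """Return indices of samples that most classifiers disagree with.

    given_labels: the label stored in the dataset for each sample.
    predictions:  predictions[k][i] is classifier k's label for sample i
                  (hypothetical layout chosen for this sketch).
    """
    n_classifiers = len(predictions)
    flagged = []
    for i, label in enumerate(given_labels):
        # Count how many classifiers predicted something other than the given label.
        disagreements = sum(1 for preds in predictions if preds[i] != label)
        # Strict majority: with an even number of classifiers, a tie keeps the sample.
        if disagreements > n_classifiers / 2:
            flagged.append(i)
    return flagged


# Three classifiers, four samples; only the last sample is rejected by all three.
given = [0, 1, 2, 1]
preds = [
    [0, 1, 2, 0],
    [0, 2, 2, 0],
    [1, 1, 2, 0],
]
print(flag_by_majority(given, preds))  # [3]
```

The returned indices only make sense if every classifier sees the held-out samples in the same fixed order, which is why the shuffling matters.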

Now I am having trouble managing all that data. I used the image_dataset_from_directory method to import the images, but the data are shuffled during training, so I can't keep track of which samples are mislabeled and which are correctly labeled. I am also not sure how to exclude the mislabeled ones in the next training loop.
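
One possible way to keep track, sketched under the assumption that the images sit in one subfolder per class (the layout image_dataset_from_directory expects): build an explicit list of (file path, label) pairs and use the file path as a stable ID. The helper names `index_image_folder` and `drop_flagged` are made up for this example. It may also be worth checking the `shuffle` argument of image_dataset_from_directory, since a fixed order makes predictions easier to match back to files.

```python
from pathlib import Path

# Extensions treated as images in this sketch (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def index_image_folder(root):
    """Return a sorted list of (file_path, class_name) pairs, one per image."""
    samples = []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                # The subfolder name serves as the given (possibly wrong) label.
                samples.append((str(image_path), class_dir.name))
    return samples


def drop_flagged(samples, flagged_paths):
    """Return the samples whose file path was not tagged as mislabeled."""
    flagged = set(flagged_paths)
    return [(path, label) for path, label in samples if path not in flagged]
```

With a list like this, each iteration can shuffle or split the list itself, record the paths that the classifiers reject, and pass only the output of `drop_flagged` to the next training round.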

Ajay_Krishna · March 27, 2024, 5:22am

Are you using Jupyter Notebook for this? I think I had the same issue while working in Jupyter Notebook. I can help you if that is the case.