Majority Filtering

I have image data for three classes, and the dataset may contain some mislabeled samples. I want to filter out as many of the mislabeled samples as possible, and I am thinking about applying the Majority Filtering method. The algorithm goes like this:

  1. Take a subset of the training data.
  2. Train multiple classifiers on that same subset.
  3. Predict on a different subset using all the classifiers.
  4. If a majority of the classifiers fail to predict a sample's given label, tag that sample as mislabeled and exclude it from the next iteration.
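To make the voting step concrete, here is a minimal sketch in plain Python. It assumes each trained classifier can be wrapped as a callable that maps one sample to a predicted label. The names `majority_filter` and `drop_indices`, and the toy threshold classifiers, are hypothetical stand-ins for illustration and not part of any library.

```python
def majority_filter(samples, labels, classifiers):
    """Return indices of samples whose given label is contradicted
    by a strict majority of the classifiers.

    classifiers: callables mapping one sample to a predicted label
    (hypothetical stand-ins for trained models).
    """
    suspects = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Count how many classifiers disagree with the given label.
        wrong = sum(1 for clf in classifiers if clf(x) != y)
        if wrong > len(classifiers) / 2:
            suspects.append(i)
    return suspects


def drop_indices(items, bad_indices):
    """Return a copy of items without the positions listed in bad_indices."""
    bad = set(bad_indices)
    return [item for i, item in enumerate(items) if i not in bad]


# Toy data: integers stand in for images, strings for class labels.
samples = [0, 1, 2, 3]
labels = ["a", "a", "b", "a"]
classifiers = [
    lambda x: "a" if x < 2 else "b",
    lambda x: "a" if x < 2 else "b",
    lambda x: "a",  # a weak classifier that always predicts "a"
]

suspects = majority_filter(samples, labels, classifiers)
clean_samples = drop_indices(samples, suspects)
clean_labels = drop_indices(labels, suspects)
print(suspects, clean_samples, clean_labels)
```

The important detail is that the function returns positions rather than the samples themselves, so the same indices can be used to remove entries from any parallel list (images, labels, file paths) before the next iteration.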

Now I am having trouble managing all this data. I used the image_dataset_from_directory method to import the images, but the data are shuffled during training, so I can't keep track of which samples are mislabeled and which are correctly labeled. I am also not sure how to eliminate the mislabeled ones in the next training loop.
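One possible way around the tracking problem is to keep an explicit list of (file path, class name) pairs, so that every prediction can be traced back to a file regardless of shuffling. The sketch below uses only the standard library and assumes the usual one-subdirectory-per-class layout. `build_manifest` and `remove_flagged` are hypothetical helper names, not Keras functions.

```python
from pathlib import Path


def build_manifest(root):
    """List (file_path, class_name) pairs in a stable, sorted order.

    Assumes root contains one subdirectory per class, each holding
    that class's image files.
    """
    manifest = []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        for f in sorted(class_dir.iterdir()):
            if f.is_file():
                manifest.append((str(f), class_dir.name))
    return manifest


def remove_flagged(manifest, flagged_paths):
    """Return the manifest without entries whose path was flagged as mislabeled."""
    flagged = set(flagged_paths)
    return [(path, cls) for path, cls in manifest if path not in flagged]
```

After each round, the paths tagged as mislabeled can be passed to `remove_flagged`, and the next round's data can be built from the surviving entries only. Separately, image_dataset_from_directory accepts shuffle=False, which keeps the batches in a fixed order when predicting; as far as I know the returned dataset also exposes a file_paths attribute, but that is worth checking against the installed version.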

Are you using Jupyter Notebook for this? I think I had the same issue while working in Jupyter Notebook. I can help you if that is the case.