How to improve the accuracy of classification models?

Hi,
I have applied several classification methods; unfortunately, none of the resulting models exceeds 62% accuracy.

I have attached a comparison table of the models I developed.
I’m wondering how I can improve their accuracy.

The first question I would ask is:

  • How do your models perform on the training set?

I split the data into training and test sets, and the confusion matrix gives these results. I’m not sure whether this is the right approach:

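from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, classification_report, f1_score

# Decision tree with default hyperparameters
# (X_train, y_train, X_test, y_test come from an earlier train/test split)
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

# Predictions on the training set and on the held-out test set
pred_dt_tr = dt.predict(X_train)
pred_dt = dt.predict(X_test)

# Evaluate on the test set
print(confusion_matrix(y_test, pred_dt))
print(classification_report(y_test, pred_dt))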
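The snippet above relies on X_train, X_test, y_train and y_test from an earlier split. Below is a self-contained sketch of the whole workflow. It assumes synthetic data from scikit-learn's make_classification in place of the real dataset (which isn't shown) and a stratified 80/20 split; random_state=0 is an arbitrary choice for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (assumption: 2 classes, numeric features)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out 20% for testing; stratify=y keeps the class ratio equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

dt = DecisionTreeClassifier(random_state=0)
dt.fit(X_train, y_train)

pred_dt_tr = dt.predict(X_train)  # training-set predictions
pred_dt = dt.predict(X_test)      # test-set predictions

# Test-set evaluation, as in the original snippet
print(confusion_matrix(y_test, pred_dt))
print(classification_report(y_test, pred_dt))
```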

What I meant is that I assume the table shows results on the test set.

So how do the models perform on the training set?

This is a useful starting point for understanding whether:

  • You still have room to learn more from the current data

  • You have generalization issues, or your model is overfitting

  • Your model’s capacity is limited

  • You are missing hyperparameter tuning on a validation set

  • Etc.
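One way to address the last two points is to tune hyperparameters with cross-validation on the training set only, so the test set stays untouched until the final evaluation. A minimal sketch with scikit-learn's GridSearchCV; the synthetic data, the max_depth/min_samples_leaf grid, and cv=5 are illustrative assumptions, not recommendations for this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption; replace with your own X, y)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grid: shallower trees and larger leaves limit overfitting
param_grid = {
    "max_depth": [2, 4, 6, 8, None],
    "min_samples_leaf": [1, 5, 20],
}

# 5-fold cross-validation on the training set plays the role of a validation set
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

best = search.best_estimator_  # refit on the full training set (refit=True by default)
print("best params:", search.best_params_)
print("cv accuracy:", round(search.best_score_, 3))
print("train accuracy:", round(best.score(X_train, y_train), 3))
print("test accuracy:", round(best.score(X_test, y_test), 3))
```

Comparing the cross-validation, training and test scores printed at the end gives a rough picture of which of the cases listed above applies.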

Thank you for replying.
How can I check the training performance?

I only checked training and testing accuracy for KNN: training accuracy is 0.996 and testing accuracy is 0.722.

Compare pred_dt_tr with y_train, the same way you compared pred_dt with y_test.
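A minimal sketch of that comparison, assuming synthetic data in place of the real dataset; the helper name train_test_gap is hypothetical. A large gap between training and test accuracy (such as the 0.996 vs 0.722 reported for KNN above) usually points to overfitting, while low scores on both sets suggest underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_test_gap(model, X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit model, return (train_acc, test_acc, gap)."""
    model.fit(X_train, y_train)
    pred_tr = model.predict(X_train)  # same role as pred_dt_tr
    pred_te = model.predict(X_test)   # same role as pred_dt
    train_acc = accuracy_score(y_train, pred_tr)
    test_acc = accuracy_score(y_test, pred_te)
    return train_acc, test_acc, train_acc - test_acc


# Synthetic stand-in data (assumption); flip_y=0.1 adds some label noise
X, y = make_classification(n_samples=1000, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Compare an unconstrained tree with a depth-limited one
for depth in (None, 3):
    tr, te, gap = train_test_gap(
        DecisionTreeClassifier(max_depth=depth, random_state=0),
        X_train, y_train, X_test, y_test,
    )
    print(f"max_depth={depth}: train={tr:.3f} test={te:.3f} gap={gap:.3f}")
```

An unconstrained tree typically fits the training set almost perfectly, so its training accuracy alone says little; the gap to the test accuracy is the informative number.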

This forum is mainly about TensorFlow, but you are using sklearn, so I suggest using an sklearn support channel for sklearn code and projects.

I don’t know your specific learning goal or dataset, but in TF you could explore:


Sorry, I apologize if I posted something outside the scope of the TensorFlow forum.

Thank you for your reply.

No prob. Let us know if you have other questions while experimenting with TF.

It is almost impossible to suggest anything without additional information.

  • How many classes are you trying to predict?
  • How many training images do you have?
  • Does your dataset suffer from data imbalance?
  • Have you already checked your data? (e.g. whether it is labeled correctly)

I’m trying to predict confirmed vs. suspected cases.
It is numerical data, not images.
How can I check for data imbalance?
The data is labeled.
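To check for imbalance, count how many samples fall into each class. A minimal sketch with NumPy; the label array y here is made up for illustration, and the 0 = suspected, 1 = confirmed encoding is an assumption. If one class dominates, accuracy alone can be misleading: always predicting the majority class already scores the majority share, so per-class precision and recall from classification_report are more informative.

```python
import numpy as np

# Illustrative labels (assumption): 0 = suspected, 1 = confirmed
y = np.array([0] * 700 + [1] * 300)

# Count samples per class and convert to proportions
classes, counts = np.unique(y, return_counts=True)
proportions = counts / counts.sum()
for c, n, p in zip(classes, counts, proportions):
    print(f"class {c}: {n} samples ({p:.1%})")

# Accuracy of always predicting the most frequent class: a baseline to beat
majority_baseline = proportions.max()
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

With these made-up labels, the script reports 70% and 30% for the two classes, and any model must beat 70% accuracy to be better than the trivial baseline.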

It would help if you could show two plots, accuracy and loss; both can be drawn in a single figure. They would answer some of the questions raised so far.
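Decision trees and KNN are not trained over epochs, so they have no per-epoch loss curve; accuracy and loss plots of this kind apply to iteratively trained models such as neural networks. As a sketch of what such plots look like in scikit-learn, the example below swaps in MLPClassifier (an assumption, not one of the models in the table), trains it one epoch at a time with partial_fit, and records log loss and accuracy on the training and test sets. The synthetic data, the 32-unit hidden layer, and the 50 epochs are arbitrary choices. In Keras, the History object returned by model.fit holds the equivalent curves.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
# Standardize features (fit on training data only) for the neural network
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

clf = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)
classes = np.unique(y_train)
history = {"train_loss": [], "test_loss": [], "train_acc": [], "test_acc": []}

for epoch in range(50):
    clf.partial_fit(X_train, y_train, classes=classes)  # one pass over the training data
    # Log loss (cross-entropy) on both sets, without the L2 penalty term
    history["train_loss"].append(log_loss(y_train, clf.predict_proba(X_train)))
    history["test_loss"].append(log_loss(y_test, clf.predict_proba(X_test)))
    history["train_acc"].append(clf.score(X_train, y_train))
    history["test_acc"].append(clf.score(X_test, y_test))

# Accuracy and loss side by side in one figure
fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
ax_acc.plot(history["train_acc"], label="train")
ax_acc.plot(history["test_acc"], label="test")
ax_acc.set(title="Accuracy", xlabel="epoch")
ax_loss.plot(history["train_loss"], label="train")
ax_loss.plot(history["test_loss"], label="test")
ax_loss.set(title="Loss", xlabel="epoch")
for ax in (ax_acc, ax_loss):
    ax.legend()
fig.savefig("acc_loss.png")
```

Training loss that keeps falling while test loss flattens or rises is the usual visual sign of overfitting.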

Unfortunately, I only computed the confusion matrix and didn’t try anything else.