Low Accuracy, High Recall

Hey Community, I hope you’re doing great.

I’m working on binary classification using structured data, and my model gives a great validation recall (around 80%) but low validation accuracy (around 40%)!
I’d like to improve the validation accuracy, even if that slightly decreases the validation recall. Any suggestions, please?

Thank you so much!

Plotting the “precision vs. recall” curve will show you the performance you can reach just by changing the threshold level. Have a look at this tutorial:
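As a minimal sketch of that curve (assuming `model` is a trained Keras binary classifier and `x_val`, `y_val` are your validation features and labels), something like this would do:

```python
# Sketch: plot precision vs. recall over all thresholds.
# Assumes `model`, `x_val`, `y_val` already exist.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Predicted probabilities for the positive class.
probs = model.predict(x_val).ravel()

precision, recall, thresholds = precision_recall_curve(y_val, probs)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision vs. recall at different thresholds")
plt.show()
```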


@markdaoust thank you for your reply.

How did you train the classifier? 40% accuracy on a binary classification is worse than random (it would be 50% if the classes are balanced, or higher if they are not).
Anyway, besides changing the threshold, which is done after you have already trained the classifier, if you have unbalanced data (though that usually gives high accuracy and low recall on the minority class), consider oversampling the smaller class or undersampling the other (see the sketch below). Alternatively, you can weight the loss function (or the gradient); this can also be done when predicting one class well is more important than the other.
But I’d check the classifier first.
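For example, a rough sketch of oversampling the minority class (assuming `x_train` and `y_train` are NumPy arrays with label 1 as the smaller class) could be:

```python
# Sketch: oversample the minority class (label 1) with replacement
# until both classes have the same number of examples.
import numpy as np

pos_idx = np.where(y_train == 1)[0]
neg_idx = np.where(y_train == 0)[0]

pos_resampled = np.random.choice(pos_idx, size=len(neg_idx), replace=True)
idx = np.concatenate([neg_idx, pos_resampled])
np.random.shuffle(idx)

x_balanced, y_balanced = x_train[idx], y_train[idx]
```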

Hey @Samuele_Bortolato, the data that I have is unbalanced (70%/30%) and I’m trying to maximize the precision.
I managed to get a good accuracy, but I’d like to get good precision and recall results as well.

I don’t get what you mean by “I’m trying to maximize precision”. When you train, you simply minimize the loss, unless you wrote a custom loss to only maximize precision (not advised). If you use cross-entropy, the measure closest to what you are optimizing is the weighted accuracy (and even that is technically not correct: you minimize the loss, you don’t maximize accuracy or recall).

In your case, since you have unbalanced data, it’s likely the model will learn to classify the bigger class better than the smaller one. For example, classifying all instances as the bigger class, without understanding anything about the problem, would already get it 70% accuracy.

If you just want to account for the imbalance in the data, I would give the bigger class a weight of 0.3 and the other a weight of 0.7 in the loss function. That way, if the model tries to cheat and always predicts the same class, it gets a low score. If you are more interested in one class than the other, set different weights. (I wouldn’t use early stopping on validation precision as done in the linked tutorial though, for the same reason as before.)
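As a minimal sketch (assuming a compiled Keras `model`, with 0 as the 70% class and 1 as the 30% class), the weighting can be passed directly to `fit`:

```python
# Sketch: weight the loss so the smaller class counts more.
# Class 0 is the majority (70%), class 1 the minority (30%).
class_weight = {0: 0.3, 1: 0.7}

model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=20,
    class_weight=class_weight,
)
```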

Or follow exactly the example tutorial.

Anyway, you can simply try it: train several classifiers, plot the behaviour of each one at different thresholds, and choose the most suitable.
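A minimal sketch of such a threshold sweep (assuming `probs` are the predicted probabilities on the validation set and `y_val` the true labels) could be:

```python
# Sketch: compare accuracy, precision and recall at different thresholds.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

for t in np.arange(0.1, 1.0, 0.1):
    preds = (probs >= t).astype(int)
    print(f"threshold={t:.1f}  "
          f"accuracy={accuracy_score(y_val, preds):.3f}  "
          f"precision={precision_score(y_val, preds, zero_division=0):.3f}  "
          f"recall={recall_score(y_val, preds, zero_division=0):.3f}")
```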


Sorry for not being clear. What I’m trying to do, besides minimizing the loss, is to get precision and recall as high as possible.
I got good recall results, but the precision is still under 40%.
Thank you for your time, @Samuele_Bortolato.