Hello. I am trying to create a classification model, but my data has very high variance.
There are 3 classes, each with 7 categories (1,2,3,4,5,6,7), and after analyzing my model,
it seems impossible to get much higher than 30% accuracy if it attempts to predict the exact value.
So I created a custom metric, which measures accuracy after grouping the 7 categories into only 3 to
make it easier for the model. 1,2,3 are grouped, 5,6,7 are grouped, and 4 is a group on its own.
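A minimal sketch of such a grouped metric, in plain Python. The function names `to_group` and `grouped_accuracy` and the group ids 0/1/2 are assumptions made for illustration, not the names used in the actual model code.

```python
def to_group(score: int) -> int:
    """Map a 1-7 score to a coarse group: 0 for 1-3, 1 for 4, 2 for 5-7."""
    if not 1 <= score <= 7:
        raise ValueError(f"score must be in 1..7, got {score}")
    if score <= 3:
        return 0
    if score == 4:
        return 1
    return 2


def grouped_accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that land in the same group as the true score."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    hits = sum(to_group(t) == to_group(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# 1 vs 3, 4 vs 4 and 7 vs 5 share a group; 2 vs 6 does not.
print(grouped_accuracy([1, 4, 7, 2], [3, 4, 5, 6]))  # 0.75
```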
So, I want my model to default to an output (4 in this case) if none of the input variables are enough of a clue to predict the category of each class. How would I do that?
My current idea is to add 1 for correct predictions and 0 for incorrect ones, but still add 0.5 when 4 is selected, so that my model realizes it is safe to keep outputting 4 when it is not sure,
while still being rewarded for taking the risk of guessing when it believes it has enough clues.
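A rough sketch of that scoring rule, to make the idea concrete. It assumes the grouped comparison (1,2,3 / 4 / 5,6,7) and assumes that a correct 4 still earns the full 1 while an incorrect 4 earns 0.5; the description could also be read as a flat 0.5 for every 4. The names `DEFAULT`, `ABSTAIN_CREDIT` and `abstain_score` are made up for this example.

```python
DEFAULT = 4           # the "safe" output
ABSTAIN_CREDIT = 0.5  # partial credit for falling back to the default


def to_group(score: int) -> int:
    """Map a 1-7 score to a coarse group: 0 for 1-3, 1 for 4, 2 for 5-7."""
    if score <= 3:
        return 0
    if score == 4:
        return 1
    return 2


def abstain_score(y_true, y_pred) -> float:
    """Mean score: 1 for a correct group, 0.5 for a wrong default, 0 otherwise."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if to_group(t) == to_group(p):
            total += 1.0             # correct group, including a correct 4
        elif p == DEFAULT:
            total += ABSTAIN_CREDIT  # wrong, but the model played it safe
        # any other wrong guess adds 0
    return total / len(y_true) if y_true else 0.0


# 1 vs 3 -> 1, 4 vs 4 -> 1, 7 vs 4 -> 0.5, 2 vs 6 -> 0
print(abstain_score([1, 4, 7, 2], [3, 4, 4, 6]))  # 0.625
```

Under this rule, when the true score is not 4, a guess only beats the default if its chance of being right is above 0.5.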
It did not work well. Any ideas? I will be thankful for any attempt to help.
The idea sounds good, but there is a problem with post-processing.
The reason I wanted it to return a default value is not for the output per se, but more for the model to try to ignore the high amount of noise in the data, and only change the weights when it is sure it has high enough correlation with the data.
I have around 17 variables, and the outputs are 3 scores that each go from 1 to 7.
Here I have assumed one of the variables doubles the chance of a bad score.
The default chance of guessing the correct score group (1,2,3 / 4 / 5,6,7) is 1/3, and if the variable is present, it increases to 1/2 (2/4).
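The arithmetic behind those two numbers can be checked with a few lines of Python. This is only a sketch: it assumes the three groups are equally likely by default, as stated above, and the labels "bad", "neutral" and "good" are placeholders for the three score groups.

```python
from fractions import Fraction


def group_probabilities(weights):
    """Normalize relative weights per group into exact probabilities."""
    total = sum(weights.values())
    return {group: Fraction(w, total) for group, w in weights.items()}


# Default: the three groups are assumed to be equally likely.
baseline = group_probabilities({"bad": 1, "neutral": 1, "good": 1})

# With the variable present, the weight of the bad group is doubled.
with_variable = group_probabilities({"bad": 2, "neutral": 1, "good": 1})

print(baseline["bad"])       # 1/3
print(with_variable["bad"])  # 1/2  (2/4 before reduction)
```

So even when always predicting the bad group whenever the variable is present, the best achievable hit rate on those rows is 1/2.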
This model has no way to go anywhere above 50% accuracy without overfitting.
That is why I wanted it to default to a value. I thought that if I condition the model to return a default value when it is not sure, I could reduce overfitting while also increasing precision.
But that gave me an idea. Perhaps I am not preprocessing the data enough before feeding it to the model. I was too focused on the model itself.