I want to make predictions from image data of goats. My output variable is continuous, and I divided it into 3 classes to make it categorical: group 1 = <30%, group 2 = 30% to 70%, and group 3 = >70%. However, my worry is that prediction accuracy may be compromised by images from goats with output values close to a border, e.g. 31%. To address this, I would like to use multi-label classification, but only for images from goats with percentages within +/- 5% of a border, i.e. 25% to 35% and 65% to 75%. I would not want to maintain 3 groups.
My question is: is this possible? If so, how do I split my training and testing data sets, which multi-label classification method would best fit my problem, and how do I go about it?
What are the outputs of your model going to look like? From your description, it sounds like you’re going to have one output, and use its value between 0 and 1 to determine which group it falls into. Are your classes along a continuous spectrum, or are they distinct and mutually exclusive?
I have one output factor with values from 0 to 1. As you rightly said, I will use the output value from each goat to classify the images into 3 distinct classes, i.e. class A for values less than 0.30, class B from 0.30 to 0.70, and class C greater than 0.70. However, because I have grouped my data into 3 distinct and mutually exclusive classes, I’m likely to get biased predictions for images from goats with values such as 0.28, 0.32, 0.69 or 0.73, as they are close to my cut points and may be predicted as belonging to the neighbouring group. Therefore, I would like to use multi-label classification for such values (i.e. +/- 0.05 from 0.3 and from 0.7). Would this work? If not, what other deep learning method would work for this kind of prediction?
Okay. For multi-class classification, you’ll usually have the same number of units in your output layer as you have classes. This means that each output represents the probability that the image falls into a given category. You can then use a softmax activation function to scale your outputs so that they add up to 1. An example of the output that you’d get would be [0.01, 0.01, 0.98] meaning that your model is 98% confident that the image is in class 3.
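To make the softmax step concrete, here is a minimal plain-Python sketch of what the activation does to the raw scores from a 3-unit output layer. The raw scores are made up for illustration; in practice the framework's built-in softmax activation does this for you.

```python
import math

def softmax(logits):
    """Scale raw scores so they are all positive and sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a 3-unit output layer (not real model output)
raw_scores = [0.5, 0.5, 5.0]
probs = softmax(raw_scores)

print([round(p, 2) for p in probs])  # [0.01, 0.01, 0.98]
```

The largest raw score ends up with almost all of the probability mass, which is how you get an output like the one above.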
When training, set up a list of your classes (like [“class1”, “class2”, “class3”]) and have your training labels be the index in that list where the goat should be classified. Set your model up as I described above and use ‘sparse_categorical_crossentropy’ as your loss function. After training, you can use the argmax function in numpy to get the index of the largest value from your prediction list and use the list of class names to give it a text label.
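As a rough sketch of the labelling and decoding steps, the snippet below turns each goat's continuous value into a class index and maps a prediction back to a class name. The cut points 0.30 and 0.70 come from the question; the helper names and the sample values are made up for illustration, and the small `argmax` helper stands in for numpy's argmax so the example has no dependencies.

```python
CLASS_NAMES = ["class1", "class2", "class3"]

def value_to_index(value):
    """Map a continuous value in [0, 1] to an index into CLASS_NAMES.

    Cut points follow the question: <0.30, 0.30 to 0.70, >0.70.
    """
    if value < 0.30:
        return 0
    if value <= 0.70:
        return 1
    return 2

def argmax(values):
    """Index of the largest entry (what numpy's argmax returns for a 1-D array)."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up measurements, one per training image
train_values = [0.12, 0.31, 0.69, 0.85]
train_labels = [value_to_index(v) for v in train_values]
print(train_labels)  # [0, 1, 1, 2]

# Made-up softmax output for one test image
prediction = [0.01, 0.01, 0.98]
print(CLASS_NAMES[argmax(prediction)])  # class3
```

The integer labels in `train_labels` are the form that a sparse categorical loss expects.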
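categorical_crossentropy expects the full label vector for each example, with one value per class, rather than a class index. It’s good when a single class index can’t describe the target, for example when you want to spread the label across two neighbouring classes. Note that an input that genuinely belongs to several classes at once (for example, an image with multiple goats) is a multi-label problem, which is usually handled with sigmoid outputs and binary_crossentropy instead.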
sparse_categorical_crossentropy lets you use the index of your class as the label and creates the one-hot array for you. This helps when you have lots of classes and don’t want to have your label be a huge array of 0s with one 1 for the class that you’re looking for.
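To show that the two losses only differ in how the label is written, here is a small plain-Python sketch of the underlying arithmetic. These functions are hand-written illustrations of the cross-entropy formula, not the library implementations, and the prediction is made up.

```python
import math

def one_hot(index, num_classes):
    """Build the array of 0s with a single 1 at the class index."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def loss_from_vector(target, probs):
    """Cross-entropy when the label is a full vector (one value per class)."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def loss_from_index(index, probs):
    """Cross-entropy when the label is just the class index."""
    return -math.log(probs[index])

probs = [0.1, 0.7, 0.2]  # made-up softmax output
dense = loss_from_vector(one_hot(1, 3), probs)
sparse = loss_from_index(1, probs)

print(math.isclose(dense, sparse))  # True
```

With a hard label the two give the same number, so the choice mostly comes down to which label format is more convenient.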
For sparse categorical cross-entropy, you don’t need to one-hot encode your class labels; the library handles that itself. For categorical cross-entropy, the loss function expects both the predictions and the true class labels as full per-class vectors, typically one-hot encoded. That form is useful when you want to apply something like label smoothing, or any other method that modifies the label distribution.
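One possible way to apply this idea to the border cases in the original question is to give goats near a cut point a "soft" label that is shared between the two neighbouring classes. The sketch below is only an assumption about how that could look, not a method anyone in this thread has tested: the linear blend and the `soft_label` helper are made up, while the cut points (0.30, 0.70) and the +/- 0.05 margin come from the question.

```python
BORDERS = [0.30, 0.70]  # cut points from the question
MARGIN = 0.05           # half-width of the border zone from the question

def hard_index(value):
    """Plain 3-class binning: <0.30, 0.30 to 0.70, >0.70."""
    if value < 0.30:
        return 0
    if value <= 0.70:
        return 1
    return 2

def soft_label(value):
    """Return a 3-element target that sums to 1.

    Assumption: inside a border zone the weight moves linearly from the
    lower class to the upper class; outside it the label is one-hot.
    """
    target = [0.0, 0.0, 0.0]
    for lower_class, border in enumerate(BORDERS):
        if abs(value - border) < MARGIN:
            upper = (value - (border - MARGIN)) / (2 * MARGIN)
            upper = min(1.0, max(0.0, upper))  # guard against rounding error
            target[lower_class] = 1.0 - upper
            target[lower_class + 1] = upper
            return target
    target[hard_index(value)] = 1.0
    return target

print(soft_label(0.10))                          # [1.0, 0.0, 0.0]
print([round(t, 2) for t in soft_label(0.30)])   # [0.5, 0.5, 0.0]
print([round(t, 2) for t in soft_label(0.73)])   # [0.0, 0.2, 0.8]
```

Targets like these are full per-class vectors, so they would go with categorical cross-entropy rather than the sparse version.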
What do the results that you’re getting look like? You may want to check that you have a 10-unit dense layer at the end of your network, and that the labels for your results match the 10 classes that you’re looking for.
Yeah, I have a dense layer with 10 units. The problem I am experiencing is that it is a tomato leaf disease detection model, but it predicts even my face as one of my classes. Is there a way to control prediction so that it only predicts tomato leaf diseases and ignores everything else?
Your model will only know how to classify based on the images that you’ve trained it with. It’s trying to decide which of the classes it recognizes that your input image is most similar to.
To fix this, you might try setting a minimum threshold when handling the model’s output. If your highest class value is below a certain level, then it’s probably safe to say that your model doesn’t recognize any disease in the image.
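A minimal sketch of that thresholding step is below. The class names, the example outputs, and the 0.8 cut-off are all made up for illustration; you would need to pick the threshold by checking it against your own validation images.

```python
def classify_with_threshold(probs, class_names, min_confidence=0.8):
    """Return the top class name, or "unknown" if the model is not confident.

    min_confidence=0.8 is an arbitrary placeholder, not a recommended value.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < min_confidence:
        return "unknown"
    return class_names[best]

# Hypothetical class names and softmax outputs (shortened to 3 classes)
names = ["early_blight", "late_blight", "healthy"]
leaf_output = [0.03, 0.95, 0.02]
face_output = [0.40, 0.35, 0.25]

print(classify_with_threshold(leaf_output, names))  # late_blight
print(classify_with_threshold(face_output, names))  # unknown
```

Keep in mind that a softmax model can still be very confident about images unlike anything it was trained on, so a threshold reduces this problem rather than removing it.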