Multi-label classification

LawJ · June 10, 2021, 3:31pm

Hello good people.

So, I would want to do predictions using image data of goats. My independent variable was continuous data that I divided into 3 classes to make it categorical: group 1 = <30%, group 2 = 30% to 70%, and group 3 = >70%. However, my worry is that prediction accuracy may be compromised by images from goats with output values close to a border, e.g. 31%. To address this issue, I would want to use multi-label classification, but only for images from goats with percentages within +/- 5% of a border, i.e. 25% to 35% and 65% to 75%. I would not want to maintain 3 groups.

My question is: is this possible? If so, how do I group my training and testing data sets, which multi-label classification method would best fit my problem, and how do I go about it?

Jeff_Corpac · June 11, 2021, 1:22am

What are the outputs of your model going to look like? From your description, it sounds like you’re going to have one output, and use its value between 0 and 1 to determine which group it falls into. Are your classes along a continuous spectrum, or are they distinct and mutually exclusive?

LawJ · June 11, 2021, 3:09pm

I have one output factor with values from 0 to 1. Like you rightly said, I will use the output values from each goat to classify the images into 3 distinct classes, i.e. class A for values less than 0.30, class B from 0.30 to 0.70 and class C greater than 0.70. However, because I have grouped my data into 3 distinct and mutually exclusive classes, I’m likely to get biased predictions for images from goats with values such as 0.28, 0.32, 0.69 or 0.73, as they are close to my cut-off points and may be predicted as belonging to another group. Therefore, I would want to use multi-label classification for such values (i.e. +/- 0.05 from 0.3 and from 0.7). Would this work? If not, what other deep learning method would work for such a prediction analysis?

Jeff_Corpac · June 11, 2021, 5:42pm

Okay. For multi-class classification, you’ll usually have the same number of units in your output layer as you have classes. This means that each output represents the probability that the image falls into a given category. You can then use a softmax activation function to scale your outputs so that they add up to 1. An example of the output that you’d get would be [0.01, 0.01, 0.98] meaning that your model is 98% confident that the image is in class 3.

When training, set up a list of your classes (like ["class1", "class2", "class3"]) and have your training labels be the index in that list where the goat should be classified. Set your model up as I described above and use 'sparse_categorical_crossentropy' as your loss function. After training, you can use the argmax function in numpy to get the index of the largest value from your prediction list and use the list of class names to give it a text label.
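A minimal NumPy sketch of the label and output handling described above, not a full training setup. It assumes the 0.30 and 0.70 cut-offs from earlier in the thread; the class names and the helper names `to_class_index`, `softmax` and `predicted_names` are placeholders made up for illustration. In Keras, the softmax step would normally be the activation of the final 3-unit dense layer rather than a separate function.

```python
import numpy as np

# Placeholder class names; the order defines the integer label of each class.
CLASS_NAMES = ["class1", "class2", "class3"]


def to_class_index(values):
    """Turn continuous targets in [0, 1] into integer labels 0, 1 or 2.

    Cut-offs assumed from the thread: below 0.30 -> 0,
    0.30 to 0.70 (inclusive) -> 1, above 0.70 -> 2.
    """
    values = np.asarray(values, dtype=float)
    return np.where(values < 0.30, 0, np.where(values <= 0.70, 1, 2))


def softmax(logits):
    """Scale raw outputs so that each row is positive and sums to 1."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def predicted_names(probs):
    """Pick the most probable class per row and return its text label."""
    indices = np.argmax(probs, axis=-1)
    return [CLASS_NAMES[i] for i in indices]


# Integer labels like these are what sparse_categorical_crossentropy expects.
train_labels = to_class_index([0.10, 0.31, 0.70, 0.95])

# Made-up raw outputs for one image, standing in for a model prediction.
probs = softmax([[0.0, 0.0, 5.0]])
names = predicted_names(probs)
```

Here `train_labels` holds one class index per goat, and `names` holds the text label of the highest-scoring class for each prediction row.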

LawJ · June 13, 2021, 6:14am

Thank you so much, Jeff. What are the advantages of using sparse categorical cross-entropy over categorical cross-entropy?

Jeff_Corpac · June 13, 2021, 6:36am

categorical_crossentropy takes the label as a full vector with one value per class (usually one-hot). It’s good for when your label isn’t just a single class index, such as an input that can fit into multiple classes (for example, an image with multiple goats).

sparse_categorical_crossentropy lets you use the index of your class as the label and creates the one-hot array for you. This helps when you have lots of classes and don’t want to have your label be a huge array of 0s with one 1 for the class that you’re looking for.

Sayak_Paul · June 13, 2021, 6:37am

For sparse categorical cross-entropy, you don’t need to one-hot encode your class labels. It’s handled by the library itself. But for categorical cross-entropy, the loss function expects the predictions and the true class labels to be in one-hot encoded form. The one-hot form is beneficial when you want to use something like label smoothing or any other method that modifies the marginal distributions.
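To make the difference concrete, here is a small NumPy sketch of both losses written out by hand. It is an illustration of the math under simple assumptions, not the library's implementation, and the function names, probabilities and the `smoothing=0.1` value are made up. It shows that the two losses give the same number for hard labels, and that label smoothing needs the full label vector.

```python
import numpy as np


def one_hot(indices, num_classes):
    """Expand integer class indices into one-hot rows."""
    return np.eye(num_classes)[np.asarray(indices)]


def categorical_crossentropy(y_true, probs, eps=1e-7):
    """Cross-entropy with labels given as full per-class vectors."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(probs), axis=-1)


def sparse_categorical_crossentropy(indices, probs, eps=1e-7):
    """Cross-entropy with labels given as integer class indices."""
    probs = np.clip(probs, eps, 1.0)
    indices = np.asarray(indices)
    rows = np.arange(len(indices))
    return -np.log(probs[rows, indices])


def smooth_labels(y_true, smoothing=0.1):
    """One common form of label smoothing: move a little weight to all classes."""
    num_classes = y_true.shape[-1]
    return y_true * (1.0 - smoothing) + smoothing / num_classes


# Made-up softmax outputs for two images and their true class indices.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
indices = np.array([0, 2])

sparse_loss = sparse_categorical_crossentropy(indices, probs)
dense_loss = categorical_crossentropy(one_hot(indices, 3), probs)

# Smoothed labels are no longer a single index, so only the dense form applies.
soft = smooth_labels(one_hot(indices, 3))
soft_loss = categorical_crossentropy(soft, probs)
```

With hard labels, `sparse_loss` and `dense_loss` match, so the choice between the two is mostly about which label format is more convenient.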

LawJ · June 13, 2021, 7:28am

oh ok, thank you so much for the help

LawJ · June 13, 2021, 7:42am

thank you for shedding some light

DANCAN_SIKUKU · June 17, 2021, 12:37pm

Hello sir, I have trained a CNN on my 10 classes, but the model also gives predictions for anything outside my classes. Is there a way to fix this so that the model predicts only my 10 classes?

Jeff_Corpac · June 17, 2021, 5:24pm

What do the results that you’re getting look like? You may want to check that you have a 10-unit dense layer at the end of your network, and that the labels for your results match the 10 classes that you’re looking for.

DANCAN_SIKUKU · June 18, 2021, 6:44am

Yeah, I have a dense layer with 10 units. The problem I am experiencing is that it is a tomato leaf disease detection model, but it predicts even my face as one of my classes. Is there a way to control prediction so that it only predicts tomato leaf diseases and ignores other things?

Jeff_Corpac · June 18, 2021, 8:07am

Your model will only know how to classify based on the images that you’ve trained it with. It’s trying to decide which of the classes it knows your input image is most similar to.

To fix this, you might try setting a minimum threshold when handling the model’s output. If your highest class value is below a certain level, then it’s probably safe to say that your model doesn’t recognize any disease in the image.
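A minimal sketch of that post-processing step in plain NumPy, assuming the model returns one softmax score per class. The class names, the `"unknown"` label, the helper name `label_with_threshold` and the 0.6 cut-off are all placeholders; the threshold in particular would need tuning on real images, since a model can still give a high score to something it has never seen.

```python
import numpy as np

# Placeholder names for the 10 trained classes.
CLASS_NAMES = [f"disease_{i}" for i in range(10)]
UNKNOWN = "unknown"


def label_with_threshold(probs, threshold=0.6):
    """Return (label, score); the label is UNKNOWN when the top score is too low."""
    probs = np.asarray(probs, dtype=float)
    best = int(np.argmax(probs))
    score = float(probs[best])
    if score < threshold:
        return UNKNOWN, score
    return CLASS_NAMES[best], score


# Made-up outputs: one confident prediction and one where no class stands out.
confident = [0.91] + [0.01] * 9
uncertain = [0.1] * 10

confident_label, confident_score = label_with_threshold(confident)
uncertain_label, uncertain_score = label_with_threshold(uncertain)
```

The same comparison can be done in any language on the app side, since it only needs the list of scores that the model outputs.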

DANCAN_SIKUKU · June 18, 2021, 8:22am

Thank you so much. Do you have a code snippet for doing that, since I am using TensorFlow to train the model and run inference in an Android application?
I would appreciate it.

Jeff_Corpac · June 18, 2021, 4:40pm

Sorry, this would be a post-processing step and I haven’t done much coding around TF Lite yet.

Bhack · June 18, 2021, 4:56pm

If you are working on your first examples, you could experiment with the TFLite Task Library.
E.g. for an image classifier like in your use case, there is an easily available score field: