Test basic understanding of computer vision

Hi there,

I just started learning ML and have been watching some videos on TensorFlow. I just wanted to verbalize what I have learnt in the hopes of making sure I understand the concepts correctly. I would appreciate feedback on my thought process.

In the YouTube video "Basic Computer Vision with ML (ML Zero to Hero - Part 2)" with Laurence Moroney, we explored classifying images.

The general idea, as I understood it, is as follows:

  1. We take a dataset and associated labels
  2. We feed a point of data (in this case, an image) into a neural network
  3. In this case, we use 128 functions to process the input data and try to arrive at the
    data point's associated label. The example used was an image of an ankle boot with
    the number 9 as its label.
  4. The functions start off at some random position and output some number given the
    input data. The output is checked for congruence with the associated label, and
    that information is passed to the optimizer.
  5. The optimizer decides how to change the rules in the functions before starting another epoch
    and trying again.
  6. The process repeats until all epochs have elapsed, getting better each time.
  7. Each output from the model is a value between 1-10, and in this case we have 10 clothing items,
    so it would be 10 instances of probabilities between 1-10 (this is converted to 1 for ease of use).
  8. This essentially means the model is trained, and now when we feed images in, it can give us a
    probability that each image is something by using model.predict(my_images) (although I’m still not
    sure how this method works).
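
To make the steps above concrete, here is a minimal sketch of how I picture them in Keras. The layer sizes, the "adam" optimizer and the loss function are my assumptions about the setup in the video, and random arrays stand in for the real images and labels so that it runs without downloading anything (so the accuracy itself is meaningless).

```python
import numpy as np
import tensorflow as tf

# Stand-in data (assumption): 64 random 28x28 "images" with integer labels 0-9.
rng = np.random.default_rng(0)
images = rng.random((64, 28, 28)).astype("float32")
labels = rng.integers(0, 10, size=64)

# Steps 2-3: flatten each image, pass it through 128 units, then 10 output units.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Steps 4-5: the loss compares outputs with labels, the optimizer adjusts the weights.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Step 6: repeat for a fixed number of epochs.
history = model.fit(images, labels, epochs=3, verbose=0)

# Step 8: one row of 10 values per input image.
predictions = model.predict(images[:5], verbose=0)
print(predictions.shape)
```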

Thanks for reading, any help appreciated.

Michael

Hi @MTelford

Welcome to the TensorFlow Forum!

It sounds like you are having difficulty understanding the prediction part of the model. The softmax activation used in the model converts the model's logits into probabilities, one for each of the 10 labels in the dataset. So for each image, the prediction is an array of 10 numbers, each between 0 and 1, that together sum to 1. They represent the model’s “confidence” that the image corresponds to each of the 10 different articles of clothing. The label with the highest confidence value is the predicted label for that image.
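
As a rough illustration of that last step, here is a small plain-Python sketch of softmax followed by picking the most confident label. The logits are made-up numbers, not real model output, and the `softmax` helper is written out only for illustration. With a Keras model you would typically get the probabilities directly from `model.predict` and take `np.argmax` of one row.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest score keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for one image, one per clothing class (labels 0-9).
logits = [0.1, -1.2, 0.3, 0.0, -0.5, 1.1, 0.2, 2.0, -0.3, 4.5]

probabilities = softmax(logits)

# The predicted label is the index with the highest probability.
predicted_label = max(range(len(probabilities)), key=lambda i: probabilities[i])

print(predicted_label)
```

Here the highest score sits at index 9, so the predicted label is 9, which in your example is the ankle boot.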

Please refer to the model prediction section of the MNIST image classification tutorial for a better understanding. You can also check what each method and API does by following the links to those APIs in the same doc. Thank you.