Hi! Time to learn some AI - let’s build a coffee roast predictor

Hi all!

I decided it’s time I put some effort into learning something about AI/ML.

As such I’ve decided to build a “coffee roasting predictor”.
i.e. start my roaster, and have it tell me when the roast should be stopped.

I did a post on home-barista.com that shows the machine I’m dealing with.

These are the inputs I think I can use (for a regression, maybe?):

  • audio (primarily to get 1st crack, 2nd crack times)
  • video (color of beans)
  • temp of chamber (read the digits from the LCD display)
  • ambient temperature

From this, I’d like to end up with a model that can monitor the roaster, and predict when I should stop it.

I’m new to AI/ML, but have been coding for many years.
Right. Inputs. So far, here’s where I’ve got to:

  1. reading the temps from the LCD (cv2 + tesseract + a custom-trained tesseract model). This seems ‘easy enough’, given enough messing around with the images.

  2. transfer learning on YAMNet to recognize background noise, first crack, and second crack. This is proving more tricky.

I’ve attached a first_crack file that has OBVIOUS cracks in it (at 1s, and 8s … as well as noises of the roaster working, heaters going, and so on).
First crack is characterised by very short, somewhat loud … “cracks”.

So far, the confusion matrix is pretty bad. Right now I have only 8 samples of each class, and those vary in quality. Each is about 5-10 s long. I’m not normalizing anything; data is loaded by:

import tensorflow as tf
import tensorflow_io as tfio

@tf.function
def load_wav_16k_mono(filename):
    """ Load a WAV file, convert it to a float tensor, resample to 16 kHz single-channel audio. """
    file_contents = tf.io.read_file(filename)
    wav, sample_rate = tf.audio.decode_wav(
        file_contents,
        desired_channels=1)
    wav = tf.squeeze(wav, axis=-1)
    sample_rate = tf.cast(sample_rate, dtype=tf.int64)
    wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000)
    return wav
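Since the clips vary in quality and loudness and nothing is normalized yet, one cheap first experiment is peak normalization before the audio reaches the model. A minimal numpy sketch of what I mean (the function name and the epsilon guard are my own, not from any library):

```python
import numpy as np

def peak_normalize(wav, eps=1e-9):
    """Scale a mono waveform so its loudest sample sits at +/-1.0.

    eps guards against division by zero on a silent clip.
    """
    return wav / (np.max(np.abs(wav)) + eps)
```

The same idea works on the TensorFlow side (dividing by the reduced max of the absolute values) if you'd rather keep it inside the tf.function pipeline.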

So I’m here to both say “hi!” and also ask for any pointers in my learning.

Audio wise, I’m thinking:

  • I’m aware YAMNet splits clips into much smaller windows when training, and that inputs are converted to 16 kHz
  • 16 kHz may be “hiding” the crack sounds. They are certainly less audible if I resample the input file in Audacity. So I might need a band-pass filter, maybe even amplify the cracks, or try background-noise removal before converting to 16 kHz for input into YAMNet
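For the band-pass idea, a zero-phase Butterworth filter applied before the 16 kHz resample is easy to prototype with scipy. The 1-6 kHz cutoffs below are pure guesses on my part, to be tuned by ear against a known-good crack recording:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(wav, sr, lo_hz=1000.0, hi_hz=6000.0, order=4):
    """Band-pass a mono waveform; the cutoff frequencies are assumptions to tune."""
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=sr, output="sos")
    # sosfiltfilt runs the filter forwards then backwards: zero phase shift,
    # so the crack timings in the clip are not smeared.
    return sosfiltfilt(sos, wav)
```

Applying this to the raw-sample-rate audio and then resampling should make it audible (literally) whether the cracks survive the trip down to 16 kHz.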

Other than that? Not really sure.

I’ve produced time series of the temperature data for each roast.
So I think I’ll (soon) have a data series that might have:

  • time to 1st crack
  • time to 2nd crack
  • ambient temp
  • time series data for chamber temp

I figure I’d be using that to train a regression. But I dunno if that’s going to work, given the low sample size. I might end up with only 30 or so samples to work with unless other roasters can supply me data.
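With only ~30 roasts, leave-one-out cross-validation is probably the most honest way to check whether a regression on those features generalizes at all. A sketch using plain numpy least squares (the feature layout is hypothetical: one row per roast, e.g. time to first crack and ambient temp, with the stop time as the target):

```python
import numpy as np

def loo_rmse(X, y):
    """Leave-one-out RMSE for ordinary least squares with a bias term."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    sq_errs = []
    for i in range(len(Xb)):
        keep = np.arange(len(Xb)) != i          # hold out roast i
        w, *_ = np.linalg.lstsq(Xb[keep], y[keep], rcond=None)
        sq_errs.append((Xb[i] @ w - y[i]) ** 2)  # error on the held-out roast
    return float(np.sqrt(np.mean(sq_errs)))
```

With so few samples, a model fed only a handful of summary features is more likely to generalize than anything trained on the raw chamber-temp series.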

Anyway, HELLO!
If anyone could suggest useful directions for my learning, I’d be very grateful!

Be Confused: