Encoding Cyclical Features?

rcauvin · May 19, 2023, 7:27pm

A cyclical feature is one that has ordered values that cycle. Day of week is an example; you can express it as a value from 1 (Monday) to 7 (Sunday), where it cycles every 7 days. For purposes of training a model, we want 7 (Sunday) to be just as close to 1 (Monday) as 2 (Tuesday) is. We can do so by transforming the 1 to 7 integer value to a sine and cosine pair.
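To make the geometry concrete, here is a minimal plain-Python sketch. The helper name `encode_cyclical` is hypothetical, and it assumes the common mapping of each value to the angle 2π(value − start)/count on the unit circle. It checks that, once encoded, Sunday is exactly as far from Monday as Tuesday is, even though the raw integers are 6 and 1 apart.

```python
import math


def encode_cyclical(value, start=1, end=7):
    """Map an integer in [start, end] to a (sine, cosine) point on the unit circle."""
    count = end - start + 1  # number of distinct values in one cycle
    angle = 2 * math.pi * ((value - start) % count) / count
    return (math.sin(angle), math.cos(angle))


monday, tuesday, sunday = (encode_cyclical(day) for day in (1, 2, 7))

# Raw integer gaps: Monday-Tuesday is 1, Monday-Sunday is 6.
raw_gap_tuesday = abs(2 - 1)
raw_gap_sunday = abs(7 - 1)

# Euclidean distances between the encoded points.
encoded_gap_tuesday = math.dist(monday, tuesday)
encoded_gap_sunday = math.dist(monday, sunday)

print(raw_gap_tuesday, raw_gap_sunday)
print(encoded_gap_tuesday, encoded_gap_sunday)
```

Both encoded distances are the chord length 2·sin(π/7), roughly 0.868, because Tuesday and Sunday sit one step away from Monday in opposite directions around the circle.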

One way to represent day of week as a sine and cosine pair is to replace the day_of_week feature in the input data with two features: day_of_week_sin and day_of_week_cos. But is there a way to do it in a preprocessing layer, so that the input is still day_of_week and the layer takes care of splitting it into the sine and cosine pair?

It seems that having a preprocessing layer “hide” the encoding of cyclical features would make it easier for clients of the model to train and invoke it.

rcauvin · May 21, 2023, 7:52pm

It appears I can create an instance of this custom layer class that takes an integer as input and outputs the sine and cosine as floats.

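import numpy as np
import tensorflow as tf


class CyclicalIntegerLayer(tf.keras.layers.Layer):
  """Encodes a cyclical integer feature as a (sine, cosine) pair."""

  def __init__(
      self,
      start: int,
      end: int,
      **kwargs):

    super().__init__(**kwargs)

    # Inclusive bounds of the cycle, e.g. start = 1 and end = 7 for day of week.
    self.start = start
    self.end = end

  def call(
      self,
      inputs):

    # Number of distinct values in one full cycle.
    count = self.end - self.start + 1
    # Shift so the cycle starts at 0; floormod wraps out-of-range values into [0, count).
    offset_inputs = tf.math.floormod(tf.subtract(tf.cast(inputs, tf.float32), self.start), count)
    # Map each position in the cycle to an angle in [0, 2 * pi).
    scaled_inputs = tf.math.divide(tf.math.multiply(2 * np.pi, offset_inputs), count)
    sin_outputs = tf.math.sin(scaled_inputs)
    cos_outputs = tf.math.cos(scaled_inputs)
    # Stack along a new last axis, so an input of shape (...) becomes (..., 2).
    outputs = tf.stack([sin_outputs, cos_outputs], axis = -1)

    return outputs

  def get_config(self):

    config = super().get_config()

    # Store only the constructor arguments, so that from_config can rebuild the layer;
    # extra keys such as output_dim would be passed to __init__ and rejected.
    config.update({"start": self.start, "end": self.end})

    return config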

To encode a cyclical day_of_week feature ranging from 1 to 7, we can instantiate the preprocessing layer:

day_of_week_layer = CyclicalIntegerLayer(start = 1, end = 7, name = "cyclical")
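As a sanity check on what such a layer should return, the following plain-Python sketch mirrors the arithmetic of `call` without TensorFlow. The function name `cyclical_reference` is hypothetical and not part of the layer. It relies on Python's `%` following the same floor-mod convention as `tf.math.floormod` for a positive divisor, which is why out-of-range inputs wrap around instead of failing.

```python
import math


def cyclical_reference(values, start, end):
    """Plain-Python mirror of the layer's call(): one (sin, cos) pair per input value."""
    count = end - start + 1
    pairs = []
    for value in values:
        # Floor-mod keeps the offset in [0, count), even for values below start.
        offset = (float(value) - start) % count
        angle = 2 * math.pi * offset / count
        pairs.append((math.sin(angle), math.cos(angle)))
    return pairs


# 1 = Monday, 7 = Sunday; 8 and 0 lie outside the declared 1 to 7 range.
encoded = cyclical_reference([1, 7, 8, 0], start=1, end=7)
for value, pair in zip([1, 7, 8, 0], encoded):
    print(value, pair)
```

Here 8 encodes identically to 1 and 0 encodes identically to 7. That wrapping is convenient, but it also means invalid inputs are silently accepted, which may or may not be what a client of the model expects.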

Feedback on pros and cons is welcome.

Mog · May 30, 2023, 10:04am

This is very nice. I thought about building a preprocessing layer for date-time features myself, but months are annoying since they vary in length. Weeks are nice because they always have 7 days.
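One possible way around varying month lengths, sketched here in plain Python rather than as a Keras layer, is to use each month's own length as the period. The helper name `encode_day_of_month` is hypothetical, and the approach assumes the full date (year and month) is available alongside the day, which a layer with a fixed start and end cannot know.

```python
import calendar
import datetime
import math


def encode_day_of_month(date):
    """Encode the day of month as (sine, cosine), using that month's length as the period."""
    # monthrange returns (weekday of the first day, number of days in the month).
    days_in_month = calendar.monthrange(date.year, date.month)[1]
    angle = 2 * math.pi * (date.day - 1) / days_in_month
    return (math.sin(angle), math.cos(angle))


first_of_feb = encode_day_of_month(datetime.date(2023, 2, 1))
first_of_mar = encode_day_of_month(datetime.date(2023, 3, 1))
mid_feb = encode_day_of_month(datetime.date(2023, 2, 15))

print(first_of_feb, first_of_mar)
print(mid_feb)
```

With this choice the first day of every month lands on the same point, and the last day of any month is one step short of a full turn. The trade-off is that the same day number maps to slightly different angles in months of different lengths.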