# AoC 2022 in TensorFlow - Day 6. Efficient solution with dataset interleave

Hey folks,

while I’m having fun trying to solve the various puzzles of the Advent of Code 2022 in pure TensorFlow, I’m going on with the write-ups about my solutions.

This time, the article is about the day 6 problem and how I solved it, efficiently and in a few lines, thanks to the `tf.data.Dataset.interleave` method, which is the superhero of data transformation, IMHO.

Any feedback is welcome!


You can see the complete solution in the `6` folder of the dedicated GitHub repository (inside the `2022` folder): GitHub - galeone/tf-aoc.

Solving problem 6 let us use a very powerful feature of `tf.data.Dataset`: `interleave`. In a few lines, this method defines a complete, highly parallel, and efficient data transformation pipeline that can transform and group data gathered from different datasets. The expressive power of this method, moreover, allowed us to solve the problem in a very elegant way, IMHO.
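To give a feel for how `interleave` combines data from multiple datasets, here's a minimal illustrative sketch (not the article's actual day-6 pipeline): each element of an outer dataset is mapped to its own inner dataset, and the elements of the inner datasets are then woven into a single stream.

```python
import tensorflow as tf

# Outer dataset with three elements.
datasets = tf.data.Dataset.from_tensor_slices([1, 2, 3])

# Each element x becomes a small inner dataset of length 2: [x*10, x*10].
interleaved = datasets.interleave(
    lambda x: tf.data.Dataset.from_tensors(x * 10).repeat(2),
    cycle_length=2,  # how many inner datasets are consumed concurrently
    block_length=1,  # how many elements to take from each before cycling
    num_parallel_calls=tf.data.AUTOTUNE,  # parallelize the mapping stage
)

print(list(interleaved.as_numpy_iterator()))  # [10, 20, 10, 20, 30, 30]
```

With `cycle_length=2`, the first two inner datasets are consumed in round-robin fashion (one element at a time, because `block_length=1`); once both are exhausted, the third one takes their place. Since `deterministic` defaults to `True`, the output order is reproducible even with parallel calls.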

If you missed the article about the previous days’ solutions, here’s a handy list


Hey, I have been following your series for a while now, and I really appreciate it, as it brings out lesser-known and less-used capabilities of TensorFlow.


Thank you for the feedback! FYI, I'm continuing with both the articles and the puzzles.


I see that you use the `scan()` method for your dataset processing in day 1. Care to shed more light on how it works? I looked it up in the documentation, but it was not explicitly clear and lacked enough examples to make sense of it.

The `scan` method is neat. It allows you to iterate over the elements of a dataset while carrying over a state at the same time.

Just think about a simple dataset that produces the values `1, 2, 3, 4, 5`.

With the `map` method, you can loop over every single element of the dataset and apply a transformation.

```python
dataset.map(lambda x: x * 2)  # 2, 4, 6, 8, 10
```

With `scan` instead, you can carry over some information from the previous iterations.

So, for example, if you want to add the previous element to the current one, you can use the `scan` method.

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])

initial_state = tf.constant(0)

def scan_func(old_state, input_element):
    new_state = input_element
    output_element = input_element + old_state
    return new_state, output_element

dataset.scan(initial_state, scan_func)  # 1 + 0, 2 + 1, 3 + 2, ...
```

So, on each iteration, you receive the `old_state` carried over from the previous step; from the current `input_element` you compute a `new_state` (which becomes the `old_state` input of the next iteration) and produce the `output_element` as output.
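To make the distinction between the carried state and the output concrete, here's a hypothetical example (not from the article) where the two genuinely differ: generating Fibonacci numbers. The state is a pair of consecutive values, while the output is only the first of the two.

```python
import tensorflow as tf

# The carried state is a pair (a, b) of consecutive Fibonacci numbers;
# the output element at each step is just `a`.
initial_state = (tf.constant(0), tf.constant(1))

def fib_scan(state, _):
    a, b = state
    new_state = (b, a + b)  # shift the window forward
    return new_state, a     # new state for the next step, output element

# The input dataset only drives the number of steps; its values are ignored.
fib = tf.data.Dataset.range(8).scan(initial_state, fib_scan)

print(list(fib.as_numpy_iterator()))  # [0, 1, 1, 2, 3, 5, 8, 13]
```

Note how the state can be any (nested) structure of tensors, not just a scalar, as long as `scan_func` returns a new state with the same structure.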

I used `scan` in several solutions (it's super helpful), so I suggest you read all the articles and search the code for how I used it. I hope this helps you understand a bit more about how to use this great feature.


This was very helpful. Thank you!
