Tell me your funny (SFW) Machine Learning stories?

Would love to hear funny stories about your ML implementations. What kind of bugs have you encountered?

One of my favorites – not sure if it’s true or not – was that a few years back the US Army wanted to build a computer vision model to detect camouflaged tanks. They got some data scientists to build a model, and these folks got to borrow a tank for a couple of days, drive it around the woods, and take lots of pictures. On the first day they shot without camo and labelled the images accordingly; on the second day they added the camo nets and did the same. They built a model from these pictures and did everything right, holding back a portion for testing and validation. When they were done, the model was incredibly accurate. They had succeeded!

Then they took it into the field to test it, and it failed miserably. They couldn’t figure out why: the test and validation sets had been properly selected and randomized, so it should have worked.

And then somebody pointed out that the weather was sunny on day 1 (no camo) and cloudy on day 2 (with camo), so instead of a camo detector, they had actually built a cloudy-sky detector…



This reminds me of an actual experience I had a few years ago. I was training a reinforcement learning robot to navigate a (simulated) maze for a competition at my university. The organizers provided an API that gave us a maze layout, the robot’s position, several exit points, and “treasure” locations. The goal was to have the robot leave the maze before a timer expired, but the ranking was based on how much treasure you collected.

After several hours of training, I had a robot that could navigate all the mazes generated by the API, with a good exploration–exploitation balance. Then we decided to test the pipeline on a rotated version of the maze instead of the one provided by the API. It turned out the robot got stuck most of the time.

The reason? The maze generation algorithm in the API was biased toward mazes with long horizontal corridors and very short passages to the next row, and our robot was overfitting to this feature.
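The fix amounts to simple rotation augmentation. Here is a minimal sketch (the maze grid and its encoding are hypothetical, not the competition’s actual API): training on all four orientations means the agent can no longer rely on corridors always running horizontally.

```python
# Hypothetical maze grid: 0 = corridor, 1 = wall.
# Long horizontal corridors with short vertical passages -- the bias we overfit to.
maze = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

def rotate90(grid):
    """Rotate a grid of cells 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

# Augment the training set with all four orientations of each maze.
orientations = [maze]
for _ in range(3):
    orientations.append(rotate90(orientations[-1]))
```

Even this cheap augmentation would have exposed the problem before the competition: a policy that only works in one orientation fails on the rotated copies immediately.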

It was easy to fix, and it turned out the organizers also fixed the generator for the actual competition. As a result, only a few robots managed to score points during the actual event, and those belonged to the teams that had taken extra care to prevent overfitting to the training data.


Love it! Overfitting for the win. Or loss. :wink:


Wow that’s gonna be my favorite thread :smile:

When TensorFlow was open-sourced in 2015, I got really excited about it, but I couldn’t even compile it after a week of effort. I gave up until I finally devoted a weekend to getting it up and running. Before that I was a big fan of Node.js and did crazy things like writing really ugly ML code in JS. My first project with TF was custom clothing-color recognition for personal use (I’m blind), so I scraped images of clothing with descriptions from popular e-commerce websites, quickly labelled them with some regex matching color names, and trained my model. Everything was great until I naively tested it in the real world and learned (with sorrow, tears and blood) about the challenges of “ML in the wild” vs. a controlled environment, and the necessity of sampling train and test datasets from the same distribution.
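The regex labelling step described above can be sketched roughly like this (the color list and function names are my own illustration, not the original scraper’s code):

```python
import re

# Hypothetical label set -- a real scraper would cover many more color names.
COLORS = ["black", "white", "red", "blue", "green", "yellow"]
COLOR_RE = re.compile(r"\b(" + "|".join(COLORS) + r")\b", re.IGNORECASE)

def label_from_description(description):
    """Return the first color name found in a product description, or None."""
    match = COLOR_RE.search(description)
    return match.group(1).lower() if match else None

print(label_from_description("Slim-fit Blue denim jacket"))  # blue
```

Weak labels like these are cheap, but they inherit all the quirks of e-commerce copy (studio lighting, clean backgrounds), which is part of why the model broke down “in the wild”.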


Preprocessing utilities have always bugged me :expressionless:

So, back in the day, there was nothing like tf.keras.layers.experimental.preprocessing.Rescaling. To streamline the preprocessing steps, I wrapped my utilities inside a Lambda layer and inserted that layer into my final model. As expected, the model’s performance was terrible when it was compiled as a graph, so we decided to do the preprocessing externally in a separate job.

During staging, I mistakenly altered a single digit in our preprocessing utility, which led our model to predict the same thing for every input. It took us ~4 days to figure out what was going wrong (yes, I acknowledge our codebase wasn’t that well organized). But I learned my lesson.
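Purely illustrative (not the actual utility): when preprocessing lives outside the model, a one-character typo in a constant silently shifts every input out of the distribution the model was trained on, which is exactly how predictions collapse to a single output.

```python
def rescale(pixels, denominator=255.0):
    """Map raw 0-255 pixel values into [0, 1] for the model."""
    return [p / denominator for p in pixels]

raw = [0, 128, 255]

correct = rescale(raw)                   # what the model saw during training
buggy = rescale(raw, denominator=225.0)  # a single altered digit in the serving job

# Every serving-time input now exceeds the expected [0, 1] range,
# so the model's outputs drift toward one degenerate prediction.
print(max(buggy))
```

Keeping preprocessing inside the model (as the later Rescaling layer allows) makes this whole class of train/serve skew impossible.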

Fast-forward to late 2020, when TensorFlow released the preprocessing layers; it really came as a relief :smiley:


We had a large piece of work that was funded partly by the fact that it “introduced machine learning”. So we did what I always do: start by introducing an evaluation system, the goal being to baseline the incumbent deterministic business rules. Just doing the evaluation meant we discovered some problems with upstream data, which let us improve things overall with nothing more than a data-processing change. That was enough to move on to something else for a bit. When we came back to it later, the next step was to improve the business rules, still without introducing anything anyone would call ML. I’ll never forget a key exec stakeholder pulling me aside after that release and asking, “are we doing machine learning yet?”. It always makes me laugh. So much hype that for them the key result wasn’t a better user outcome, it was whether we were “using machine learning” :smiley: