Video classification with Transformers

Hi folks,

My latest Keras example shows how to build a video classifier using a hybrid Transformer model. First, we process the video frames with a pre-trained CNN; then a Transformer-based model operates on the resulting CNN feature maps to model the temporal relationships between frames.
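To make the data flow concrete, here is a rough, framework-free NumPy sketch of the second stage. It is not the code from the Keras example: the per-frame features are random stand-ins for CNN outputs, the weights are untrained, and every name and size (`classify_clip`, `NUM_FRAMES`, `FEATURE_DIM`, `NUM_CLASSES`) is a made-up assumption. It also drops pieces a real Transformer block would have (multiple heads, layer normalization, the feed-forward sublayer) and keeps only positional information, self-attention across frames, pooling over time, and a softmax classifier.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def classify_clip(frame_features, params):
    """frame_features: (num_frames, feature_dim), one row per video frame,
    standing in for the output of a pre-trained CNN."""
    # Add one position vector per frame so attention can see frame order.
    x = frame_features + params["pos"][: len(frame_features)]
    # Single-head scaled dot-product self-attention across the frames.
    q, k, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (num_frames, num_frames)
    x = x + attn @ v  # residual connection
    # Collapse the time axis, then map to class probabilities.
    pooled = x.max(axis=0)
    return softmax(pooled @ params["w_out"]), attn

# Hypothetical sizes, chosen only to keep the demo small.
NUM_FRAMES, FEATURE_DIM, NUM_CLASSES = 8, 16, 5
rng = np.random.default_rng(0)
params = {
    "pos": rng.normal(scale=0.1, size=(NUM_FRAMES, FEATURE_DIM)),
    "wq": rng.normal(scale=0.1, size=(FEATURE_DIM, FEATURE_DIM)),
    "wk": rng.normal(scale=0.1, size=(FEATURE_DIM, FEATURE_DIM)),
    "wv": rng.normal(scale=0.1, size=(FEATURE_DIM, FEATURE_DIM)),
    "w_out": rng.normal(scale=0.1, size=(FEATURE_DIM, NUM_CLASSES)),
}
frame_features = rng.normal(size=(NUM_FRAMES, FEATURE_DIM))  # fake CNN output

probs, attn = classify_clip(frame_features, params)
print(probs.shape, attn.shape)
```

Each row of `attn` shows how strongly one frame attends to every other frame, which is where the temporal modeling happens. In a Keras model the same roles would typically be played by built-in layers with trained weights rather than hand-written matrix products.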

Here’s what you can expect as results (the predictions are shown on top, with a GIF of the input video below):


P.S.: I am a Cricket fanatic. Sachin Tendulkar is my favorite batsman and Shane Warne is my favorite bowler. :smile: