DQN network is not learning how to interact with environment and always take the same action

Hi I am a beginner in RL and I fell like my DDQN agent is not learning at all and almost always choosing the same action. I have no idea what I am doing wrong. If you have an idea what I can be doing wrong or have any advice what I can try to make it work better, I will appreciate it a lot.

About the project:


  • board is a rectangle of given width and height with randomly placed food
  • you can move STRAIGHT LEFT or RIGHT depend on the snake head position,
  • if snake head collide with food, then you are getting positive points and snake body is growing
  • if you ate all foods, you are wining
  • if you hit a snake tail or board edge with a head, you are loosing

Points for action

  • if collide with FOOD, gain 100 POINTS
  • if collide with BLANK field, gain -100 POINTS
  • if collide with TAIL, gain -10 000 POINTS
  • if collide with WALL, gain -10 000 POINTS
  • if all foods have been eaten gain 1000 POINTS

DQN input

Let’s say I have created 3x3 board with 3 foods which looks like that

Then input of my DQNNetowrk is a vector with 9 values where each value represent current state of board cell, where:

  • Snake Head is 3, 4, 5, 6 depends on direction (BOTTOM = 3, LEFT = 4, RIGHT = 5, TOP = 6)
  • FOOD is 2
  • Snake Body Part different than head is 1

So for above board input vector will be:

  5, 0, 0
  2, 0, 0
  2, 2, 0

Then before I put this to my DQN I am converting this vector to Tensor of rank 2 and shape [1, 9]. When i am training on replay memory, then I am having a Tensor of rank 2 and shape [batchSize , 9] .

DQN Output

My DQN output size is equal to the total number of actions I can take in this scenario 3 (STRAIGHT, RIGHT, LEFT)


Every time before I start to use my DDQN agent, first I am creating onlineNetwork and targetNetwork and filling ReplayMemory buffer. Then at the beginning of every epoch I reset my Enviroment and then I then my agent is playing in environment until lose or win. After every action agent will take, I am learning on replay memories batch and increasing copyWeights counter (once this counter is a multiple of updateWeightsIndicator). At the end of every epoch, I am adding new game score to my buffer with 100 last results and decreasing epsilon.

What I have tried

  • normalize my input vector to values between 0 and 1
  • create a vector with only foods marked
  • create a vector where I have used rewards to replace snake boy parts positions and food
  • add two additional columns and rows and fill them with WALL reward
  • changing size of hidden dense layers

None of the above helped


Under this link: GitHub - kaczor6418/snake-game: Simple snake game created to learn some about WebGL and reinforcement learning algorithms, you can find my GitHub repository. I think the most possible places I have made a mistake is ReinforcementAgent class or DoubleDeepQLearningAgent class inside src/agents directory. In App.ts runSnakeGameWithDDQLearningAgent you can find configuration of my DDQN Agent.

To use this repository:

git clone https://github.com/kaczor6418/snake-game.git
cd ./snake-game
npm install
npm run serve

Or maybe I am just not patient enough and 10k epochs(10k played game is not enough to learn such environment ?)

If you need any additional explanation, please let me know in comments and I will answer