How much RAM is needed to train a simple DQN?

I am trying to train a simple agent to play MountainCar from OpenAI Gym, but training keeps running into OOM because RAM usage grows too much. I wonder how much RAM I should expect to need, as even 64GB was not enough.

Hi @mock789 ,

The amount of RAM needed to train a simple DQN depends on a number of factors, including the size of the state space, the size of the action space, and the number of parameters in the model. However, as a general rule of thumb, you will need at least 4GB of RAM to train a simple DQN. If you are using a large state space or a large action space, you may need more RAM.

Training a simple DQN for the MountainCar environment should generally not require an excessive amount of RAM. With 64GB of RAM, you should have more than enough memory available for training.

If you are encountering out-of-memory (OOM) issues even with 64GB of RAM, it is possible that there is a memory leak or inefficient memory usage in your code. Make sure to deallocate unnecessary variables, release memory after use, and avoid any memory leaks.

You can try the following approaches to reduce memory usage during training (a short sketch follows the list):

Use a smaller replay buffer or limit its maximum size.
Decrease the batch size used for training.
Optimize the neural network architecture to reduce the number of parameters.
Use memory-efficient data types, such as float32 instead of float64.
Consider using techniques like frame skipping or state downsampling to reduce the dimensionality of observations.
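
For illustration, here is a minimal sketch of a few of these points. The buffer size, batch size, and helper name below are placeholder values for illustration, not tuned recommendations:

import numpy as np
from collections import deque

replay_buffer = deque(maxlen=5000)   # a smaller buffer caps how much history sits in RAM
batch_size = 16                      # smaller batches mean smaller temporary arrays per update

def store_transition(buffer, state, action, reward, next_state, done):
    # keep observations as float32 instead of the NumPy default float64
    buffer.append((np.asarray(state, dtype=np.float32),
                   action,
                   np.float32(reward),
                   np.asarray(next_state, dtype=np.float32),
                   done))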

I hope this helps!

Thanks.


Hi Laxma,

thanks very much for your response!!! That's strange, because I have already tried all these things. I am using the script below, and by using the memory_profiler library I have already figured out that this line of code alone

action=np.argmax(self.trainNetwork.predict(state)[0])

is responsible for a lot of the RAM usage. The script looks super basic, and even when I reduce it so that only the above line of code is called frequently, it still runs out of memory.

Do you maybe have a script for an OpenAI Gym example which I could use to see if I run into the same problem?

Btw., this is the script which I am running at the moment:

import gym
from keras import models
from keras import layers
from keras.optimizers import Adam
from collections import deque
import random
import numpy as np

class MountainCarTrain:
    def __init__(self, env):
        self.env=env
        self.gamma=0.99
        self.epsilon = 1
        self.epsilon_decay = 0.05
        self.epsilon_min=0.01
        self.learingRate=0.001
        self.replayBuffer=deque(maxlen=20000)
        self.trainNetwork=self.createNetwork()
        self.episodeNum=400
        self.iterationNum=201 #max is 200
        self.numPickFromBuffer=32
        self.targetNetwork=self.createNetwork()
        self.targetNetwork.set_weights(self.trainNetwork.get_weights())

    def createNetwork(self):
        model = models.Sequential()
        state_shape = self.env.observation_space.shape
        model.add(layers.Dense(24, activation='relu', input_shape=state_shape))
        model.add(layers.Dense(48, activation='relu'))
        model.add(layers.Dense(self.env.action_space.n, activation='linear'))
        # model.compile(optimizer=optimizers.RMSprop(lr=self.learingRate), loss=losses.mean_squared_error)
        model.compile(loss='mse', optimizer=Adam(lr=self.learingRate))
        return model

    def getBestAction(self, state):
        self.epsilon = max(self.epsilon_min, self.epsilon)
        if np.random.rand(1) < self.epsilon:
            action = np.random.randint(0, 3)
        else:
            action=np.argmax(self.trainNetwork.predict(state)[0])
        return action

    def trainFromBuffer(self):
        if len(self.replayBuffer) < self.numPickFromBuffer:
            return

        samples = random.sample(self.replayBuffer, self.numPickFromBuffer)

        states = []
        newStates=[]
        for sample in samples:
            state, action, reward, new_state, done = sample
            states.append(state)
            newStates.append(new_state)

        newArray = np.array(states)
        states = newArray.reshape(self.numPickFromBuffer, 2)

        newArray2 = np.array(newStates)
        newStates = newArray2.reshape(self.numPickFromBuffer, 2)

        targets = self.trainNetwork.predict(states)
        new_state_targets=self.targetNetwork.predict(newStates)

        i=0
        for sample in samples:
            state, action, reward, new_state, done = sample
            target = targets[i]
            if done:
                target[action] = reward
            else:
                Q_future = max(new_state_targets[i])
                target[action] = reward + Q_future * self.gamma
            i+=1

        self.trainNetwork.fit(states, targets, epochs=1, verbose=0)

    def orginalTry(self, currentState, eps):
        rewardSum = 0
        max_position=-99

        for i in range(self.iterationNum):
            bestAction = self.getBestAction(currentState)

            # show the animation every 50 eps
            if eps%50==0:
                env.render()

            new_state, reward, done, info, _ = env.step(bestAction)

            new_state = new_state.reshape(1, 2)

            # Keep track of max position
            if new_state[0][0] > max_position:
                max_position = new_state[0][0]

            # Adjust reward for task completion
            if new_state[0][0] >= 0.5:
                reward += 10

            self.replayBuffer.append([currentState, bestAction, reward, new_state, done])

            # Or you can use self.trainFromBuffer_Boost(), it is a matrix-wise version for boosting
            self.trainFromBuffer()

            rewardSum += reward

            currentState = new_state

            if done:
                break

        if i >= 199:
            print("Failed to finish task in episode {}".format(eps))
        else:
            print("Success in episode {}, used {} iterations!".format(eps, i))
            self.trainNetwork.save('./trainNetworkInEPS{}.h5'.format(eps))

        # Sync target network with the train network
        self.targetNetwork.set_weights(self.trainNetwork.get_weights())

        print("now epsilon is {}, the reward is {} maxPosition is {}".format(max(self.epsilon_min, self.epsilon), rewardSum, max_position))
        self.epsilon -= self.epsilon_decay

    def start(self):
        for eps in range(self.episodeNum):
            currentState=env.reset()[0].reshape(1,2)
            self.orginalTry(currentState, eps)

env = gym.make('MountainCar-v0')
dqn=MountainCarTrain(env=env)
dqn.start()

Hi @mock789 ,

Can you please try incorporating this modification into your code and observe if it helps in mitigating the memory issues you are facing:

By directly assigning the result of self.trainNetwork.predict(state) to pred and then accessing the maximum index, we can avoid unnecessary memory allocations.
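
In your getBestAction method, the change would look roughly like this (with import gc added at the top of the script):

pred = self.trainNetwork.predict(state)
action = np.argmax(pred[0])
gc.collect()  # explicitly ask Python to free unreferenced objects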

I hope this helps!

Thanks


Hi Laxma,

thank you very much for your reply!!! I am letting the script run at the moment, but I already have the impression that it will not fix the problem completely. I am using the memory profiler and I can see that this line of code

pred = self.trainNetwork.predict(state)

is increasing RAM usage by about 0.2 MiB every time it is called. It is really weird, and I can still see in the Task Manager how the RAM for Vmmem keeps growing steadily.

I really have no clue why TensorFlow is not giving this RAM back to the system ://

Line # Mem usage Increment Occurrences Line Contents

44    891.2 MiB    891.2 MiB           1       @profile   
45                                             def getBestAction(self,state):
46    891.2 MiB      0.0 MiB           1           self.epsilon = max(self.epsilon_min, self.epsilon)
47    891.2 MiB      0.0 MiB           1           if np.random.rand(1) < self.epsilon:
48                                                     action = np.random.randint(0, 3)
49                                                 else:
50    891.4 MiB      0.2 MiB           1               pred = self.trainNetwork.predict(state)
51    891.4 MiB      0.0 MiB           1               action = np.argmax(pred[0])
52    891.4 MiB      0.0 MiB           1               gc.collect()
53    891.4 MiB      0.0 MiB           1           return action

Hi @mock789,

Please give this modification a try and see if it helps in reducing the memory growth: call tf.keras.backend.clear_session() instead of gc.collect(). This will release the memory associated with the graph and start a clean session for the next prediction.
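
The relevant lines in getBestAction would then look roughly like this (assuming import tensorflow as tf at the top of the script):

pred = self.trainNetwork.predict(state)
action = np.argmax(pred[0])
tf.keras.backend.clear_session()  # drop the accumulated graph state before the next prediction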

I hope this last try can resolve your issue.

Thanks.

Thank you Laxma,

this really seems to mitigate the problem.

Line # Mem usage Increment Occurrences Line Contents

44    456.6 MiB    456.6 MiB           1       @profile   
45                                             def getBestAction(self,state):
46    456.6 MiB      0.0 MiB           1           self.epsilon = max(self.epsilon_min, self.epsilon)
47    456.6 MiB      0.0 MiB           1           if np.random.rand(1) < self.epsilon:
48                                                     action = np.random.randint(0, 3)
49                                                 else:
50    458.7 MiB      2.1 MiB           1               pred = self.trainNetwork.predict(state)
51    458.7 MiB      0.0 MiB           1               action = np.argmax(pred[0])
52    456.8 MiB     -2.0 MiB           1               tf.keras.backend.clear_session() 
53    456.8 MiB      0.0 MiB           1           return action

Strangely, it does not always work, as you can see here, where tf.keras.backend.clear_session() does not clean up the increment caused by pred = self.trainNetwork.predict(state):

Line # Mem usage Increment Occurrences Line Contents

44    456.8 MiB    456.8 MiB           1       @profile   
45                                             def getBestAction(self,state):
46    456.8 MiB      0.0 MiB           1           self.epsilon = max(self.epsilon_min, self.epsilon)
47    456.8 MiB      0.0 MiB           1           if np.random.rand(1) < self.epsilon:
48                                                     action = np.random.randint(0, 3)
49                                                 else:
50    457.2 MiB      0.4 MiB           1               pred = self.trainNetwork.predict(state)
51    457.2 MiB      0.0 MiB           1               action = np.argmax(pred[0])
52    457.2 MiB      0.0 MiB           1               tf.keras.backend.clear_session() 
53    457.2 MiB      0.0 MiB           1           return action

So RAM usage is still increasing, but much more slowly.

Anyway, thank you a lot!!! For the first time, I could run the reduced script for over 400 episodes without getting OOM :wink:

Now I will see how it goes when I run the complete script!

Have a nice day :slight_smile:


Hi Laxma,

just wanted to give an update: even with the complete script I can now run more than 500 episodes without running into OOM. I really owe you a beer/tea/coca cola :smiley:

Best


Hi @mock789,

That’s great news! I’m glad to hear that the complete script is working for you. I’m happy to help in any way that I can.

Happy coding!

Thanks.


Hello @mock789 ,

in the first script above, I’ve noticed

self.trainNetwork=self.createNetwork()

When calling predict, is basically a new Sequential model returned every time?

action=np.argmax(self.trainNetwork.predict(state)[0])

If you like, you can also customize your train steps (very handy for RL).
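
As a rough, self-contained sketch (not taken from your script; model, target_model, and optimizer stand in for your two networks and an optimizer instance), a custom train step with tf.GradientTape could look like this:

import tensorflow as tf

@tf.function  # compiles the update into one graph, so it is not rebuilt on every call
def train_step(model, target_model, optimizer, states, actions, rewards, next_states, dones, gamma=0.99):
    # bootstrap targets from the (frozen) target network
    next_q = tf.reduce_max(target_model(next_states, training=False), axis=1)
    targets = rewards + gamma * next_q * (1.0 - dones)
    with tf.GradientTape() as tape:
        q_values = model(states, training=True)
        # keep only the Q-value of the action that was actually taken
        action_mask = tf.one_hot(actions, q_values.shape[-1])
        q_taken = tf.reduce_sum(q_values * action_mask, axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

Calling the model directly (model(states)) inside the compiled step keeps the whole update in one graph instead of going through predict() for every sample.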

Feel free to have a look at the TensorFlow DQN library (e.g., compare/benchmark the RAM usage).

Lucky Rewards,
Dennis
