Connection between Agents and Policies

Hi everybody,

Despite searching on Google and the forum, I wasn't able to find a good explanation of how to initialize an agent with a custom policy. How do I do that? Does anybody have a hint?

Greetings,
Stefan

1 Like

Have you already seen this tutorial?

2 Likes

Yes, I tried to follow "Example 3: Q Policy", which works so far. But where in this tutorial is the agent located?

[Thank you very much for your time, and sorry if these are dumb questions.]

1 Like

Policies can be created independently of agents; e.g., see the random_policy in the DQN tutorial.
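
For instance, here is a minimal sketch of a policy built without any agent; only the environment's specs are needed (CartPole is just a stand-in example):

```python
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.policies import random_tf_policy

env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

# The policy only needs the time_step and action specs; no agent is involved.
random_policy = random_tf_policy.RandomTFPolicy(
    time_step_spec=env.time_step_spec(),
    action_spec=env.action_spec())

time_step = env.reset()
action_step = random_policy.action(time_step)  # PolicyStep(action, state, info)
```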

If you are going to create your own custom agent, policy and collect_policy are constructor arguments:
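
A hypothetical minimal subclass to illustrate this (the class name and the stubbed _train are mine, not from any tutorial):

```python
import tensorflow as tf
from tf_agents.agents import tf_agent

class MyAgent(tf_agent.TFAgent):
    """Skeleton agent; only shows how the policies are wired in."""

    def __init__(self, time_step_spec, action_spec, policy, collect_policy):
        super(MyAgent, self).__init__(
            time_step_spec,
            action_spec,
            policy=policy,                  # used for evaluation/deployment
            collect_policy=collect_policy,  # used to gather experience
            train_sequence_length=None)

    def _initialize(self):
        pass

    def _train(self, experience, weights=None):
        # A real agent would compute a loss and apply gradients here.
        return tf_agent.LossInfo(loss=tf.constant(0.0), extra=())
```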

3 Likes

No problem. Deep RL is not easy.

Your agent [an abstract term, so it's just your program] tries to learn to do X by interacting with an environment (e.g. the game of Pong via Gym, or an Android app via AndroidEnv), using a policy (approximated by a neural net, hence "deep" RL) to gain experience. The policy (your neural net) "belongs" to your agent: it maps the agent's observations (inputs, which could be image pixels or sequential data received directly via an API) to its actions or action log-probabilities (outputs).
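
To make that concrete, here is a minimal sketch of a policy as a neural net; the shapes are assumptions (roughly CartPole-sized: a 4-dimensional observation and 2 discrete actions):

```python
import tensorflow as tf

# Maps observations (inputs) to action logits (outputs).
policy_net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(2),  # one logit per possible action
])

observation = tf.random.normal([1, 4])     # what the agent sees
logits = policy_net(observation)           # unnormalized action scores
action = tf.random.categorical(logits, 1)  # sample an action to take
log_probs = tf.nn.log_softmax(logits)      # action log-probabilities
```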

In addition, you may find this post helpful: Deep Reinforcement Learning With TensorFlow 2.1 | Roman Ring. It was written by an engineer who now works at DeepMind. There's also a YouTube channel that teaches well-known basic deep RL methods, such as DQN, policy gradients, and actor-critic methods, with core TensorFlow 2. Check out Everything You Need To Master Actor Critic Methods | Tensorflow 2 Tutorial - YouTube (actor-critic methods) or Deep Q Learning With Tensorflow 2 - YouTube (DQN).

3 Likes

Thank you, the second one seems to be what I am looking for. I tried to write code with dqn_agent and wondered why there was no policy arg.
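
For anyone reading along, a sketch of why that is: DqnAgent takes a q_network rather than a policy, and derives its policies from that network internally (the hyperparameters here are arbitrary):

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network

env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

q_net = q_network.QNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(64,))

agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
agent.initialize()

print(agent.policy)          # greedy policy derived from the Q-network
print(agent.collect_policy)  # epsilon-greedy policy for exploration
```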

3 Likes

Thank you for the explanation and the links. I've already written a custom environment, and now I want to write a custom policy as well. Let's say I have a use case like the following:
There are a number of boxes and a number of pieces. The agent's task is to pick a piece and sort it into a box so that the load of the fullest box is minimized. I will have a look at the links; a sketch of the setup follows below.
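
In case it helps, a hypothetical sketch of how that task could look as a TF-Agents PyEnvironment; the box/piece counts, piece sizes, and the reward (negative load of the fullest box at the end of an episode) are all my assumptions:

```python
import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

NUM_BOXES = 4
NUM_PIECES = 8

class BoxSortingEnv(py_environment.PyEnvironment):
    """Each step: put the current piece into the chosen box."""

    def __init__(self):
        # Action: index of the box that receives the current piece.
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=NUM_BOXES - 1,
            name='action')
        # Observation: load of each box plus the current piece's size.
        self._observation_spec = array_spec.ArraySpec(
            shape=(NUM_BOXES + 1,), dtype=np.float32, name='observation')
        self._reset_state()

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset_state(self):
        self._loads = np.zeros(NUM_BOXES, dtype=np.float32)
        self._pieces = np.random.uniform(0.1, 1.0, NUM_PIECES).astype(np.float32)
        self._idx = 0

    def _observe(self):
        piece = self._pieces[min(self._idx, NUM_PIECES - 1)]
        return np.append(self._loads, piece).astype(np.float32)

    def _reset(self):
        self._reset_state()
        return ts.restart(self._observe())

    def _step(self, action):
        if self._idx >= NUM_PIECES:
            # Episode already ended; start a new one (as in the TF-Agents tutorial).
            return self.reset()
        self._loads[action] += self._pieces[self._idx]
        self._idx += 1
        if self._idx == NUM_PIECES:
            # Minimizing the fullest box's load => negative reward at the end.
            return ts.termination(self._observe(), reward=-float(self._loads.max()))
        return ts.transition(self._observe(), reward=0.0)
```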

3 Likes

You could also write an "MVP" using, for example, the SAC (Soft Actor-Critic, by BAIR) algorithm with just TF Probability and TensorFlow as a start.

Here’s a “clean” example:

(Another example with code: Deep Reinforcement Learning: Playing CartPole through Asynchronous Advantage Actor Critic (A3C) with tf.keras and eager execution — The TensorFlow Blog)
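
In the same spirit, a minimal sketch of the TF Probability piece of a SAC actor (a tanh-squashed Gaussian policy); the network sizes and dimensions are assumptions:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

OBS_DIM, ACT_DIM = 4, 2  # assumed sizes

# Actor network: outputs a mean and a log-stddev per action dimension.
actor = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(OBS_DIM,)),
    tf.keras.layers.Dense(2 * ACT_DIM),
])

def policy_distribution(obs):
    mean, log_std = tf.split(actor(obs), 2, axis=-1)
    base = tfd.Independent(
        tfd.Normal(loc=mean, scale=tf.exp(log_std)),
        reinterpreted_batch_ndims=1)
    # Squash actions into (-1, 1), as SAC does.
    return tfd.TransformedDistribution(base, bijector=tfb.Tanh())

dist = policy_distribution(tf.random.normal([1, OBS_DIM]))
action = dist.sample()            # a continuous action in (-1, 1)
log_prob = dist.log_prob(action)  # used in the SAC actor/critic losses
```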

2 Likes

Is this just online 3D bin packing?

2 Likes

Actually, I'm attempting to solve a Flexible Job Shop Scheduling Problem. I'm trying to model it so that the machines are the "boxes" and the tasks of the jobs are the pieces to be sorted into the boxes. I want to test whether that works.

2 Likes

Thank you very much, I will take a look at this code.

2 Likes

You can take a look at

https://arxiv.org/abs/2104.08196

3 Likes