Connection between Agents and Policies

Hi everybody,

Despite searching on Google and the forum, I wasn't able to find a good explanation of how to initialize an agent with a custom policy. How do I do that? Does anybody have a hint?

Greetings,
Stefan

1 Like

Have you already seen this tutorial?

2 Likes

Yes, I tried to follow "Example 3: Q Policy", which works so far. But where in this tutorial is the agent located?

[Thank you very much for your time, and sorry if these are dumb questions.]

1 Like

Policies can be created independently of agents; e.g., see the random_policy in the DQN tutorial.
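
For instance, here is a minimal sketch of a policy built without any agent; only the environment's specs are needed (CartPole is just a stand-in example):

```python
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.policies import random_tf_policy

env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

# The policy only needs the time_step and action specs; no agent is involved.
random_policy = random_tf_policy.RandomTFPolicy(
    time_step_spec=env.time_step_spec(),
    action_spec=env.action_spec())

time_step = env.reset()
action_step = random_policy.action(time_step)  # PolicyStep(action, state, info)
```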

If you are going to create your own custom agent, policy and collect_policy are constructor arguments:
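
A hypothetical minimal subclass to illustrate this (the class name and the stubbed _train are mine, not from any tutorial):

```python
import tensorflow as tf
from tf_agents.agents import tf_agent

class MyAgent(tf_agent.TFAgent):
    """Skeleton agent; only shows how the policies are wired in."""

    def __init__(self, time_step_spec, action_spec, policy, collect_policy):
        super(MyAgent, self).__init__(
            time_step_spec,
            action_spec,
            policy=policy,                  # used for evaluation/deployment
            collect_policy=collect_policy,  # used to gather experience
            train_sequence_length=None)

    def _initialize(self):
        pass

    def _train(self, experience, weights=None):
        # A real agent would compute a loss and apply gradients here.
        return tf_agent.LossInfo(loss=tf.constant(0.0), extra=())
```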

3 Likes

No problem. Deep RL is not easy.

Your agent [an abstract term, so it's just your program] tries to learn to do X by interacting with an environment (e.g. the game of Pong via Gym, or an Android app via AndroidEnv), using a policy (approximated by a neural net, hence "deep" RL) to gain experience. The policy (your neural net) "belongs" to your agent: it maps the agent's observations (inputs, which could be image pixels or sequential data received directly via an API) to its actions or action log-probabilities (outputs).
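
To make that concrete, here is a minimal sketch of a policy as a neural net; the shapes are assumptions (roughly CartPole-sized: a 4-dimensional observation and 2 discrete actions):

```python
import tensorflow as tf

# Maps observations (inputs) to action logits (outputs).
policy_net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(2),  # one logit per possible action
])

observation = tf.random.normal([1, 4])     # what the agent sees
logits = policy_net(observation)           # unnormalized action scores
action = tf.random.categorical(logits, 1)  # sample an action to take
log_probs = tf.nn.log_softmax(logits)      # action log-probabilities
```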

In addition, you may find this post helpful: Deep Reinforcement Learning With TensorFlow 2.1 | Roman Ring. It was written by an engineer who now works at DeepMind. There's also a YouTube channel that teaches well-known basic deep RL methods, such as DQN, policy gradients, and actor-critic methods, with core TensorFlow 2. Check out Everything You Need To Master Actor Critic Methods | Tensorflow 2 Tutorial - YouTube (actor-critic methods) or Deep Q Learning With Tensorflow 2 - YouTube (DQN).

3 Likes

Thank you, the second one seems to be what I am looking for. I tried to write code with dqn_agent and wondered why there was no policy arg.
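
For anyone reading along, a sketch of why that is: DqnAgent takes a q_network rather than a policy, and derives its policies from that network internally (the hyperparameters here are arbitrary):

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network

env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

q_net = q_network.QNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(64,))

agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
agent.initialize()

print(agent.policy)          # greedy policy derived from the Q-network
print(agent.collect_policy)  # epsilon-greedy policy for exploration
```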

3 Likes

Thank you for the explanation and the links. I've already written a custom environment, and now I want to write a custom policy as well. Let's say I have a use case like the following:
There are a number of boxes and a number of pieces. The agent's task is to pick a piece and sort it into a box so that the load of the fullest box is minimized. I will have a look at the links; a sketch of the setup follows below.
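
In case it helps, a hypothetical sketch of how that task could look as a TF-Agents PyEnvironment; the box/piece counts, piece sizes, and the reward (negative load of the fullest box at the end of an episode) are all my assumptions:

```python
import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

NUM_BOXES = 4
NUM_PIECES = 8

class BoxSortingEnv(py_environment.PyEnvironment):
    """Each step: put the current piece into the chosen box."""

    def __init__(self):
        # Action: index of the box that receives the current piece.
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=NUM_BOXES - 1,
            name='action')
        # Observation: load of each box plus the current piece's size.
        self._observation_spec = array_spec.ArraySpec(
            shape=(NUM_BOXES + 1,), dtype=np.float32, name='observation')
        self._reset_state()

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset_state(self):
        self._loads = np.zeros(NUM_BOXES, dtype=np.float32)
        self._pieces = np.random.uniform(0.1, 1.0, NUM_PIECES).astype(np.float32)
        self._idx = 0

    def _observe(self):
        piece = self._pieces[min(self._idx, NUM_PIECES - 1)]
        return np.append(self._loads, piece).astype(np.float32)

    def _reset(self):
        self._reset_state()
        return ts.restart(self._observe())

    def _step(self, action):
        if self._idx >= NUM_PIECES:
            # Episode already ended; start a new one (as in the TF-Agents tutorial).
            return self.reset()
        self._loads[action] += self._pieces[self._idx]
        self._idx += 1
        if self._idx == NUM_PIECES:
            # Minimizing the fullest box's load => negative reward at the end.
            return ts.termination(self._observe(), reward=-float(self._loads.max()))
        return ts.transition(self._observe(), reward=0.0)
```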

3 Likes

You could also write an "MVP" using, for example, the SAC (Soft Actor-Critic, by BAIR) algorithm with just TF Probability and TensorFlow as a start.

Here’s a “clean” example:

(Another example with code: Deep Reinforcement Learning: Playing CartPole through Asynchronous Advantage Actor Critic (A3C) with tf.keras and eager execution — The TensorFlow Blog)
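
In the same spirit, a minimal sketch of the TF Probability piece of a SAC actor (a tanh-squashed Gaussian policy); the network sizes and dimensions are assumptions:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

OBS_DIM, ACT_DIM = 4, 2  # assumed sizes

# Actor network: outputs a mean and a log-stddev per action dimension.
actor = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(OBS_DIM,)),
    tf.keras.layers.Dense(2 * ACT_DIM),
])

def policy_distribution(obs):
    mean, log_std = tf.split(actor(obs), 2, axis=-1)
    base = tfd.Independent(
        tfd.Normal(loc=mean, scale=tf.exp(log_std)),
        reinterpreted_batch_ndims=1)
    # Squash actions into (-1, 1), as SAC does.
    return tfd.TransformedDistribution(base, bijector=tfb.Tanh())

dist = policy_distribution(tf.random.normal([1, OBS_DIM]))
action = dist.sample()            # a continuous action in (-1, 1)
log_prob = dist.log_prob(action)  # used in the SAC actor/critic losses
```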

2 Likes

Is this just online 3D bin packing?

2 Likes

Actually, I'm attempting to solve a Flexible Job Shop Scheduling Problem. I'm trying to model it so that the machines are the "boxes" and the tasks of the jobs are the pieces to be sorted into the boxes. I want to test whether that works.

2 Likes

Thank you very much, I will take a look at this code.

2 Likes

You can take a look at

https://arxiv.org/abs/2104.08196

3 Likes