We have a deep reinforcement learning training routine with a parallel evaluation process. The training process and the evaluation process each have their own model; both models are instances of the same class, which inherits from keras.Model. However, we ran into a strange issue:
With some model classes, the code works fine. With another model class, however, the model instance in the subprocess does not work, while the instance in the main process works well. Specifically, in the subprocess,
model(input) simply hangs: it never returns and raises no error.
I don’t know whether this is a multiprocessing issue, a model-definition issue, or something else. If it is a multiprocessing issue, I can’t explain why it doesn’t happen with the other model classes. If it is a model-definition issue, I don’t know why the model instance in the main process works without any problem.
Does anyone have any ideas about what might cause this, or about how to debug it? Many thanks!