I am curious about how to set the device in TF. I want to implement a custom distributed data-parallel algorithm where, for example, I split an input tensor x into three parts and transfer each part to a different device.
So basically, I want to do something like:
x0, x1, x2 = tf.split(x, num_or_size_splits=3, axis=1)
x0 = x0.to('device:0')
x1 = x1.to('device:1')
x2 = x2.to('device:2')
But this seems to be impossible in TF.
I found something about colocation_graph; should I use that?
June 17, 2021, 4:14pm
You can do that using the with tf.device() context manager.
Does it help?
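Something like this, for example (an untested sketch; it assumes three visible GPUs, so adjust the device strings to your machine):

```python
import tensorflow as tf

x = tf.random.normal([4, 9])
x0, x1, x2 = tf.split(x, num_or_size_splits=3, axis=1)

# tf.identity placed under a device scope runs on that device,
# which copies the tensor there in eager mode.
with tf.device('/device:GPU:0'):
    x0 = tf.identity(x0)
with tf.device('/device:GPU:1'):
    x1 = tf.identity(x1)
with tf.device('/device:GPU:2'):
    x2 = tf.identity(x2)

print(x0.device, x1.device, x2.device)
```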
Thanks for the reply, and sorry for the unclear question. The with context manager only works in Python, IMHO.
However, if I want to implement data parallelism, I would have to rewrite TF's default graph pass, and in that case, how would I handle device placement in C++? Because as far as I know, TF's tensors do not carry device information there.
June 18, 2021, 9:43am
Hmm, I don't know.
Is this for the training step?
I lack the background, but maybe Distributed training with TensorFlow | TensorFlow Core might give some insights.
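For reference, the built-in data-parallel path from that guide looks roughly like this (just a sketch of the stock MirroredStrategy, not a custom strategy):

```python
import tensorflow as tf

# MirroredStrategy replicates the model onto every visible GPU and
# splits each input batch between the replicas automatically.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')

# Dummy data just to make the example self-contained.
xs = tf.random.normal([64, 8])
ys = tf.random.normal([64, 1])
model.fit(xs, ys, batch_size=16, epochs=1)
```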
June 18, 2021, 2:11pm
Are you looking to create your own custom distribution strategy?
Because I don't think that we officially support this:
- TensorFlow version (you are using): 1.14.0
- Are you willing to contribute it (Yes/No): Yes
**Describe the feature and the current behavior/state.**
As TensorFlow stands, there is no easy and intuitive way to implement a new distribution strategy. The ones available (MirroredStrategy, MultiWorkerMirroredStrategy, ...) work fine, but the code seems very complex, and there isn't a tutorial/guide on how to develop a new one.
**Will this change the current API? How?**
The API could be restructured to ease the support of new distribution strategies. A tutorial/guide on how to develop one would also be appreciated.
**Who will benefit from this feature?**
Researchers who want to reduce the time spent training distributed TensorFlow models.
Thanks for the reply! Yes, I am trying to create my own custom distribution strategy, but it seems that doing this in TF causes a lot of trouble…
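In the meantime, I'm experimenting with skipping tf.distribute entirely and writing the data-parallel step by hand with tf.split and tf.device. A rough sketch of the idea (it assumes three visible GPUs and a toy model; a real implementation would mirror the variables onto each device rather than keep a single copy):

```python
import tensorflow as tf

# Assumed device list -- adjust to whatever is actually visible.
devices = ['/device:GPU:0', '/device:GPU:1', '/device:GPU:2']

model = tf.keras.layers.Dense(1)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

def train_step(x, y):
    # Shard the batch across devices along the batch axis.
    x_shards = tf.split(x, num_or_size_splits=len(devices), axis=0)
    y_shards = tf.split(y, num_or_size_splits=len(devices), axis=0)
    per_device_grads = []
    for dev, xs, ys in zip(devices, x_shards, y_shards):
        with tf.device(dev):
            with tf.GradientTape() as tape:
                loss = tf.reduce_mean(tf.square(model(xs) - ys))
            per_device_grads.append(
                tape.gradient(loss, model.trainable_variables))
    # Average the per-replica gradients, then apply a single update.
    avg_grads = [tf.add_n(g) / len(devices)
                 for g in zip(*per_device_grads)]
    optimizer.apply_gradients(zip(avg_grads, model.trainable_variables))

train_step(tf.random.normal([12, 8]), tf.random.normal([12, 1]))
```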