I am curious about how to set the device in TF. I want to implement a custom distributed data-parallel algorithm; for example, I want to split an input tensor x into three parts and transfer each part to a different device.
so basically, I want to
x0, x1, x2 = tf.split(x, num_or_size_splits=3, axis=1)
x0 = x0.to('device:0')
x1 = x1.to('device:1')
x2 = x2.to('device:2')
But this seems quite impossible in TF.
I found something about colocation_graph — should I use that?
lgusm
June 17, 2021, 4:14pm
#2
You can do that using the `with tf.device(...)` context manager.
Does it help?
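For example, a minimal sketch of explicit placement with `tf.device`. Three logical CPU devices are fabricated here (an assumption for illustration, so the snippet runs on any machine); with real accelerators you would use names like `"/device:GPU:0"` instead:

```python
import tensorflow as tf

# Fabricate three logical CPU devices so the example runs anywhere.
# This must happen before the TF runtime is initialized.
cpus = tf.config.list_physical_devices("CPU")
tf.config.set_logical_device_configuration(
    cpus[0], [tf.config.LogicalDeviceConfiguration()] * 3
)

x = tf.random.normal([6, 9])
x0, x1, x2 = tf.split(x, num_or_size_splits=3, axis=1)

parts = []
for i, part in enumerate([x0, x1, x2]):
    with tf.device(f"/device:CPU:{i}"):
        # tf.identity forces a copy of the slice onto the scoped device
        parts.append(tf.identity(part))
```

In eager mode, each `tf.identity` executes on the device named by the enclosing scope, so `parts[i]` lives on device `i`.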
Thanks for the reply, and sorry for the unclear question. The `with` context manager only works in Python, IMHO.
However, if I want to implement data parallelism, I would have to rewrite TF's default pass; in that case, how would I handle this in C++? Because as far as I know, TF's tensors do not carry device information.
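For reference, from the Python side a tensor's current placement can at least be inspected via the `.device` attribute in eager mode (whether that helps at the C++ graph level is a separate question). A quick check, pinning the tensor to the CPU so the result is deterministic:

```python
import tensorflow as tf

# Pin the constant to CPU:0 so the placement string is predictable,
# e.g. "/job:localhost/replica:0/task:0/device:CPU:0".
with tf.device("/device:CPU:0"):
    t = tf.constant([1.0, 2.0])
print(t.device)
```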
lgusm
June 18, 2021, 9:43am
#4
Humm, I don’t know.
Is this for the training step?
I lack the background, but maybe Distributed training with TensorFlow | TensorFlow Core can give some insights
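For context, that guide covers the built-in strategies. A minimal use of one of them, `MirroredStrategy` (which falls back to the CPU when no GPUs are present), looks roughly like this:

```python
import tensorflow as tf

# Variables created inside strategy.scope() are mirrored across
# the strategy's devices (all local GPUs, or the CPU if none).
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential(
        [tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)]
    )
    model.compile(optimizer="sgd", loss="mse")

print(strategy.num_replicas_in_sync)
```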
Bhack
June 18, 2021, 2:11pm
#5
Are you looking to create your own custom distribution strategy?
Because I don't think we officially support this:
opened 02:14PM - 09 Sep 19 UTC · closed 10:51AM - 25 Jun 20 UTC · type:feature · comp:dist-strat · TF 1.14
**System information**
- TensorFlow version (you are using): 1.14.0
- Are you willing to contribute it (Yes/No): Yes
**Describe the feature and the current behavior/state.**
As TensorFlow stands, there is no easy and intuitive way to implement a new distribution strategy. The ones available (MirroredStrategy, MultiWorkerMirroredStrategy, ...) work fine, but the code is very complex, and there isn't a tutorial/guide on how to develop a new one.
**Will this change the current api? How?**
The API could be restructured to ease the support of new distribution strategies. A tutorial/guide on how to develop one would also be appreciated.
**Who will benefit from this feature?**
Researchers who want to reduce the time spent training distributed tensorflow models
**Any Other info.**
Thanks for the reply! Yes, I am trying to create my own custom distribution strategy, but doing this in TF seems to cause a lot of trouble…