Hi everyone, this is my first post here.

I implemented a conjugate gradient minimizer in tensorflow. The coefficient matrix is much wider than tall, so the goal is to solve the system without generating the full matrix (i.e. I use K and K.T separately, I don’t actually compute K*K.T). I am not using sparse matrices

It works, but seems to only take advantage of two threads. This is true both on MacOS and Linux.

This function is loaded from a module, it carries out one CG step . SIGMA is the unknown vector :

``````@tf.function
def CGSTEP(K, KT, SIGMA, alpha, pk, pkm1, gkm1, gkm2, rk, rkm1, RHS, resid) :
alpha.assign(tf.norm(gkm1)**2 / tf.norm(tf.matmul(K,pk))**2)
rk.assign(rkm1 - alpha*tf.matmul(K,pk))
gkm2.assign(gkm1)
gkm1.assign(tf.matmul(KT,rk))
pkm1.assign(pk)
rkm1.assign(rk)
resid.assign(RHS - tf.matmul(K,SIGMA))
return
``````

I use the function like this :

``````r0 = tf.Variable(RHS - tf.matmul(K,SIGMA))
resid = tf.Variable(r0)
g0 = tf.matmul(KT,r0)
gkm1 = tf.Variable(g0)
gkm2 = tf.Variable(g0)
rk = tf.Variable(r0)
rkm1 = tf.Variable(r0)
pk = tf.Variable(g0)
pkm1 = tf.Variable(g0)
alpha = tf.Variable(tf.norm(gkm1)**2 / tf.norm(tf.matmul(K,pk))**2)

KMAX = 100
k = 1

while k < KMAX :
if k > 1 :
pk.assign(gkm1 - (- tf.norm(gkm1)**2 / tf.norm(gkm2)**2)*pkm1)
#
CGSTEP(K, KT, SIGMA, alpha, pk, pkm1, gkm1, gkm2, rk, rkm1, RHS, resid)
k += 1
``````

Up front I set

``````tf.config.threading.set_inter_op_parallelism_threads(int(os.environ["NUMEXPR_NUM_THREADS"]))