Conjugate gradient using TensorFlow sparse tensors performs worse than the same algorithm with SciPy sparse matrices

I implemented a conjugate gradient solver that works with TensorFlow sparse objects. The algorithm produces the same results as the one implemented with SciPy sparse objects; however, it is 5 times slower. I wonder if someone can help me understand why, and suggest a way to fix this.
The SciPy and TensorFlow versions of the algorithm are the following:

def conjgrad_scipy(A,b,x0,niter=100,toll=1.e-5):
    x     = np.copy(x0)
    r     = b - A * x
    p     = np.copy(r)
    rsold =, r)
    for it in range(niter):
        Ap    = A * p
        alpha = rsold /, Ap)
        x     += alpha * p
        r     -= alpha * Ap
        rsnew =, r)
        if np.sqrt(rsnew) < toll:
            break
        p     = r + (rsnew / rsold) * p
        rsold = rsnew
    return x

def conjgrad_tensorflow(A,b,x0,niter=100,toll=1.e-5):
    x     = x0
    r     = b - tf.sparse.sparse_dense_matmul(A, x)
    p     = r
    rsold = tf.reduce_sum(tf.multiply(r, r))
    for it in range(niter):
        Ap    = tf.sparse.sparse_dense_matmul(A, p)
        alpha = rsold / tf.reduce_sum(tf.multiply(p, Ap))
        x     += alpha * p
        r     -= alpha * Ap
        rsnew = tf.reduce_sum(tf.multiply(r, r))
        if tf.sqrt(rsnew) < toll:
            break
        p     = r + (rsnew / rsold) * p
        rsold = rsnew
    return x
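
The TensorFlow version can be checked the same way. The sketch below runs the same loop on the same assumed 2×2 SPD system, stored as a `SparseTensor` (the exact solution of that system is [1/11, 7/11]):

```python
import numpy as np
import tensorflow as tf

def conjgrad_tf(A, b, x0, niter=100, toll=1.e-5):
    # Same loop as above; A is a tf.sparse.SparseTensor, b and x0 are (N, 1) tensors
    x = x0
    r = b - tf.sparse.sparse_dense_matmul(A, x)
    p = r
    rsold = tf.reduce_sum(tf.multiply(r, r))
    for it in range(niter):
        Ap = tf.sparse.sparse_dense_matmul(A, p)
        alpha = rsold / tf.reduce_sum(tf.multiply(p, Ap))
        x += alpha * p
        r -= alpha * Ap
        rsnew = tf.reduce_sum(tf.multiply(r, r))
        if tf.sqrt(rsnew) < toll:
            break
        p = r + (rsnew / rsold) * p
        rsold = rsnew
    return x

# Same tiny SPD system (assumed test data), stored sparsely;
# indices are already in row-major order here
indices = [[0, 0], [0, 1], [1, 0], [1, 1]]
values = np.array([4.0, 1.0, 1.0, 3.0], dtype=np.float32)
A = tf.sparse.SparseTensor(indices=indices, values=values, dense_shape=[2, 2])
b = tf.constant([[1.0], [2.0]])
x = conjgrad_tf(A, b, tf.zeros((2, 1))).numpy().ravel()
print(np.allclose(x, [1/11, 7/11], atol=1e-3))
```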

As for the sparse matrices, I first generated the triplets [I, J, V], such that A[I[k], J[k]] = V[k]; then:

A_scipy = coo_matrix((V,(I,J)),shape=(N,N))
indices = np.hstack([I[:,np.newaxis], J[:,np.newaxis]])
A_tf    = tf.sparse.SparseTensor(indices=indices, values=V.astype(np.float32), dense_shape=[N,N])
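
As a sanity check, the two constructions should describe the same dense matrix. The sketch below uses small assumed triplets (not my real data); note the `tf.sparse.reorder` call, since many `tf.sparse` ops expect indices in canonical row-major order, which arbitrary [I, J] triplets do not guarantee:

```python
import numpy as np
import tensorflow as tf
from scipy.sparse import coo_matrix

# Small assumed [I, J, V] triplets, for illustration only
N = 3
I = np.array([0, 1, 2, 0])
J = np.array([0, 1, 2, 2])
V = np.array([4.0, 3.0, 5.0, -1.0], dtype=np.float32)

A_scipy = coo_matrix((V, (I, J)), shape=(N, N))
indices = np.hstack([I[:, np.newaxis], J[:, np.newaxis]])
A_tf = tf.sparse.SparseTensor(indices=indices, values=V, dense_shape=[N, N])
# Put the indices in canonical row-major order before using the tensor
A_tf = tf.sparse.reorder(A_tf)

print(np.allclose(A_scipy.toarray(), tf.sparse.to_dense(A_tf).numpy()))
```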