Conjugate gradient using TensorFlow sparse tensors has worse performance than the same algorithm with SciPy sparse

Hi,
I implemented a conjugate gradient solver that works with TensorFlow sparse objects. It produces the same results as the implementation based on SciPy sparse objects, but it is 5 times slower. I wonder if someone can help me understand why, and suggest a way to fix it.
The SciPy and TensorFlow implementations are the following:

import numpy as np

def conjgrad_scipy(A, b, x0, niter=100, toll=1.e-5):
    x     = np.copy(x0)
    r     = b - A * x                    # initial residual
    p     = np.copy(r)                   # initial search direction
    rsold = np.dot(r, r)

    for it in range(niter):
        Ap    = A * p                    # sparse matrix-vector product
        alpha = rsold / np.dot(p, Ap)    # step length
        x    += alpha * p
        r    -= alpha * Ap
        rsnew = np.dot(r, r)
        if np.sqrt(rsnew) < toll:        # converged
            break
        p     = r + (rsnew / rsold) * p  # update search direction
        rsold = rsnew
    return [x, it, np.sqrt(rsnew)]
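
As a quick check, I call the SciPy version like this (the tridiagonal 1-D Laplacian here is just an illustrative SPD test matrix, not my actual problem):

import numpy as np
from scipy.sparse import coo_matrix

N = 1000
# 1-D Laplacian: 2 on the diagonal, -1 on the two off-diagonals (SPD)
I = np.concatenate([np.arange(N), np.arange(N - 1), np.arange(1, N)])
J = np.concatenate([np.arange(N), np.arange(1, N), np.arange(N - 1)])
V = np.concatenate([2.0 * np.ones(N), -np.ones(N - 1), -np.ones(N - 1)])
A = coo_matrix((V, (I, J)), shape=(N, N))

b  = np.ones(N)
x0 = np.zeros(N)
x, it, res = conjgrad_scipy(A, b, x0, niter=1000)
print(it, res, np.linalg.norm(A @ x - b))   # res and the recomputed residual agree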


import tensorflow as tf

def conjgrad_tensorflow(A, b, x0, niter=100, toll=1.e-5):
    x     = x0
    r     = b - tf.sparse.sparse_dense_matmul(A, x)    # initial residual
    p     = r                                          # initial search direction
    rsold = tf.reduce_sum(tf.multiply(r, r))

    for it in range(niter):
        Ap    = tf.sparse.sparse_dense_matmul(A, p)    # sparse matrix-vector product
        alpha = rsold / tf.reduce_sum(tf.multiply(p, Ap))
        x    += alpha * p
        r    -= alpha * Ap
        rsnew = tf.reduce_sum(tf.multiply(r, r))
        if tf.sqrt(rsnew) < toll:                      # converged
            break
        p     = r + (rsnew / rsold) * p                # update search direction
        rsold = rsnew
    return [x, it, tf.sqrt(rsnew)]
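
Note that tf.sparse.sparse_dense_matmul expects rank-2 dense operands, so b and x0 are [N, 1] column vectors in my tests, and everything is cast to float32 to match the values of the SparseTensor. A self-contained call on the same illustrative Laplacian looks like this:

import numpy as np
import tensorflow as tf

N = 1000
I = np.concatenate([np.arange(N), np.arange(N - 1), np.arange(1, N)])
J = np.concatenate([np.arange(N), np.arange(1, N), np.arange(N - 1)])
V = np.concatenate([2.0 * np.ones(N), -np.ones(N - 1), -np.ones(N - 1)])
indices = np.stack([I, J], axis=1).astype(np.int64)
A = tf.sparse.SparseTensor(indices=indices, values=V.astype(np.float32), dense_shape=[N, N])
A = tf.sparse.reorder(A)                  # sort indices into canonical row-major order

b  = tf.ones([N, 1], dtype=tf.float32)    # rank-2 column vectors, as
x0 = tf.zeros([N, 1], dtype=tf.float32)   # required by sparse_dense_matmul
x, it, res = conjgrad_tensorflow(A, b, x0, niter=1000)
print(it, float(res))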

As for the sparse matrices, I first generate the triplets [I, J, V] such that A[I[k], J[k]] = V[k]; then:

from scipy.sparse import coo_matrix

A_scipy = coo_matrix((V, (I, J)), shape=(N, N))
indices = np.hstack([I[:, np.newaxis], J[:, np.newaxis]])
A_tf    = tf.sparse.SparseTensor(indices=indices, values=V.astype(np.float32), dense_shape=[N, N])
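
One thing I was not sure about: coo_matrix accepts the (I, J) pairs in any order, while TensorFlow's sparse ops generally assume the indices are in canonical row-major order, so I also reorder the tensor after building it:

A_tf = tf.sparse.reorder(A_tf)   # sort indices into canonical row-major order

The 5x figure comes from a plain wall-clock comparison roughly like the following (the helper and the repeat count are illustrative, not my exact harness):

import time

def wall_time(f, nrep=10):
    t0 = time.perf_counter()
    for _ in range(nrep):
        f()
    return (time.perf_counter() - t0) / nrep

b_np, x0_np = np.ones(N), np.zeros(N)
b_tf, x0_tf = tf.ones([N, 1], tf.float32), tf.zeros([N, 1], tf.float32)
print('scipy     :', wall_time(lambda: conjgrad_scipy(A_scipy, b_np, x0_np)))
print('tensorflow:', wall_time(lambda: conjgrad_tensorflow(A_tf, b_tf, x0_tf)))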

Thanks.

Cesare