SVD gradient only implemented for almost square matrices

William_Berman · August 25, 2022, 3:40am

Hey, I was taking a look at the implementation of SVD’s gradient. In the full_matrices case, it raises a NotImplementedError if the number of rows and the number of columns differ by more than one. I don’t see this restriction noted in the paper deriving the gradient: https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf

github.com

tensorflow/tensorflow/blob/de38c5a7328a251993770c2714474b32193253c0/tensorflow/python/ops/linalg_grad.py#L870

    use_adjoint = False
    if m > n:
      # Compute the gradient for A^H = V * S^T * U^H, and (implicitly) take the
      # Hermitian transpose of the gradient at the end.
      use_adjoint = True
      m, n = n, m
      u, v = v, u
      grad_u, grad_v = grad_v, grad_u

    with ops.control_dependencies([grad_s, grad_u, grad_v]):
      if full_matrices and abs(m - n) > 1:
        raise NotImplementedError(
            "svd gradient is not implemented for abs(m - n) > 1 "
            f"when full_matrices is True. Received: m={m} and n={n} from "
            f"op input={a} with shape={a_shape}.")
      s_mat = array_ops.matrix_diag(s)
      s2 = math_ops.square(s)

      # NOTICE: Because of the term involving f, the gradient becomes
      # infinite (or NaN in practice) when singular values are not unique.
      # Mathematically this should not be surprising, since for (k-fold)

Was wondering if someone could shed some light, thanks!

William_Berman · August 25, 2022, 6:26am

OK, so I ran this by a friend and want to check that this explanation is correct.

Having two or more extra columns than rows (or rows than columns) guarantees at least two zero-valued singular values. Zero-valued singular values are equal to each other, which hits the known issue that dU and dV require distinct singular values. Due to limited numerical precision, we won’t get exactly zero singular values, so we can still calculate dU and dV, but the terms of F will then divide by numbers close to zero, so the result would be numerically unstable.
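
To make the argument concrete, here is a minimal pure-Python sketch of how such a factor behaves. It assumes F has the form F_ij = 1 / (s_j^2 - s_i^2) for i != j and 0 on the diagonal, which is my reading of the `s2` and `f` terms in the linked source rather than something quoted in this thread. The helper names (`f_matrix`, `largest_magnitude`) and the singular values are made up for illustration, and padding the spectrum with zeros mirrors the explanation above, not what TensorFlow actually computes.

```python
import math


def f_matrix(singular_values):
    """Return F with F[i][j] = 1 / (s_j^2 - s_i^2) off the diagonal.

    The diagonal is 0, and an exactly zero gap is reported as inf
    instead of raising ZeroDivisionError.
    """
    s2 = [s * s for s in singular_values]
    k = len(s2)
    f = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            gap = s2[j] - s2[i]
            f[i][j] = math.inf if gap == 0.0 else 1.0 / gap
    return f


def largest_magnitude(f):
    """Largest absolute entry of F, a rough measure of how badly it blows up."""
    return max(abs(x) for row in f for x in row)


# Hypothetical 2 x 4 input: two nonzero singular values plus two zeros
# standing in for the two extra dimensions.
exact = f_matrix([3.0, 1.0, 0.0, 0.0])

# Same spectrum, with the zeros replaced by round-off-sized values.
perturbed = f_matrix([3.0, 1.0, 1e-8, 2e-8])

# Hypothetical 2 x 3 input: only one zero, so no two zeros collide.
one_extra = f_matrix([3.0, 1.0, 0.0])

print(largest_magnitude(exact))      # infinite: the two zeros coincide
print(largest_magnitude(perturbed))  # finite but enormous
print(largest_magnitude(one_extra))  # stays bounded
```

Under these assumptions, a single extra dimension leaves every gap between squared singular values nonzero, while two or more extra dimensions always produce at least one zero (or near-zero) gap, which would be consistent with the `abs(m - n) > 1` check.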

Hopefully this is right; if not, please let me know.