SVD gradient only implemented for almost square matrices

Hey, I was taking a look at the implementation of the SVD gradient. In the full-matrices case, it throws an error if the input matrix's dimensions differ by more than one. I don't see this restriction noted in the paper deriving the gradient.

Was wondering if someone could shed some light, thanks!

OK, so I ran this by a friend and want to check that this explanation is correct.

Having two or more additional columns than rows (or rows than columns) guarantees at least two zero-valued singular values. Those zero singular values are equal to each other, which hits the known issue that dU and dV require distinct singular values. Due to numerical precision, we won't get exactly zero singular values, so we could still calculate dU and dV, but the terms of F would divide by numbers close to zero, so the result would be numerically unstable.
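To make the argument concrete, here is a rough NumPy sketch of what I mean. It is my own illustration, not the library's code. It assumes F has the usual form F[i, j] = 1 / (s[j]^2 - s[i]^2) off the diagonal, and that the "extra" singular values in the full-matrices case can be treated as zeros padded onto the end. The helper names `padded_singular_values` and `f_matrix` are made up for this example. Since the zeros here are padded in exactly, the bad entries come out as inf; with computed near-zero values they would be huge instead.

```python
import numpy as np

def padded_singular_values(a):
    """Singular values of `a`, zero-padded to length max(m, n) (an assumption
    about how the full-matrices case behaves, for illustration only)."""
    m, n = a.shape
    s = np.linalg.svd(a, compute_uv=False)  # length min(m, n)
    return np.concatenate([s, np.zeros(max(m, n) - min(m, n))])

def f_matrix(s):
    """F[i, j] = 1 / (s[j]**2 - s[i]**2) off the diagonal, 0 on the diagonal."""
    diff = s[None, :] ** 2 - s[:, None] ** 2
    f = np.zeros_like(diff)
    off_diag = ~np.eye(len(s), dtype=bool)
    with np.errstate(divide="ignore"):  # let 1/0 become inf without a warning
        f[off_diag] = 1.0 / diff[off_diag]
    return f

rng = np.random.default_rng(0)

# Dimensions differ by one: a single padded zero, so no repeated value.
near_square = rng.standard_normal((2, 3))
print(np.isinf(f_matrix(padded_singular_values(near_square))).sum())

# Dimensions differ by two: two padded zeros are equal, so F divides by zero.
wide = rng.standard_normal((2, 4))
print(np.isinf(f_matrix(padded_singular_values(wide))).sum())
```

If this picture is right, the 2x3 case has no infinite entries in F, while the 2x4 case has exactly the two entries that pair the padded zeros with each other.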

Hopefully this is right; if not, lmk pls 🙂