# SVD data compression, using my son as an experiment :-)

SVD stands for singular value decomposition; it is a matrix factorization method, and more details can be found on Wikipedia. Any matrix $A$ can be decomposed into three matrices:

$A = U \Sigma V^T$

where $\Sigma$ is a diagonal matrix containing the $r$ singular values. The decomposition looks like the following figure:

Instead of using all $r$ singular values, select only the top $k$. Then, instead of multiplying $U$, $\Sigma$ and $V^T$ to get the matrix $A$, multiply $U_k$, $\Sigma_k$ and $V^T_k$ to get an approximation $A_k$ of the matrix $A$.
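This rank-$k$ reconstruction is easy to sketch in NumPy. The small random matrix below is just a stand-in for illustration; in the experiment further down, $A$ would be the image:

```python
import numpy as np

# Stand-in matrix for the demo; in the post A would be the picture.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))

# Full SVD: A = U @ diag(s) @ Vt, singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular values: a rank-k approximation A_k.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(A_k.shape)  # same shape as A, but only rank k
```

The approximation has the same shape as $A$, but it is built from far fewer numbers.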

So this approximation can be stored with far less data than the original matrix. How good is the approximation? You could look at the Frobenius norm, or any other matrix norm that measures the distance between the approximation and the original matrix; see http://en.wikipedia.org/wiki/Matrix_norm.
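As a quick sketch (again on a stand-in random matrix, not the photo), the relative Frobenius-norm error $\|A - A_k\|_F / \|A\|_F$ shrinks as $k$ grows, reaching zero at full rank:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

errors = []
for k in (1, 3, 6):
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Relative Frobenius-norm distance between A and its rank-k approximation.
    errors.append(np.linalg.norm(A - A_k, "fro") / np.linalg.norm(A, "fro"))

print(errors)  # decreasing; essentially zero once k equals the full rank
```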

Instead, to make things more visual, you can look at pictures. I am using an old picture of me and my son in the following “SVD experiment”.

Original picture: 2448 × 3264 pixels, i.e. a matrix with roughly 8 million numbers.

Now I take only the 15 largest singular values and reconstruct the picture with $U_{15}\Sigma_{15}V^T_{15}$. This uses only about 1% of the data, and I get the following picture:

Taking the 75 largest singular values and reconstructing the picture with $U_{75}\Sigma_{75}V^T_{75}$ uses around 5% of the data, and I get the following picture:
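The data fractions quoted above are easy to verify: a rank-$k$ truncated SVD of an $m \times n$ matrix needs $mk$ numbers for $U_k$, $k$ for the diagonal of $\Sigma_k$, and $kn$ for $V_k^T$, compared to $mn$ for the full matrix:

```python
# Picture dimensions from the experiment above.
m, n = 2448, 3264

def fraction(k):
    """Fraction of the original data needed for a rank-k truncated SVD."""
    return (m * k + k + k * n) / (m * n)

print(fraction(15))  # about 0.011, i.e. roughly 1% of the data
print(fraction(75))  # about 0.054, i.e. around 5%
```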

In a next post I’ll show how this can be used in text mining….

Cheers, Longhow.