SVD data compression, using my son as an experiment :-)

SVD stands for singular value decomposition; it is a matrix factorization method (more details can be found on Wikipedia). Every matrix A can be decomposed into three matrices:

A = U \Sigma V^T

where \Sigma is a diagonal matrix containing the r singular values. The decomposition looks like the following figure:

[Figure: the SVD decomposition A = U \Sigma V^T]

Instead of using all r singular values, select only the top k. Then, instead of multiplying U, \Sigma and V^T to get matrix A, multiply U_k, \Sigma_k and V^T_k to get an approximation A_k of matrix A:

[Figure: the truncated SVD A_k = U_k \Sigma_k V^T_k]

This approximation can be stored using far less data than the original matrix. How good is the approximation? You could look at the Frobenius norm, or any other matrix norm that measures the distance between the approximation and the original matrix; see http://en.wikipedia.org/wiki/Matrix_norm.
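As a small sketch of the idea (using NumPy and a random stand-in matrix rather than an actual photo), the truncation and the Frobenius-norm error can be computed like this:

```python
import numpy as np

# A stand-in "image": a random 200 x 300 matrix instead of real pixel data.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 300))

# Full SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation: keep only the k largest singular values.
k = 15
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius-norm distance between the approximation and the original.
# For SVD truncation this equals the square root of the sum of the
# squared discarded singular values.
err = np.linalg.norm(A - A_k, "fro")
print(err)
print(np.sqrt(np.sum(s[k:] ** 2)))
```

The two printed numbers agree, which is exactly why truncating the SVD at the largest singular values is the best rank-k approximation in the Frobenius norm.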

Instead, to make things more visual, you can look at pictures. I am using an old picture of me and my son in the following “SVD experiment”.

Original picture: 2448 × 3264 pixels ~ a matrix with 8 million numbers

[Picture: the original photo]

Now I take only the 15 largest singular values and reconstruct the picture with U_{15}\Sigma_{15}V^T_{15}. This uses only 1% of the data, and I get the following picture:

[Picture: rank-15 reconstruction]

Now I take only the 75 largest singular values and reconstruct the picture with U_{75}\Sigma_{75}V^T_{75}. This uses around 5% of the data, and I get:

[Picture: rank-75 reconstruction]
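The 1% and 5% figures above come from simple counting: a rank-k truncation needs to store U_k (m × k numbers), the k singular values, and V_k (n × k numbers), so k(m + n + 1) numbers in total instead of mn. A quick sketch with the picture's dimensions:

```python
# Storage for the original picture versus a rank-k truncation:
# U_k has m*k numbers, Sigma_k has k, and V_k has n*k,
# so the truncation needs k*(m + n + 1) numbers in total.
m, n = 2448, 3264
full = m * n  # roughly 8 million numbers

ratios = {}
for k in (15, 75):
    truncated = k * (m + n + 1)
    ratios[k] = truncated / full
    print(f"k = {k}: {ratios[k]:.1%} of the original data")
```

For k = 15 this works out to roughly 1% of the original data, and for k = 75 to roughly 5%, matching the reconstructions above.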

In a future post I’ll show how this can be used in text mining….

Cheers, Longhow.
