Principal Component Analysis (PCA) is a popular dimensionality reduction technique in data analysis, which projects a given data set onto a lower-dimensional subspace with minimum error. To improve interpretability, several variants of the method have been proposed in the literature that seek sparsity in addition to error minimization. In this talk, the problem of finding a subspace with a sparse basis is formulated as a Mixed Integer Nonlinear Program (MINLP), in which the sum of squared distances between the points and their projections is minimized. In contrast to other approaches in the literature, our model lets the user fix the sparseness level of the resulting basis vectors. A Variable Neighborhood Search (VNS) heuristic is proposed to solve the MINLP. Numerical experience on test data sets shows that our procedure outperforms benchmark methods from the literature. The proposed strategy minimizes the error while keeping sparseness above a given threshold value. We also study a biobjective problem in which sparseness and error are optimized simultaneously, parametrized by the total number of non-zero coordinates in the resulting principal components. Numerical experiments show that this biobjective approach yields sparser components with smaller error than competing approaches.
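As a concrete illustration of the objective described above, the minimal sketch below (Python with NumPy; the function names, toy data, and the cardinality bound of two non-zeros are our own illustrative assumptions, not details from the talk) computes the sum of squared distances between points and their projections onto a candidate subspace, and compares a sparse one-dimensional basis against the dense PCA direction:

```python
import numpy as np

def projection_error(X, V):
    """Sum of squared distances between the rows of X and their
    orthogonal projections onto the subspace spanned by the columns of V."""
    P = V @ np.linalg.pinv(V)  # projection matrix onto span(V)
    R = X - X @ P              # residuals: points minus their projections
    return float(np.sum(R ** 2))

# Toy data: 100 centred points in R^3.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X -= X.mean(axis=0)  # centre the data, as in PCA

# Candidate sparse direction with at most 2 non-zero coordinates.
v = np.zeros((3, 1))
v[[0, 1], 0] = [1.0, 1.0]
v /= np.linalg.norm(v)
err_sparse = projection_error(X, v)

# Dense PCA direction (leading right singular vector) for comparison.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
err_pca = projection_error(X, Vt[:1].T)

# PCA attains the minimum error over all 1-D subspaces, so the
# sparsity constraint can only increase the objective value.
assert err_pca <= err_sparse + 1e-9
```

The MINLP of the talk searches over such sparse bases for the one minimizing this error, with the number of non-zeros bounded by the user-chosen sparseness level.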