Sparse PCA: phase transitions and the near optimality of soft thresholding
יום ראשון 04.05 10:30 - 11:20
Abstract: Although PCA is a standard tool for low-rank matrix recovery, it has two key drawbacks in high dimensions: (1) the principal components are linear combinations of all the original variables, and (2) they are inconsistent estimators of their population counterparts. In this talk, we consider a method for producing sparse and more reliable principal components based on kernel PCA, which generalizes the covariance thresholding algorithm introduced by Krauthgamer, Nadler, and Vilenchik (2015). Our results establish a phase transition, analogous to the Baik-Ben Arous-Peche transition: above a certain threshold—depending on the kernel function, the aspect ratio of the data, and the signal's sparsity level—kernel PCA is informative. Conversely, below the threshold, kernel principal components are asymptotically orthogonal to the signal. We identify optimal kernel functions for both detection and support recovery, and numerical calculations suggest that soft thresholding is nearly optimal. This talk is based on joint work with Theodor Misiakiewicz and Elad Romanov.