Semi-supervised sparse Gaussian classification: provable benefits of unlabeled data

Tuesday, 23.12, 10:30 - 11:30

Abstract: The premise of semi-supervised learning (SSL) is that combining labeled and unlabeled data yields significantly more accurate models. Despite SSL's empirical successes, its theoretical understanding is still far from complete. In this work, we study SSL for high-dimensional sparse Gaussian classification. A key task in constructing an accurate classifier is feature selection: detecting the few variables that separate the two classes. Our key contribution is the identification of a regime in the problem parameters where SSL is guaranteed to be advantageous for classification. Specifically, in this regime an accurate SSL classifier can be constructed in polynomial time, whereas any computationally efficient supervised or unsupervised scheme that uses only the labeled data or only the unlabeled data would fail. This work highlights the provable benefits of combining labeled and unlabeled data for classification and feature selection in high dimensions.
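To make "feature selection" in sparse Gaussian classification concrete, here is a minimal sketch under an assumed model: x = y * mu + z, with y equal to +1 or -1 with equal probability, z standard Gaussian noise, and mu nonzero (equal to a hypothetical `amplitude`) on only k coordinates. The helper names (`make_sparse_gaussian_data`, `select_features`, `classify`) and all parameter values are illustrative assumptions. The sketch shows only a simple supervised screening step (rank coordinates by the labeled-sample average of y * x_j) in an easy, strong-signal regime. It is not the speaker's SSL method and does not reproduce the hard regime where labeled data alone is insufficient.

```python
import random


def make_sparse_gaussian_data(n, p, support, amplitude, rng):
    """Draw n pairs (x, y) from the assumed model x = y * mu + z.

    y is +1 or -1 with equal probability, z ~ N(0, I_p), and mu equals
    `amplitude` on the coordinates in `support` and 0 elsewhere.
    """
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in support:
            x[j] += y * amplitude  # shift only the informative coordinates
        data.append((x, y))
    return data


def select_features(labeled, k):
    """Supervised screening: estimate mu_j by the mean of y * x_j over the
    labeled samples and keep the k coordinates with the largest |estimate|."""
    n = len(labeled)
    p = len(labeled[0][0])
    scores = [sum(y * x[j] for x, y in labeled) / n for j in range(p)]
    top = sorted(range(p), key=lambda j: abs(scores[j]), reverse=True)[:k]
    return sorted(top), scores


def classify(x, selected, scores):
    """Linear rule on the selected features: sign of sum_j mu_hat_j * x_j."""
    s = sum(scores[j] * x[j] for j in selected)
    return 1 if s >= 0 else -1


# Usage: p = 200 features, of which k = 5 (hypothetically the first five) are informative.
rng = random.Random(0)
p, k = 200, 5
support = list(range(k))
train = make_sparse_gaussian_data(n=100, p=p, support=support, amplitude=1.5, rng=rng)
test = make_sparse_gaussian_data(n=1000, p=p, support=support, amplitude=1.5, rng=rng)

selected, scores = select_features(train, k)
accuracy = sum(classify(x, selected, scores) == y for x, y in test) / len(test)
print(selected, accuracy)
```

With a large signal relative to the noise in the per-coordinate averages, the top-k ranking recovers the informative coordinates. When the signal is weak and labeled samples are few, this ranking becomes unreliable, which is the kind of regime where the abstract argues that unlabeled data must also be used.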

Speaker

Eyar Azar

Weizmann Institute