Abstract:
Natural language processing (NLP) is a field of machine learning that operates over text in its many forms, written in any of the world's human languages. Such text spans a wide range of domains, including news articles, medical records, economic reports, and scientific papers, to name a few.
Modern NLP models applied to such texts require a significant amount of labeled data to obtain satisfactory results on downstream tasks. However, labeled data is often scarce, especially for languages with few native speakers or for domains outside the mainstream. In such low-resource settings, it is essential to design algorithms that can exploit external resources, such as unlabeled data from the same or a different domain, external databases (e.g., Wikipedia), and the expertise of human domain experts.
In this dissertation, we investigate and develop novel structured algorithms for NLP tasks under low-resource conditions. In our experiments, we address challenging setups that arise in low-resource NLP, including unsupervised domain adaptation across domains and in low-resource languages, model compression under domain shift, and active learning over pairs of tasks with unique task relations. In these setups, we demonstrate substantial performance gains by improving common low-resource techniques, such as self-training, structured model compression, and confidence-based active learning.
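To make the self-training technique mentioned above concrete, below is a minimal illustrative sketch of vanilla self-training via pseudo-labeling, written in Python with scikit-learn. It is not the dissertation's actual method; the classifier choice, confidence threshold, and number of rounds are assumptions picked for the example.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9, rounds=5):
        # Vanilla self-training sketch (illustrative, not the dissertation's method):
        # repeatedly pseudo-label high-confidence unlabeled examples and fold
        # them into the training set.
        model = LogisticRegression(max_iter=1000)
        X, y = X_labeled, y_labeled
        for _ in range(rounds):
            model.fit(X, y)
            if len(X_unlabeled) == 0:
                break
            probs = model.predict_proba(X_unlabeled)
            confident = probs.max(axis=1) >= threshold  # trust only confident predictions
            if not confident.any():
                break
            pseudo = model.classes_[probs[confident].argmax(axis=1)]
            X = np.vstack([X, X_unlabeled[confident]])
            y = np.concatenate([y, pseudo])
            X_unlabeled = X_unlabeled[~confident]
        return model

The confidence threshold controls the usual self-training trade-off: a higher threshold adds fewer but cleaner pseudo-labels, while a lower one grows the training set faster at the risk of reinforcing the model's own errors.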