Special talk by Arsen Vasilyan on Foundations of Reliable Learning with Imperfect Data
Location: Ada-333
Abstract:
A central challenge in machine learning is reliability: ensuring that an algorithm’s predictions remain accurate and stable even when the data is noisy or adversarially corrupted, or the model is misspecified. This talk presents recent progress on learning algorithms whose robustness is grounded in provable guarantees.
The first part of the talk introduces a framework for assumption-aware learning: algorithms that explicitly recognize when their modeling assumptions fail. Rather than producing an incorrect prediction, these learners may respond with “I don’t know” when the conditions required for correctness are violated. We formalize this paradigm and develop a versatile toolkit for making learning algorithms assumption-aware.
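To make the abstention idea concrete, here is a minimal toy sketch (not the talk's actual construction): a one-dimensional threshold classifier wrapped with a crude assumption check that returns "I don't know" on inputs far outside the training range, where the assumption that test data resembles training data may fail. All names and the specific check are illustrative assumptions.

```python
import numpy as np

def train_mean_classifier(X, y):
    """Fit a toy 1-D classifier: threshold at the midpoint of the two
    class means (assumes each class is roughly clustered around its mean)."""
    m0, m1 = X[y == 0].mean(), X[y == 1].mean()
    return (m0 + m1) / 2.0, np.sign(m1 - m0)

def predict_or_abstain(threshold, direction, x, train_X, tol=3.0):
    """Illustrative assumption check: abstain (return None, i.e. 'I don't
    know') on points far outside the range seen at training time, where
    the modeling assumption behind the threshold rule may be violated."""
    margin = tol * train_X.std()
    if x < train_X.min() - margin or x > train_X.max() + margin:
        return None  # "I don't know"
    return int((x - threshold) * direction > 0)
```

A real assumption-aware learner would of course use a principled test of its assumptions rather than a range heuristic; the point here is only the interface: predict when the check passes, abstain when it does not.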
The second part of the talk focuses on supervised learning in the strong contamination model, where an adversary may arbitrarily corrupt a small fraction of the training examples, including both labels and features. While robust learning is well understood in unsupervised settings, general-purpose efficient algorithms for supervised learning under such adversarial noise have remained elusive. Building on our toolkit of techniques for assumption-awareness, we develop the first general-purpose efficient algorithm of this kind, applicable to a broad family of hypothesis classes.
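The strong contamination model described above can be simulated in a few lines. This sketch (an illustrative assumption, not code from the talk) shows an "adversary" replacing an arbitrary eps-fraction of examples, both features and labels; a true strong-contamination adversary may choose the corrupted rows and their replacements adaptively after seeing the whole clean sample.

```python
import numpy as np

def strongly_contaminate(X, y, eps, rng):
    """Simulate strong contamination: overwrite an eps-fraction of the
    (feature, label) pairs with arbitrary values. Labels here are in
    {-1, +1} and are flipped; features are replaced by outliers."""
    m = len(y)
    k = int(eps * m)
    idx = rng.choice(m, size=k, replace=False)  # examples to corrupt
    Xc, yc = X.copy(), y.copy()
    Xc[idx] = rng.normal(100.0, 1.0, size=(k, X.shape[1]))  # arbitrary features
    yc[idx] = -y[idx]                                       # flipped labels
    return Xc, yc
```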
As a highlight, we show that constant-depth Boolean circuits can be learned efficiently under strong contamination, with running time matching the classical algorithm of Linial, Mansour, and Nisan. This result resolves a 30-year-old open problem and demonstrates that reliability under adversarial corruption is compatible with optimal computational efficiency.
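For readers unfamiliar with the Linial–Mansour–Nisan approach referenced above, the classical (non-robust) low-degree algorithm can be sketched as follows: estimate every Fourier coefficient of degree at most d from labeled samples over the Boolean cube, and predict with the sign of the resulting low-degree polynomial. This is a sketch of the standard algorithm only, not of the contamination-tolerant version presented in the talk.

```python
import itertools
import numpy as np

def low_degree_learn(X, y, d):
    """Low-degree algorithm: estimate hat(f)(S) = E[f(x) * chi_S(x)] for
    every subset S with |S| <= d. Rows of X and entries of y are in {-1, +1}."""
    n = X.shape[1]
    coeffs = {}
    for k in range(d + 1):
        for S in itertools.combinations(range(n), k):
            chi_S = np.prod(X[:, list(S)], axis=1)  # parity on coords S; 1 for S = {}
            coeffs[S] = float(np.mean(y * chi_S))
    return coeffs

def predict(coeffs, x):
    """Predict with the sign of the estimated degree-<=d polynomial at x."""
    val = sum(c * np.prod(x[list(S)]) for S, c in coeffs.items())
    return 1 if val >= 0 else -1
```

Enumerating all subsets of size at most d costs roughly n^d coefficient estimates, which for constant-depth circuits (where d is polylogarithmic in n) yields the quasi-polynomial running time that the robust algorithm in the talk matches.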