Amr Ahmed, Google
Large amounts of data arise in a multitude of situations, ranging from bioinformatics to astronomy, manufacturing, and medical applications. For concreteness, the lectures focus on data obtained in the context of the internet, such as user-generated content (microblogs, emails, messages), behavioral data (locations, interactions, clicks, queries), and graphs. Due to the magnitude of such data, much of the challenge lies in extracting structure and interpretable models without the need for additional labels, i.e. in designing effective unsupervised techniques. We present design patterns for latent variable models such as hierarchical nonparametric Bayesian models and topic models, efficient inference algorithms that scale to the size of data on the internet, and modeling tools to describe salient aspects of the data. We will illustrate these techniques with real-world applications in news clustering, user profiling, and computational advertising.
Background:
Basic familiarity with statistical concepts and graphical models at the level of an entry graduate machine learning course.
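As a small, concrete illustration of the topic models mentioned above, the following sketch (not part of the lectures) fits a latent Dirichlet allocation model to a toy corpus with scikit-learn; the documents, number of topics, and other settings are placeholders.

# A minimal sketch: fitting an LDA topic model to a toy corpus with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stock market prices fall as investors worry",
    "central bank raises interest rates again",
    "team wins the championship game in overtime",
    "injured striker misses the final match",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row of lda.components_ is an (unnormalized) topic-word distribution.
print(lda.components_.shape)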
Mikhail Belkin, Ohio State
Kernel Methods, Unlabeled Data and Manifold Learning
Kernel methods, such as Support Vector Machines, have become one of the most important tools in machine learning.
I will introduce the basic ideas of kernel methods and discuss their rich mathematical and algorithmic foundations.
We will also discuss recent work on scaling kernel methods to very large datasets.
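As a starting point for experimenting with kernel methods before the lectures, here is a minimal sketch (illustrative only) that trains an RBF-kernel support vector machine on synthetic two-moons data with scikit-learn; the dataset and the gamma/C values are placeholders.

# A minimal sketch: an RBF-kernel SVM on synthetic data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps points into a high-dimensional feature space;
# gamma controls the kernel width and C the regularization trade-off.
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))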
In the second part of the lectures, we will turn to the problem of learning in high dimensions, in particular as it relates to learning from unlabeled data and manifold learning. We will discuss manifold methods, clustering, and how unlabeled data can be used in the framework of kernel methods.
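One simple way to see how unlabeled data can help under a manifold-like assumption, in the spirit of this second part, is graph-based semi-supervised learning. The sketch below (an illustration, not the methods of the lectures) uses scikit-learn's LabelSpreading on data where 95% of the labels are hidden.

# A minimal sketch: graph-based semi-supervised learning, where unlabeled points
# (marked with label -1) shape the decision boundary through a similarity graph.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=300, noise=0.1, random_state=0)
rng = np.random.RandomState(0)
hidden = rng.rand(len(y)) < 0.95        # hide 95% of the labels
y_partial = np.where(hidden, -1, y)

model = LabelSpreading(kernel="rbf", gamma=20).fit(X, y_partial)
print("accuracy on the hidden labels:", model.score(X[hidden], y[hidden]))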
Stefanie Jegelka, Berkeley
Submodularity and practical discrete optimization in machine learning
Optimization is increasingly recognized as a core component of machine learning. Solving complex learning and inference tasks at scale inevitably relies on practical optimization techniques. In these lectures, I will focus in particular on challenging discrete problems that arise in machine learning. While such problems may be very hard in general, the instances of interest often possess additional, exploitable structure.
As an example of a fairly universal structure, I will show how many machine learning problems can be formulated in terms of submodular functions. These include learning with structured sparse models, image labeling, sensing, information retrieval, influence maximization, clustering, and many more. Often viewed as a "discrete analogue of convexity", submodular functions arise in a number of areas, and this connection enables algorithms for exact optimization or solutions within approximation bounds. In addition, it leads to interesting questions regarding modeling, as well as online and distributed settings.
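As a small concrete example of the maximization problems above (sensing, influence maximization, and the like), the sketch below runs the classical greedy algorithm on a toy coverage function, a monotone submodular function for which greedy selection under a cardinality constraint is guaranteed to be within a factor 1 - 1/e of optimal; the candidate sets are placeholders.

# A minimal sketch: greedy maximization of the coverage function
# F(S) = |union of the chosen sets|, subject to choosing at most k sets.
def greedy_max_coverage(sets, k):
    chosen, covered = [], set()
    for _ in range(k):
        # Pick the set with the largest marginal gain over what is already covered.
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break                        # no set adds new elements
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

candidate_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
print(greedy_max_coverage(candidate_sets, k=2))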
I will outline example problems; the relevant basics of the theory of submodular functions, including their relation to convexity and the properties of related polyhedra; complexity results; and practical algorithms and their properties. These topics touch upon modeling examples as well as combinatorial and convex optimization techniques. Finally, I will introduce ideas for submodular optimization for machine learning in parallel and distributed settings.
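To make the connection to convexity concrete, the sketch below evaluates the Lovász extension of a set function at a point via the standard greedy (sorting) formula; the extension is convex exactly when the set function is submodular, which is what links submodular minimization to convex optimization. The cut function on a tiny path graph is only an illustrative choice.

# A minimal sketch: the Lovász extension of a set function f (with f(empty) = 0),
# evaluated at x by sorting the coordinates and summing weighted marginal gains.
import numpy as np

def lovasz_extension(f, x):
    order = np.argsort(-np.asarray(x))    # coordinates in decreasing order
    value, prev, chosen = 0.0, 0.0, set()
    for i in order:
        chosen.add(int(i))
        fi = f(chosen)
        value += x[int(i)] * (fi - prev)  # weight f(S_i) - f(S_{i-1}) by x_i
        prev = fi
    return value

# Example: the cut function of the path graph 0 - 1 - 2 (illustrative only).
edges = [(0, 1), (1, 2)]
cut = lambda S: sum(1 for (u, v) in edges if (u in S) != (v in S))
print(lovasz_extension(cut, np.array([0.7, 0.2, 0.9])))  # equals |0.7-0.2| + |0.2-0.9| = 1.2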
The lectures will cover parts of the following references. More specific references and further reading will be given in the lectures.
Applications of submodularity in machine learning:
Tutorial slides from ICML (by Stefanie Jegelka and Andreas Krause): http://www.cs.berkeley.edu/~stefje/submodularity_icml.html
Tutorial slides from NIPS (by Jeff Bilmes): https://nips.cc/Conferences/2013/Program/event.php?ID=3688
F. Bach. Learning with Submodular Functions: A Convex Optimization Perspective. Foundations and Trends in Machine Learning, Vol. 6, Issue 2-3. 2013. (Chapters 2,3,5,6,8, Appendix A1)
http://www.di.ens.fr/~fbach/2200000039-Bach-Vol6-MAL-039.pdf
A. Krause, D. Golovin. Submodular Function Maximization, Chapter in Tractability: Practical Approaches to Hard Problems, Cambridge University Press, 2014. (Section 1, 2, parts of 3)
http://las.ethz.ch/files/krause12survey.pdf
M. Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. ICML, 2013.
http://m8j.net/math/revisited-FW.pdf
S. Jegelka, F. Bach, S. Sra. Reflection methods for user-friendly submodular optimization. NIPS, 2013.
http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2013_4988.pdf
For submodularity and Markov Random Fields in Computer Vision: Chapters 1 and 2 in
A. Blake, P. Kohli, C. Rother. Markov Random Fields for Vision and Image Processing. MIT Press.
For more background on convex duality, see e.g. the book by Boyd & Vandenberghe.
Ankur Moitra, MIT
I will survey some recent work on efficient algorithms for representation learning in various settings. In the first part of the tutorial, I will describe recent work on the nonnegative matrix factorization problem, which has important applications throughout machine learning (and theory). As is often the case, this problem is NP-hard when considered in full generality. However, we introduce a special case called separable nonnegative matrix factorization that we believe is the right notion in various contexts. We give a polynomial-time algorithm for this problem, and leverage this algorithm to efficiently learn the topics in a latent Dirichlet allocation model (and beyond). This is an auspicious example where theory leads to inherently new algorithms with highly practical performance on real data sets.
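To give a flavor of what separability buys: in one common formulation, every data column is a nonnegative combination of a small set of "anchor" columns of the data itself, and those anchors can be found by repeatedly picking the column of largest residual norm and projecting it out. The sketch below is a minimal version of this successive-projection idea on synthetic data (an illustration only, not the algorithm from the tutorial).

# A minimal sketch: anchor selection for separable NMF by successive projection.
import numpy as np

def select_anchors(X, r):
    R = np.asarray(X, dtype=float).copy()
    anchors = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))  # column with largest residual norm
        anchors.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)                        # project out the chosen direction
    return anchors

# Toy separable data: columns 0 and 3 of X are the anchors (illustrative only).
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
H = np.array([[1.0, 0.7, 0.2, 0.0, 0.5],
              [0.0, 0.3, 0.8, 1.0, 0.5]])
X = W @ H
print(select_anchors(X, r=2))  # recovers the anchor columns [0, 3]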
In the second part of the tutorial, I will describe work in progress on the dictionary learning problem. Sparse representations play an essential role in many fields, including statistics, signal processing, and machine learning. But can we efficiently learn a basis that enables a sparse representation, if one exists? This problem has many names and is often called dictionary learning or sparse coding. We give the first algorithms for learning an unknown, incoherent, and overcomplete dictionary in a natural stochastic model, and our bounds allow a sparsity that is within a logarithmic factor of the information-theoretic threshold for sparse recovery on incoherent dictionaries. We will also give new insights into analyzing the performance of alternating minimization.
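For readers who want to see the dictionary learning / sparse coding setup concretely before the lectures, the sketch below generates synthetic sparse data and fits a dictionary with scikit-learn's solver, which alternates a sparse coding step with a dictionary update; it illustrates the problem, not the new algorithms described above.

# A minimal sketch: dictionary learning on synthetic data whose rows are sparse
# combinations of a few unknown atoms (all sizes and settings are placeholders).
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
true_atoms = rng.randn(8, 20)                            # 8 atoms in 20 dimensions
codes = rng.randn(200, 8) * (rng.rand(200, 8) < 0.2)     # sparse coefficients
Y = codes @ true_atoms                                   # observed signals, one per row

learner = DictionaryLearning(n_components=8, transform_algorithm="omp",
                             transform_n_nonzero_coefs=3, random_state=0)
sparse_codes = learner.fit_transform(Y)
print(learner.components_.shape, sparse_codes.shape)     # (8, 20) and (200, 8)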
Time permitting, I will also describe applications of the generalized eigenvalue problem in learning and statistics, and describe a new smoothed analysis model for studying how well these approaches work for recovering overcomplete models.
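As a reminder of the basic computational object here: the generalized eigenvalue problem asks for pairs (lambda, v) with A v = lambda B v, and for symmetric A and positive definite B it can be solved with standard routines, as in the sketch below (the matrices are random placeholders).

# A minimal sketch: solving A v = lambda B v with SciPy for symmetric A and
# symmetric positive definite B (both random placeholders).
import numpy as np
from scipy.linalg import eigh

rng = np.random.RandomState(0)
M = rng.randn(5, 5)
A = M + M.T                       # symmetric A
N = rng.randn(5, 5)
B = N @ N.T + 5 * np.eye(5)       # symmetric positive definite B

eigvals, eigvecs = eigh(A, B)     # generalized symmetric-definite eigenproblem
print(eigvals)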
References:
Computing a nonnegative matrix factorization, provably
Sanjeev Arora, Rong Ge, Ravi Kannan and Ankur Moitra
http://arxiv.org/abs/1111.0952
Learning topic models -- going beyond SVD
Sanjeev Arora, Rong Ge and Ankur Moitra
http://arxiv.org/abs/1204.1956
A practical algorithm for topic modeling with provable guarantees
Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu and Michael Zhu
http://arxiv.org/abs/1212.4777
New algorithms for learning incoherent and overcomplete dictionaries
Sanjeev Arora, Rong Ge and Ankur Moitra
http://arxiv.org/abs/1308.6273
Smoothed analysis of tensor decompositions
Aditya Bhaskara, Moses Charikar, Ankur Moitra and Aravindan Vijayaraghavan
http://arxiv.org/abs/1311.3651
Lecture schedule:
Suresh Venkatasubramanian: Introduction, Learning at Scale
Amr Ahmed: Big Data and Large Scale Inference
Misha Belkin: Statistical Machine Learning and Kernel Methods
Stefanie Jegelka: Efficiently handling discrete structure in machine learning
Ankur Moitra: New Algorithms for Nonnegative Matrix Factorization and Beyond
Amr Ahmed: Big Data and Large Scale Inference: II
Misha Belkin: Statistical Machine Learning and Kernel Methods II
Stefanie Jegelka: Efficiently handling discrete structure in machine learning
Ankur Moitra: Tensor Decompositions and Their Applications
Stefanie Jegelka: Efficiently handling discrete structure in machine learning
Ankur Moitra: Algorithms for Dictionary Learning
Mikhail Belkin: Machine Learning and the Geometry of Data
Amr Ahmed: Big Data and Large Scale Inference: III