Michael Mahoney (UC Berkeley):
 Title: Column Subset Selection on Terabyte-sized Scientific Data
 Abstract: One of the most straightforward formulations of a feature selection problem boils down to the linear algebraic problem of selecting good columns from a data matrix. This formulation has the advantage of yielding features that are interpretable to scientists in the domain from which the data are drawn, an important consideration when machine learning methods are applied to realistic scientific data. While simple, this problem is central to many other seemingly nonlinear learning methods. Moreover, while unsupervised, this problem also has strong connections with related supervised learning methods such as Linear Discriminant Analysis and Canonical Correlation Analysis. We will describe recent work implementing Randomized Linear Algebra algorithms for this feature selection problem in parallel and distributed environments, on inputs ranging in size from one to tens of terabytes, as well as the application of these implementations to specific scientific problems in areas such as mass spectrometry imaging and climate modeling.
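To make the linear algebraic core concrete, here is a minimal NumPy sketch of column subset selection via rank-k leverage-score sampling, one standard Randomized Linear Algebra approach. The function name and parameters are illustrative, not taken from the talk's implementation:

```python
import numpy as np

def leverage_score_css(A, k, c, seed=None):
    """Pick c columns of A, sampling without replacement with
    probability proportional to rank-k leverage scores."""
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]                      # top-k right singular vectors
    scores = (Vk ** 2).sum(axis=0) / k  # leverage scores; they sum to 1
    cols = rng.choice(A.shape[1], size=c, replace=False, p=scores)
    return np.sort(cols)

# Toy example: 100 x 20 matrix, keep 5 columns guided by rank-3 scores
A = np.random.default_rng(0).normal(size=(100, 20))
cols = leverage_score_css(A, k=3, c=5, seed=0)
```

At terabyte scale the exact SVD above would of course be replaced by a distributed or sketched approximation; the sampling step itself is unchanged.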
Klaus-Robert Müller (TU Berlin):
 Title: Explaining individual deep network predictions and measuring the quality of these explanations, joint work with Grégoire Montavon and Wojciech Samek.
 Abstract: Deep Neural Networks (DNNs) are powerful methods that excel in many fields, including image annotation, natural language processing, and speech recognition. While their results are impressive, DNNs are generally perceived to suffer from a lack of transparency, which prevents the human expert from verifying, interpreting, and understanding the reasoning of the system. We present a novel methodology for explaining individual predictions of generic multilayer neural networks. Our approach decomposes the classification decision into contributions of the input elements (e.g. pixels of an image). These decompositions can be visualized as "heatmaps" of the same dimensions as the input data. We then introduce a quantitative measure for evaluating the produced explanations, and use it to compare our method to simple local sensitivity analysis and to a recently proposed deconvolution method for neural networks. Results are shown for three large image recognition data sets.
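The flavor of such a decomposition can be sketched for a toy fully connected ReLU network with an epsilon-stabilized relevance propagation rule, in the spirit of the layer-wise relevance propagation line of work by these authors. This is an illustrative simplification (dense layers, no biases, hypothetical function names), not the method as presented in the talk:

```python
import numpy as np

def lrp_epsilon(weights, activations, R, eps=1e-6):
    """Redistribute output relevance R back to the inputs through dense
    layers via R_i = sum_j a_i * W_ij / (z_j + eps * sign(z_j)) * R_j."""
    for W, a in zip(reversed(weights), reversed(activations)):
        z = a @ W                       # pre-activations of the layer above
        s = R / (z + eps * np.sign(z))  # stabilized relevance ratios
        R = a * (s @ W.T)               # contributions of this layer's inputs
    return R

# Toy forward pass: 4 inputs -> 3 ReLU units -> 2 outputs (no biases)
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))
h = np.maximum(x @ W1, 0.0)
y = h @ W2
R = lrp_epsilon([W1, W2], [x, h], R=y.copy())  # "heatmap" over the 4 inputs
```

The rule approximately conserves relevance layer by layer, so the per-input scores sum to roughly the network output, which is what makes them readable as a decomposition of the decision.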
Fei Sha (University of Southern California):
 Title: Do shallow kernel methods match deep neural networks — and if not, what can the shallow ones learn from the deep ones?
 Abstract: Deep neural networks (DNNs) and other types of deep learning architectures have been hugely successful in a large number of applications. In contrast, kernel methods, which were once exceedingly popular, have become lackluster; the crippling obstacle is their computational complexity. Nonetheless, there has been a recent resurgence of interest in them. In particular, several research groups have studied how to scale kernel methods to cope with large-scale learning problems. Despite this progress, there has not been a systematic, head-on comparison between kernel methods and DNNs. Specifically, while recent approaches have shown exciting promise, we are still left with at least one itching question unanswered: can kernel methods, after being scaled up for large-scale datasets, truly match DNNs' performance? In this talk, I will describe our efforts in (partially) answering that question. I will present extensive empirical studies comparing kernel methods and DNNs for automatic speech recognition, a key field to which DNNs have been applied. Our investigative studies highlight the similarities and differences between those two paradigms. I will leave our main conclusion out as a teaser for this talk.
Le Song (Georgia Institute of Technology):
 Title: Scalable Kernel Methods for Big Nonlinear Problems
 Abstract: Nowadays, big data are collected routinely across a broad range of industrial and scientific sectors. While the complexity and scale of big data impose tremendous challenges for their analysis, big data also offer us great opportunities. Some nonlinear phenomena or relations which are not clear or cannot be inferred reliably from small and medium data now become clear and can be learned robustly from big data. Typically, the form of the nonlinearity is unknown to us and needs to be learned from data as well. Being able to harness the nonlinear structures and features in big data could allow us to tackle problems that were impossible before, or obtain results far better than the previous state of the art. In machine learning, there are two dominant paradigms for nonlinear modeling: kernel methods, where nonlinear basis functions are fixed beforehand, and neural networks, where basis functions are adjusted using gradient descent. However, neither approach has provided satisfactory answers to the big nonlinear learning challenge; they either lack scalability or theoretical guarantees. I will talk about my recent efforts on scaling up kernel methods, comparisons to deep neural networks, and some new directions.
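One standard device behind this kind of kernel scaling is random feature approximation. The NumPy sketch below approximates an RBF kernel with random Fourier features (the Rahimi-Recht construction); it is an illustrative example under that assumption, not the speaker's actual system:

```python
import numpy as np

def rff_map(X, n_features, gamma, seed=None):
    """Random Fourier feature map Z such that Z @ Z.T approximates the
    RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# A linear model trained on Z then stands in for a kernel machine on K,
# turning an O(n^2) kernel computation into an O(n * n_features) one.
X = np.random.default_rng(1).normal(size=(50, 5))
Z = rff_map(X, n_features=2000, gamma=0.1, seed=1)
K_approx = Z @ Z.T
K_exact = np.exp(-0.1 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

The approximation error shrinks as the number of random features grows, which is exactly the knob that trades accuracy for scalability in large-scale kernel learning.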
Kilian Weinberger (Cornell University):
 Title: Deep Manifold Traversal
 Abstract:

