Invited Speakers

Michael Mahoney (UC Berkeley):
  • Title: Column Subset Selection on Terabyte-sized Scientific Data
  • Abstract: One of the most straightforward formulations of a feature selection problem boils down to the linear algebraic problem of selecting good columns from a data matrix.  This formulation has the advantage of yielding features that are interpretable to scientists in the domain from which the data are drawn, an important consideration when machine learning methods are applied to realistic scientific data.  While simple, this problem is central to many other seemingly nonlinear learning methods.  Moreover, while unsupervised, this problem also has strong connections with related supervised learning methods such as Linear Discriminant Analysis and Canonical Correlation Analysis.  We will describe recent work implementing Randomized Linear Algebra algorithms for this feature selection problem in parallel and distributed environments on inputs of size ranging from ones to tens of terabytes, as well as the application of these implementations to specific scientific problems in areas such as mass spectrometry imaging and climate modeling.
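A standard Randomized Linear Algebra primitive for this column-selection formulation is leverage-score-based column subset selection. The sketch below is illustrative only (function names such as `select_columns` are our own, and the abstract's distributed terabyte-scale implementations are far more involved); it shows the small-scale idea, assuming a deterministic top-scores variant rather than the randomized sampling variant:

```python
import numpy as np

def leverage_scores(A, k):
    # Rank-k leverage score of column j of A (n samples x d features):
    # the squared norm of the j-th column of the top-k right singular
    # vector matrix. The scores sum to k.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]                       # k x d
    return np.sum(Vk ** 2, axis=0)       # length d

def select_columns(A, k, c):
    # Keep the c columns with the largest rank-k leverage scores.
    # (Randomized variants instead sample columns with probability
    # proportional to these scores.)
    scores = leverage_scores(A, k)
    idx = np.sort(np.argsort(scores)[::-1][:c])
    return idx, A[:, idx]
```

The returned index set is what makes the method interpretable: each selected column is an actual feature of the original data, not a linear combination.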

Klaus-Robert Müller (TU Berlin):

  • Title: Explaining individual deep network predictions and measuring the quality of these explanations (joint work with Grégoire Montavon and Wojciech Samek).
  • Abstract: Deep Neural Networks (DNNs) are powerful methods that excel in many fields including image annotation, natural language processing and speech recognition. While their results are impressive, DNNs are often perceived to suffer from a lack of transparency that prevents the human expert from verifying, interpreting, and understanding the reasoning of the system. We present a novel methodology for explaining individual predictions of generic multilayer neural networks. Our approach decomposes the classification decision into contributions of the input elements (e.g. the pixels of an image). These decompositions can be visualized as "heatmaps" of the same dimensions as the input data. We then introduce a quantitative measure for evaluating the produced explanations, and use it to compare our method to simple local sensitivity analysis and to a recently proposed deconvolution method for neural networks. Results are shown for three large image recognition data sets.

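To make the idea of decomposing a decision into input contributions concrete, here is a minimal sketch for a toy two-layer ReLU network. The epsilon-stabilized redistribution rule, the network, and the function names are our own illustrative assumptions, not the authors' exact method or implementation; the sketch only shows the general layer-wise redistribution pattern the abstract describes:

```python
import numpy as np

def _redistribute(R_upper, W, a, b, eps):
    # z[i, j]: contribution of input j to the pre-activation of neuron i.
    z = W * a                                   # (out, in); a broadcasts over rows
    s = z.sum(axis=1) + b                       # pre-activations
    s = s + eps * np.where(s >= 0, 1.0, -1.0)   # epsilon stabilizer (avoids /0)
    return (z / s[:, None]).T @ R_upper         # relevance passed to each input

def relevance_heatmap(x, W1, b1, W2, b2, eps=1e-6):
    # Forward pass through a toy two-layer ReLU network.
    a1 = np.maximum(0.0, W1 @ x + b1)
    z2 = W2 @ a1 + b2
    # Seed relevance at the predicted class's output score.
    c = int(np.argmax(z2))
    R2 = np.zeros_like(z2)
    R2[c] = z2[c]
    # Redistribute relevance layer by layer down to the input elements.
    R1 = _redistribute(R2, W2, a1, b2, eps)
    return _redistribute(R1, W1, x, b1, eps)    # one relevance score per input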
Fei Sha (University of Southern California):
  • Title: Do shallow kernel methods match deep neural networks — and if not, what can the shallow ones learn from the deep ones?
  • Abstract: Deep neural networks (DNNs) and other deep learning architectures have been hugely successful in a large number of applications. In contrast, kernel methods, once exceedingly popular, have lost much of their luster. The crippling obstacle is their computational complexity. Nonetheless, there has been a recent resurgence of interest in them. In particular, several research groups have studied how to scale kernel methods to cope with large-scale learning problems. Despite this progress, there has been no systematic, head-on comparison between kernel methods and DNNs. Specifically, while recent approaches have shown exciting promise, we are still left with at least one nagging question unanswered: can kernel methods, once scaled up to large datasets, truly match DNNs' performance? In this talk, I will describe our efforts to (partially) answer that question. I will present extensive empirical studies comparing kernel methods and DNNs for automatic speech recognition, a key field to which DNNs have been applied. Our studies highlight the similarities and differences between the two paradigms. I will leave our main conclusion out as a teaser for the talk.

Le Song (Georgia Institute of Technology):

  • Title: Scalable Kernel Methods for Big Nonlinear Problems
  • Abstract: Nowadays, big data are collected routinely across a broad range of industrial and scientific sectors. While the complexity and scale of big data impose tremendous challenges for their analysis, big data also offer us great opportunities. Some nonlinear phenomena or relations that are not clear, or cannot be inferred reliably, from small and medium data become clear and can be learned robustly from big data. Typically, the form of the nonlinearity is unknown to us and needs to be learned from data as well. Being able to harness the nonlinear structures and features in big data could allow us to tackle problems that were previously impossible, or to obtain results far beyond the previous state of the art. In machine learning, there are two dominant paradigms for nonlinear modeling: kernel methods, where nonlinear basis functions are fixed beforehand, and neural networks, where basis functions are adjusted using gradient descent. However, neither approach has provided a satisfactory answer to the big nonlinear learning challenge; each lacks either scalability or theoretical guarantees. I will talk about my recent efforts on scaling up kernel methods, comparisons to deep neural networks, and some new directions.

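One widely used route to scaling kernel methods in the fixed-basis paradigm the abstract describes is to replace the kernel with an explicit random feature map, in the style of random Fourier features for the Gaussian (RBF) kernel. The sketch below is a minimal illustration under that assumption (the speaker's own scaling techniques may differ); the function name and parameters are ours:

```python
import numpy as np

def random_fourier_features(X, D, gamma, seed=0):
    # Approximate the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    # with D random cosine features: inner products of the mapped data
    # approximate kernel evaluations, so a linear model on the features
    # approximates a kernel machine at O(n * D) cost instead of O(n^2).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # frequencies
    b = rng.uniform(0, 2 * np.pi, size=D)                  # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)            # (n, D) features
```

The approximation error shrinks roughly as 1/sqrt(D), so D trades accuracy against the scalability the abstract is concerned with.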
Kilian Weinberger (Cornell University):
  • Title: Deep Manifold Traversal
  • Abstract: