Workshop session 1Friday, Dec. 12, Callaghan Room at the Whistler Westin |
|
| 7.30 - 8.00 | Stephen E. Fienberg: Algebraic statistics for random graph models: Markov bases and their uses |
| 8.05 - 8.35 | Adrian Dobra: Algebraic statistics and contingency tables |
| 8.40 - 9.10 | Keisuke Yamazaki: Toric Modification on Mixture Models |
| 9.15 - 9.25 | coffee break |
| 9.25 - 9.55 | Vincent Auvray: Learning Parameters in Discrete Naive Bayes Models by Computing Fibers of the Parametrization map |
| 10.00 - 10.30 | Paul von Bunau: Stationary Subspace Analysis |
Workshop session 2Friday, Dec. 12, Callaghan Room at the Whistler Westin |
|
| 15.30 - 16.00 | Doru Balcan: Alternatives to the Discrete Fourier Transform |
| 16.05 - 16.35 | Lek-Heng Lim: Graph Helmholtzian and rank learning |
| 16.40 - 17.10 | Xiaoye Jiang: Identity Management On Homogeneous spaces |
| 17.15-17.25 | coffee break |
| 17.25 - 17.55 | Carlos Guestrin: Exploiting Probabilistic Independence for Permutations |
| 17.55 - 18.30 | Tiberio Caetano: Consistent structured estimation for weighted bipartite matching |
Stephen E. Fienberg, Carnegie Mellon University
Sonja Petrovic, University of Illinois
Alessandro Rinaldo, Carnegie Mellon University
We use algebraic geometry to study a statistical model for the analysis of networks represented by graphs with directed edges due to Holland and Leinhardt, known as p1, which allows for differential attraction (popularity) and expansiveness, as well as an additional effect due to reciprocation. In particular, we attempt to derive Markov bases for p1 and to link these to the results on Markov bases for working with log-linear models for contingency tables. Because of the contingency table representation for p1 we expect some form of congruence. Markov bases and related algebraic geometry notions are useful for at least two statistical problems: (i) determining condition for the existence of maximum likelihood estimates, and (ii) using them to traverse conditional (given minimal sufficient statistics) sample spaces, and thus generating ``exact'' distributions useful for assessing goodness of fit. We outline some of these potential uses for the algebraic representation of p1.
Adrian Dobra, University of Washington
In this talk I will give an overview of the role of algebraic statistics in the statistical analysis of contingency tables. I will survey major areas in which algebraic methods proved to be crucial and provided a fertile ground for novel research directions: computation of sharp integer bounds for cell entries, existence of maximum likelihood estimates, simulation from probability distributions on spaces of tables, Markov bases, high-dimensional sparse tables with structural zeros, log-linear model selection. I will give examples that illustrate this methodology and talk about open problems.
Keisuke Yamazaki, Tokyo Institute of Technology
Sumio Watanabe, Tokyo Institute of Technology
In the Bayes estimation, it was pointed out that resolution of singularity provides an algorithm to elucidate the generalization performance of learning machines. However, there is no effective procedure to find the resolution map. This presentation proposes a new method to find it based on the toric modification, using Newton diagram. By the proposed method, learning curves of several hierarchical models are clarified.
Vincent Auvray, University of Lige
Louis Wehenkel, University of Lige
Discrete Naive Bayes models are usually defined parametrically with a map from a parameter space to a probability distribution space. First, we present two families of algorithms that compute the set of parameters mapped to a given discrete Naive Bayes distribution satisfying certain technical assumptions. Using these results, we then present two families of parameter learning algorithms that operate by projecting the distribution of observed relative frequencies in a dataset onto the discrete Naive Bayes model considered. They have nice convergence properties, but their computational complexity grows very quickly with the number of hidden classes of the model.
Paul von Bunau, TU Berlin
Frank C. Meinecke, TU Berlin
Klaus-Robert Muller, TU Berlin
Non-stationarities are an ubiquitous phenomenon in real-world data, yet they challenge standard Machine Learning methods: if training and test distributions differ we cannot, in principle, gen- eralise from the observed training sample to the test distribution. This affects both supervised and unsupervised learning algorithms. In a classification problem, for instance, we may infer spurious dependen- cies between data and label from the the training sample that are mere artefacts of the non-stationarities. Conversely, identifying the sources of non-stationary behaviour in order to better understand the analyzed system often lies at the heart of a scientific question. To this end, we propose a novel unsupervised paradigm: Stationary Subspace Analysis (SSA). SSA decomposes a multi-variate time-series into a stationary and a non-stationary subspace. We derive an efficient algorithm that hinges on an optimization procedure in the Special Orthogonal Group. By exploiting the Lie group structure of the optimization manifold, we can explicitly factor out the inherent symmetries of the problem and thereby reduce the number of parameters to the exact degrees of freedom. The practical utility of our approach is demonstrated in an application to Brain Computer-Interfacing (BCI).
Doru Balcan, Carnegie Mellon University
Aliaksei Sandryhaila, Carnegie Mellon University
Jonathan Gross, Carnegie Mellon University
Markus Puschel, Carnegie Mellon University
It is well-known that the discrete Fourier transform (DFT) of a finite length discrete-time signal samples the discrete-time Fourier transform of the same signal at equidistant points on the unit circle. Hence, as the signal length goes to infinity, the DFT approaches the DTFT. Associated with the DFT are circular convolution and a periodic signal extension. In this paper we identify a large class of alternatives to the DFT using the theory of polynomial algebras. Each of these Fourier transforms approaches the DTFT just as the DFT does, but has its own signal extension and notion of convolution, which therefore are not periodic. Furthermore, these Fourier transforms have Vandermonde structure, which enables their computation via fast $O(n \log^2(n))$ algorithms.
Lek-Heng Lim, University of California, Berkeley
The graph Helmholtzian is the graph theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is the analogue of the Laplace operator or scalar Laplacian. We will see that a decomposition associated with the graph Helmholtzian provides a way to learn ranking information from incomplete, imbalanced, and cardinal score-based data. In this framework, an edge flow representing pairwise ranking is orthogonally resolved into a gradient flow (acyclic) that represents the L2-optimal global ranking and a divergence-free flow (cyclic) that quantifies the inconsistencies. If the latter is large, then the data does not admit a statistically meaningful global ranking. A further decomposition of the inconsistent component into a curl flow (locally cyclic) and a harmonic flow (locally acyclic) provides information on the validity of small- and large-scale comparisons of alternatives. This is joint work with Xiaoye Jiang, Yuan Yao, and Yinyu Ye.
Xiaoye Jiang, Stanford University
Leonidas J. Guibas, Stanford University
We consider the identity management problem, where the identities are classified into two classes, red and blue. The purpose here is to make predictions of the two class identities when confusions arise among identities. In this work, we propose a principle to maintain probability distributions over homogeneous space which provides a mechanism valid for taking into account of any desired degree of approximation. Markov models are used to formulate the two class identity management problem which tries to compactly summarize distributions on homogeneous spaces. Projecting down and lifting up information on different order of statistics can be achieved by using Radon transformations. The commutative property of Markov updating with Radon transform enable us to maintain exact information over different order of statistics. Thus, accurate classification predictions can be made based on the low order statistics we maintained. We evaluate the performance of our algorithms on a real camera network data and show effectiveness of our scheme.
Jonathan Huang, Carnegie Mellon University
Carlos Guestrin, Carnegie Mellon University
Xiaoye Jiang, Stanford University
Leonidas J. Guibas, Stanford University
Permutations are ubiquitous in many real world problems, such as voting, rankings and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities. Recent Fourier-based approaches can be used to provide a compact representation over low-frequency components of the distribution. Though polynomial, the complexity of these representations grows very rapidly, especially if we want to maintain reasonable estimates for peaked distributions. In this talk, we first characterize the notion of probabilistic independence for distribution over permutations. We then present a method for factoring distributions into independent components in the Fourier domain and use our algorithms to decompose large problems into much smaller ones. Building on this method, we describe an algorithm that detects independence and adapts the choice of representation according to the complexity of the underlying problem. We demonstrate that our method provides very significant improvements in terms of running time, on real tracking data.
James Petterson, NICTA Canberra
Tiberio Caetano, NICTA Canberra
Julian McAuley, NICTA Canberra
Given a weighted bipartite graph, the assignment problem consists of finding the heaviest perfect match. This is a classical problem in combinatorial optimization, which is solvable exactly and efficiently by standard methods such as the Hungarian algorithm, and is widely applicable in real-world scenarios. We give an exponential family model for the assignment problem. Edge weights are obtained from a suitable composition of edge features and a parameter vector, which is learned so as to maximize the likelihood of a sample consisting of training graphs and their labeled matches. The resulting consistent estimator contrasts with existing max-margin structured estimators, which are inconsistent for this problem.