CoMeT: Computation, Metrics and Topology for data analysis 

Associated Team -- 2011 - 2013


Logo Stanford
Geometric Computing group
Stanford University
Principal Investigator: Leonidas Guibas


    Logo INRIA
  
Geometrica group
INRIA Saclay -- Île de France
Principal Investigator: Steve Oudot


Logo Ohio-State University

Computational Geometry group
Ohio-State University
Principal Investigator: Yusu Wang

 

Project overview:

CoMeT is an associated team between the Geometrica group at INRIA, the Geometric Computing group at Stanford University, and the Computational Geometry group at the Ohio State University. Its focus is on the design of computational methods for the analysis of high-dimensional data, using tools from metric geometry and algebraic topology. Our goal is to extract enough structure from the data, so we can get a higher-level informative understanding of these data and of the spaces they originate from.
The main challenge is to be able to go beyond mere dimensionality reduction and topology inference, without the need for a costly explicit reconstruction. To validate our approach, we intend to set our methods against real-life data sets coming from a variety of applications, including (but not restricted to) clustering, image or shape segmentation, sensor field monitoring, shape classification and matching. The three research groups involved in this project have been active contributors in the field of Computational Topology in the recent years, and some of their members have had long-standing collaborations. We believe this associated team can help create new synergies between these groups.

Detailed research program:
see here.



Scientifc outcomes

2013

Various projects started in the previous years have turned into papers to be submitted or published shortly: In addition, the following new projects have been started:
We tackle the problem of analyzing the topological structure of the graph of a scalar function f defined over a Riemannian manifold X, in the context where the data points and function values undergo unbounded noise. The interplay between geometric and functional noise makes this case significantly more difficult than the bounded noise case. Combining ideas from statistical regression and topological data analysis, we derive an algorithm that can recover an approximation of f over a denoised point cloud, from which an approximation of the persistence diagram of f can be computed.

Status: in preparation


We develop a novel formulation for the notion of shape differences, aimed at providing detailed information about the location and na- ture of the differences or distortions between the two shapes being compared. Our difference operator, derived from a shape map, is much more informative than just a scalar global shape similarity score, rendering it useful in a variety of applications where more refined shape comparisons are necessary. The approach is intrin- sic and is based on a linear algebraic framework, allowing the use of many common linear algebra tools (e.g, SVD, PCA) for study- ing a matrix representation of the operator. Remarkably, the for- mulation allows us not only to localize shape differences on the shapes involved, but also to compare shape differences across pairs of shapes, and to analyze the variability in entire shape collections based on the differences between the shapes. Moreover, while we use a map or correspondence to define each shape difference, consistent correspondences between the shapes are not necessary for comparing shape differences, although they can be exploited if available.

Bibtex:

@article{roabcg-mbeisdv-13
, author = "Raif M. Rustamov and Maks Ovsjanikov and Omri Azencot and Mirela Ben-Chen and Frédéric Chazal and Leonidas Guibas"
, title = "Map-Based Exploration of Intrinsic Shape Differences and Variability"
, journal = "Transactions on Graphics (proc. SIGGRAPH)"
, volume = "32"
, number = "4"
, pages = "72:1--72:12"
, year = "2013"
}


2012

This year saw the start of several new projects, between INRIA and Stanford on the one hand, between INRIA and OSU on the other hand. Each one of them is expected to succeed within a year or two.
In this work we address several challenges related to analyzing and visualizing maps between shapes and their collections. Unlike the majority of prior work which tries to discover maps in the context of shape matching, our main focus is on evaluating, analyzing and visualizing a given map in an efficient and intuitive way. We are motivated primarily by the following considerations: 1) there is currently no way to represent a map in a multi-scale way, 2) existing metrics for map evaluation are quadratic and expensive to compute in practice, and 3) there are no simple methods for map visualization that would highlight areas where the map fails to meet desired quality criteria, rather then concentrating on areas where the map is good. We propose to tackle these problems in a unified way by considering the functional representation of a map, and performing spectral analysis on this representation.

Status: submitted


The classical approach to topological data analysis using distances to compact sets has proved its usefulness in cases where the input point cloud data has no or small noise. However, it is known to fail in the presence of outliers. The bottomline of this work is to replace compact sets by measures, and distances to compact sets by distances to measures. These are known to be much more robust to noise of large amplitude, however computing their persistence diagram directly is prohibitively costly, even in small dimensions, which explains the lack of practical applications in the context of homology inference so far. We propose novel and faithful approximations to the distance to a measure, whose persistence diagrams can be computed in practice at least in small dimensions.

Status: in preparation


The Vietoris-Rips filtration proved an efficient tool for doing topological data analysis in high dimensions. Its main asset is that the only geometric predicate involved in its construction are the comparison of distances, which can be performed easily in all dimensions. However, the construction time of the full Rips filtration incurs an exponential blow-up with the size of the input data, which makes it impossible in practice to compute its full persistence diagram. Last year, a new line of work was initiated by Donald Sheehy (now a post-doc in the Geometrica group), who showed how to construct an O(n)-size filtered simplicial complex (called the sparse Rips filtration) on an $n$-point metric space, such that its persistence diagram is a good approximation to that of the Vietoris-Rips filtration. This new filtration can be constructed in $O(n\log n)$ time. The constant factors in both the size and the running time depend only on the doubling dimension of the metric space and the desired tightness of the approximation. However, like the standard Rips construction, Sheehy's filtration is not resilient to large noise and outliers. The goal of this work is to take advantage of our advances on the approximation of the distance to a measure, to devise variants of the sparse Rips filtration that allows us to do homology inference in the presence of noise and outliers.

Status: in preparation


In addition, the journal version of our paper on metric graphs reconstruction from point cloud data was published in IJCGA.

2011

In this work we present a novel clustering algorithm that combines a mode-seeking phase with a cluster merging phase. While mode detection is performed by a standard graph-based hill-climbing scheme, the novelty of our approach resides in its use of topological persistence theory to guide the merges between clusters. An interesting feature of our algorithm is to provide additional feedback in the form of a finite set of points in the plane, called a {\em persistence diagram}, which provably reflects the prominence of each of the modes of the density. Such feedback is an invaluable tool in practice, as it enables the user to determine a set of parameter values that will make the algorithm compute a relevant clustering on the next run. In terms of generality, our approach requires the sole knowledge of (approximate) pairwise distances between the data points, as well as of rough estimates of the density at these points. It is therefore virtually applicable in any arbitrary metric space. In the meantime, its complexity remains reasonable: although the size of the input distance matrix may be up to quadratic in the number of data points, a careful implementation only uses a linear amount of main memory and barely takes more time to run than the one spent reading the input. Taking advantage of recent advances in topological persistence theory, we are able to give a theoretically sound notion of what the "correct" number k of clusters is, and to prove that under mild sampling conditions and a relevant choice of parameters (made possible in practice by the persistence diagram) our clustering scheme computes a set of k clusters whose spatial locations are bound to the ones of the basins of attraction of the peaks of the density. These guarantess hold in a large variety of contexts, including when data points are distributed along some unknown Riemannian manifold.

Bibtex:

@inproceedings{cgos-pbc-11
, author = "F. Chazal and L. J. Guibas and S. Y. Oudot and P. Skraba"
, title = "Persistence-Based Clustering in Riemannian Manifolds"
, booktitle = "Proc. 27th Annual ACM Symposium on Computational Geometry"
, pages = "97--106"
, month = "June"
, year = 2011
}

Many real-world data sets can be viewed of as noisy samples of special types of metric spaces called metric graphs [16]. Building on the notions of correspondence and Gromov- Hausdorff distance in metric geometry, this work describes a model for such data sets as an approximation of an underlying met- ric graph. We present a novel algorithm that takes as an input such a data set, and outputs the underlying metric graph with guarantees. We also implement the algorithm, and evaluate its performance on a variety of real world data sets.

Bibtex:

@inproceedings{accggm-mgrnd-11
, author = "M. Aanjaneya and F. Chazal and D. Chen and M. Glisse and L. J. Guibas and D. Morozov"
, title = "Metric Graph Reconstruction from Noisy Data"
, booktitle = "Proc. 27th Annual ACM Symposium on Computational Geometry"
, pages = "37--46"
, month = "June"
, year = 2011
}

Motivated by the increasing availability of large collections of noisy GPS traces, this work presents a new data-driven frame- work for smoothing trajectory data. The framework, which can be viewed of as a generalization of the classical moving average technique, naturally leads to efficient algorithms for various smoothing objectives. We analyze an algorithm based on this framework and provide connections to pre- vious smoothing techniques. We implement a variation of the algorithm to smooth an entire collection of trajectories and show that it performs well on both synthetic data and massive collections of GPS traces.

Bibtex:

@inproceedings{ccgjs-dts-11
, author = "F. Chazal and D. Chen and L. J. Guibas and X. Jiang and C. Sommer"
, title = "Data-Driven Trajectory Smoothing"
, booktitle = "Proc. 19th ACM SIGSPATIAL International Conference and Advances in Geographic Information Systems"
, month = "November"
, year = 2011
}


Interactions

2013

2012


2011




Contact: steve.oudot[at]inria.fr                     
Last update: 2012-10-30.