# CoMeT: Computation, Metrics and Topology for data analysis

Associated Team -- 2011 - 2013

 Geometric Computing group Stanford University Principal Investigator: Leonidas Guibas Geometrica group INRIA Saclay -- Île de France Principal Investigator: Steve Oudot Computational Geometry group Ohio-State University Principal Investigator: Yusu Wang

 Project overview: CoMeT is an associated team between the Geometrica group at INRIA, the Geometric Computing group at Stanford University, and the Computational Geometry group at the Ohio State University. Its focus is on the design of computational methods for the analysis of high-dimensional data, using tools from metric geometry and algebraic topology. Our goal is to extract enough structure from the data, so we can get a higher-level informative understanding of these data and of the spaces they originate from. The main challenge is to be able to go beyond mere dimensionality reduction and topology inference, without the need for a costly explicit reconstruction. To validate our approach, we intend to set our methods against real-life data sets coming from a variety of applications, including (but not restricted to) clustering, image or shape segmentation, sensor field monitoring, shape classification and matching. The three research groups involved in this project have been active contributors in the field of Computational Topology in the recent years, and some of their members have had long-standing collaborations. We believe this associated team can help create new synergies between these groups. Detailed research program: see here.

## Scientifc outcomes

### 2013

Various projects started in the previous years have turned into papers to be submitted or published shortly:
• Last year's projects on homology inference in the presence of noise and outliers and on sparse Rips filtrations for distances to measures have been gathered into a single paper, to be submitted to the ACM Symposium on Computational Geometry at the end of the year.
• The full version of our paper on clustering using persistance has been accepted to the Journal of the ACM. It will appear in the next issue of the journal.
In addition, the following new projects have been started:
• Scalar Fields Analysis in the Presence of Unbounded Noise and Outliers (joint work involving INRIA and OSU):
We tackle the problem of analyzing the topological structure of the graph of a scalar function f defined over a Riemannian manifold X, in the context where the data points and function values undergo unbounded noise. The interplay between geometric and functional noise makes this case significantly more difficult than the bounded noise case. Combining ideas from statistical regression and topological data analysis, we derive an algorithm that can recover an approximation of f over a denoised point cloud, from which an approximation of the persistence diagram of f can be computed.

Status: in preparation

• Map-Based Exploration of Intrinsic Shape Differences and Variability (joint work involving INRIA and Stanford):
We develop a novel formulation for the notion of shape differences, aimed at providing detailed information about the location and na- ture of the differences or distortions between the two shapes being compared. Our difference operator, derived from a shape map, is much more informative than just a scalar global shape similarity score, rendering it useful in a variety of applications where more refined shape comparisons are necessary. The approach is intrin- sic and is based on a linear algebraic framework, allowing the use of many common linear algebra tools (e.g, SVD, PCA) for study- ing a matrix representation of the operator. Remarkably, the for- mulation allows us not only to localize shape differences on the shapes involved, but also to compare shape differences across pairs of shapes, and to analyze the variability in entire shape collections based on the differences between the shapes. Moreover, while we use a map or correspondence to define each shape difference, consistent correspondences between the shapes are not necessary for comparing shape differences, although they can be exploited if available.

Bibtex:

@article{roabcg-mbeisdv-13
, author = "Raif M. Rustamov and Maks Ovsjanikov and Omri Azencot and Mirela Ben-Chen and Frédéric Chazal and Leonidas Guibas"
, title = "Map-Based Exploration of Intrinsic Shape Differences and Variability"
, journal = "Transactions on Graphics (proc. SIGGRAPH)"
, volume = "32"
, number = "4"
, pages = "72:1--72:12"
, year = "2013"
}

### 2012

This year saw the start of several new projects, between INRIA and Stanford on the one hand, between INRIA and OSU on the other hand. Each one of them is expected to succeed within a year or two.
• Analysis and Visualization of Maps Between Shapes (joint work involving INRIA and Stanford):
In this work we address several challenges related to analyzing and visualizing maps between shapes and their collections. Unlike the majority of prior work which tries to discover maps in the context of shape matching, our main focus is on evaluating, analyzing and visualizing a given map in an efficient and intuitive way. We are motivated primarily by the following considerations: 1) there is currently no way to represent a map in a multi-scale way, 2) existing metrics for map evaluation are quadratic and expensive to compute in practice, and 3) there are no simple methods for map visualization that would highlight areas where the map fails to meet desired quality criteria, rather then concentrating on areas where the map is good. We propose to tackle these problems in a unified way by considering the functional representation of a map, and performing spectral analysis on this representation.

Status: submitted

• Homology Inference in the Presence of Noise and Outliers (joint work involving INRIA and OSU):
The classical approach to topological data analysis using distances to compact sets has proved its usefulness in cases where the input point cloud data has no or small noise. However, it is known to fail in the presence of outliers. The bottomline of this work is to replace compact sets by measures, and distances to compact sets by distances to measures. These are known to be much more robust to noise of large amplitude, however computing their persistence diagram directly is prohibitively costly, even in small dimensions, which explains the lack of practical applications in the context of homology inference so far. We propose novel and faithful approximations to the distance to a measure, whose persistence diagrams can be computed in practice at least in small dimensions.

Status: in preparation

• Sparse Rips Filtrations for Distances to Measures (joint work involving INRIA and OSU):
The Vietoris-Rips filtration proved an efficient tool for doing topological data analysis in high dimensions. Its main asset is that the only geometric predicate involved in its construction are the comparison of distances, which can be performed easily in all dimensions. However, the construction time of the full Rips filtration incurs an exponential blow-up with the size of the input data, which makes it impossible in practice to compute its full persistence diagram. Last year, a new line of work was initiated by Donald Sheehy (now a post-doc in the Geometrica group), who showed how to construct an O(n)-size filtered simplicial complex (called the sparse Rips filtration) on an $n$-point metric space, such that its persistence diagram is a good approximation to that of the Vietoris-Rips filtration. This new filtration can be constructed in $O(n\log n)$ time. The constant factors in both the size and the running time depend only on the doubling dimension of the metric space and the desired tightness of the approximation. However, like the standard Rips construction, Sheehy's filtration is not resilient to large noise and outliers. The goal of this work is to take advantage of our advances on the approximation of the distance to a measure, to devise variants of the sparse Rips filtration that allows us to do homology inference in the presence of noise and outliers.

Status: in preparation

In addition, the journal version of our paper on metric graphs reconstruction from point cloud data was published in IJCGA.

Bibtex:

@article{accggm-mgrnd-12, author = "M. Aanjaneya and F. Chazal and D. Chen and M. Glisse and L. J. Guibas and D. Morozov", title = "Metric Graph Reconstruction from Noisy Data", journal = "International Journal of Computational Geometry and Applications (IJCGA)", volume = "22", number = "4", pages = "305--325", year = "2012"}

### 2011

• Clustering point cloud data using topological persistence (joint work involving INRIA and Stanford):
In this work we present a novel clustering algorithm that combines a mode-seeking phase with a cluster merging phase. While mode detection is performed by a standard graph-based hill-climbing scheme, the novelty of our approach resides in its use of topological persistence theory to guide the merges between clusters. An interesting feature of our algorithm is to provide additional feedback in the form of a finite set of points in the plane, called a {\em persistence diagram}, which provably reflects the prominence of each of the modes of the density. Such feedback is an invaluable tool in practice, as it enables the user to determine a set of parameter values that will make the algorithm compute a relevant clustering on the next run. In terms of generality, our approach requires the sole knowledge of (approximate) pairwise distances between the data points, as well as of rough estimates of the density at these points. It is therefore virtually applicable in any arbitrary metric space. In the meantime, its complexity remains reasonable: although the size of the input distance matrix may be up to quadratic in the number of data points, a careful implementation only uses a linear amount of main memory and barely takes more time to run than the one spent reading the input. Taking advantage of recent advances in topological persistence theory, we are able to give a theoretically sound notion of what the "correct" number k of clusters is, and to prove that under mild sampling conditions and a relevant choice of parameters (made possible in practice by the persistence diagram) our clustering scheme computes a set of k clusters whose spatial locations are bound to the ones of the basins of attraction of the peaks of the density. These guarantess hold in a large variety of contexts, including when data points are distributed along some unknown Riemannian manifold.

Bibtex:

@inproceedings{cgos-pbc-11, author = "F. Chazal and L. J. Guibas and S. Y. Oudot and P. Skraba", title = "Persistence-Based Clustering in Riemannian Manifolds", booktitle = "Proc. 27th Annual ACM Symposium on Computational Geometry", pages = "97--106", month = "June", year =    2011}

• Reconstructing metric graphs from point cloud data (joint work involving INRIA and Stanford):
Many real-world data sets can be viewed of as noisy samples of special types of metric spaces called metric graphs [16]. Building on the notions of correspondence and Gromov- Hausdorff distance in metric geometry, this work describes a model for such data sets as an approximation of an underlying met- ric graph. We present a novel algorithm that takes as an input such a data set, and outputs the underlying metric graph with guarantees. We also implement the algorithm, and evaluate its performance on a variety of real world data sets.

Bibtex:

@inproceedings{accggm-mgrnd-11, author = "M. Aanjaneya and F. Chazal and D. Chen and M. Glisse and L. J. Guibas and D. Morozov", title = "Metric Graph Reconstruction from Noisy Data", booktitle = "Proc. 27th Annual ACM Symposium on Computational Geometry", pages = "37--46", month = "June", year =    2011}

• Smoothing GPS trajectories using distances to measures (joint work involving INRIA and Stanford):

Motivated by the increasing availability of large collections of noisy GPS traces, this work presents a new data-driven frame- work for smoothing trajectory data. The framework, which can be viewed of as a generalization of the classical moving average technique, naturally leads to efficient algorithms for various smoothing objectives. We analyze an algorithm based on this framework and provide connections to pre- vious smoothing techniques. We implement a variation of the algorithm to smooth an entire collection of trajectories and show that it performs well on both synthetic data and massive collections of GPS traces.

Bibtex:

@inproceedings{ccgjs-dts-11, author = "F. Chazal and D. Chen and L. J. Guibas and X. Jiang and C. Sommer", title = "Data-Driven Trajectory Smoothing", booktitle = "Proc. 19th ACM SIGSPATIAL International Conference and Advances in Geographic Information Systems", month = "November", year =    2011}

## Interactions

### 2013

• January: Fengtao Fan from OSU visited the Geometrica group for 1 month, to work on scalar fields analysis in the presence of noise and outliers..
• February: Don Sheehy from Inria visited the group of Leo Guibas at Stanford for 2 weeks, to discuss about sparse filtrations and their application to data anlysis.
• June: Yusu Wang visited Geometrica for a week, to discuss about ongoing research on comparing Reeb graphs.
• July: Peter Huang, from the group of Leo Guibas at Stanford, visited Geometrica for a week, to present his research in geometry processing, and to discuss potential interactions with the research performed within CoMeT.
• July-August: Maks Ovsjanikov visited the group of Leo Guibas at Stanford for 1 month, to discuss about on-going common projects pertaining to the exploration of spaces of shapes.
• August: Clément Maria from Geometrica spent 2 weeks at Stanford, to discuss about his research on lightweight and efficient algorithms for persistance calculation.
• October-November: Kyle Heath, from the group of Leo Guibas at Stanford, is spending 2 months in the Geometrica group at Inria, to work on the exploration of spaces of images and shapes.

### 2012

• April-June: Mickaël Buchet visited the group of Yusu Wang at OSU for 2 months, to work on homology inference in the presence of noise and outliers.
• August: Maks Ovsjanikov visited the group of Leo Guibas at Stanford for a month, to work on the analysis of maps between spaces.
• September: Yusu Wang and Misha Belkin visited the Geometrica group at INRIA for a week, to discuss about homology inference in the presence of noise and outliers.
• October: Clément Maria visited the group of Yusu Wang at OSU for a week, and then visited Pr. Coutsias at University of New Mexico for a week, to discuss about new topics related to efficient data structures for simplicial complexes representation.
• October-November: Don Sheehy is visiting OSU for a week, to discuss about sparse Rips filtrations for distances to measures.

### 2011

• July 23rd-31st: Steve Oudot visited Leo Guibas at Stanford and Google. The scientific discussion evolved around future directions for our joint work on persistence-based clustering.
• November 14th-16th: Steve Oudot, Marc Glisse and Donald Sheehy will visit Yusu Wang and her group at Ohio-State University. The purpose of this visit will be to touch base and start common scientific projects on topics related to graph Laplacians and their applications. Since Yusu's group is a leader in this area, a weekly seminar on graph Laplacians is being held throughout the Fall in the geometrica group to prepare for the visit.
• November 7th-11th: On his way to Ohio-State University, Steve will make a stop at a workshop in Toronto, where Fred Chazal, Leo Guibas and Yusu Wang will also be present. This will be a good opportunity for them to start the discussion that will continue on the next week at Ohio-State.

Contact: steve.oudot[at]inria.fr
Last update: 2012-10-30.