CoMeT:
Computation,
Metrics
and Topology for data analysis
Associated Team -- 2011 - 2013
Geometric Computing group
Stanford University
Principal Investigator: Leonidas Guibas
|
Geometrica group
INRIA Saclay -- Île de France
Principal Investigator: Steve Oudot
|
Computational Geometry group
Ohio-State University
Principal Investigator: Yusu Wang
|
Project overview:
CoMeT is an associated team between the Geometrica group at INRIA, the
Geometric Computing group at Stanford University, and the Computational
Geometry group at the Ohio State University. Its focus is on the design
of computational methods for the analysis of high-dimensional data,
using tools from metric geometry and algebraic topology. Our goal is to
extract enough structure from the data, so we can get a higher-level
informative understanding of these data and of the spaces they originate from. The main challenge
is to be able to go beyond mere dimensionality reduction and topology
inference, without the need for a costly explicit reconstruction. To validate our
approach, we intend to set our methods against real-life data sets
coming from a variety of applications, including (but not restricted
to) clustering, image or shape segmentation, sensor field
monitoring, shape classification and matching. The three research
groups involved in this project have been active contributors in the
field of Computational Topology in the recent years, and some of their
members have had long-standing collaborations. We believe this
associated team can help create new synergies between these
groups.
|
Detailed research program: see here.
|
Scientifc outcomes
2013
Various projects started in the previous years have turned into papers
to be submitted or published shortly:
- Last year's projects on homology inference in the
presence of noise and outliers and on sparse Rips filtrations
for distances to measures have been gathered into a single
paper, to be submitted to the ACM Symposium on Computational Geometry
at the end of the year.
-
The full version of our paper on clustering using
persistance has been accepted to the Journal of the ACM. It will
appear in the next issue of the journal.
In addition, the following new projects have been started:
- Scalar Fields Analysis in the Presence of Unbounded Noise and
Outliers (joint work
involving INRIA and OSU):
We tackle the problem of analyzing the topological structure of the
graph of a scalar function f defined over a Riemannian manifold X, in the
context where the data points and function values undergo unbounded
noise. The interplay between geometric and functional noise makes
this case significantly more difficult than the bounded noise
case. Combining ideas from statistical regression and topological data
analysis, we derive an algorithm that can recover an approximation of
f over a denoised point cloud, from which an approximation of the
persistence diagram of f can be computed.
Status: in preparation
- Map-Based Exploration of Intrinsic Shape Differences and
Variability (joint work
involving INRIA and Stanford):
We develop a novel formulation for the notion of shape differences,
aimed at providing detailed information about the location and na-
ture of the differences or distortions between the two shapes being
compared. Our difference operator, derived from a shape map, is
much more informative than just a scalar global shape similarity
score, rendering it useful in a variety of applications where more
refined shape comparisons are necessary. The approach is intrin-
sic and is based on a linear algebraic framework, allowing the use
of many common linear algebra tools (e.g, SVD, PCA) for study-
ing a matrix representation of the operator. Remarkably, the for-
mulation allows us not only to localize shape differences on the
shapes involved, but also to compare shape differences across pairs
of shapes, and to analyze the variability in entire shape collections
based on the differences between the shapes. Moreover, while
we use a map or correspondence to define each shape difference,
consistent correspondences between the shapes are not necessary
for comparing shape differences, although they can be exploited if
available.
Bibtex:
@article{roabcg-mbeisdv-13
, author = "Raif M. Rustamov and
Maks Ovsjanikov and
Omri Azencot and
Mirela Ben-Chen and
Frédéric Chazal and
Leonidas Guibas"
, title = "Map-Based Exploration of Intrinsic Shape Differences and
Variability"
, journal = "Transactions on Graphics (proc. SIGGRAPH)"
, volume = "32"
, number = "4"
, pages = "72:1--72:12"
, year = "2013"
}
2012
This year saw the start of several new projects, between
INRIA and Stanford on the one hand, between INRIA and OSU on the other
hand. Each one of them is expected to succeed within a year or two.
- Analysis and Visualization of Maps Between Shapes (joint work
involving INRIA and Stanford):
In this work we address several challenges related to analyzing and
visualizing maps between shapes and their collections. Unlike the
majority of prior work which tries to discover maps in the context of
shape matching, our main focus is on evaluating, analyzing and
visualizing a given map in an efficient and intuitive way. We
are motivated primarily by the following considerations: 1) there is
currently no way to represent a map in a multi-scale way, 2) existing
metrics for map evaluation are quadratic and expensive to compute in
practice, and 3) there are no simple methods for map visualization
that would highlight areas where the map fails to meet desired quality
criteria, rather then concentrating on areas where the map is good. We
propose to tackle these problems in a unified way by considering the
functional representation of a map, and performing spectral analysis
on this representation.
Status: submitted
- Homology Inference in the Presence of Noise and Outliers (joint
work involving INRIA and OSU):
The classical approach to
topological data analysis using distances to compact sets has
proved its usefulness in cases where the input point cloud data
has no or small noise. However, it is known to fail in the
presence of outliers. The bottomline of this work is to replace
compact sets by measures, and distances to compact sets by
distances to measures. These are known to be much more robust to
noise of large amplitude, however computing their persistence
diagram directly is prohibitively costly, even in small
dimensions, which explains the lack of practical applications in
the context of homology inference so far. We propose novel and
faithful approximations to the distance to a measure, whose
persistence diagrams can be computed in practice at least in
small dimensions.
Status: in preparation
- Sparse Rips Filtrations for Distances to Measures (joint
work involving INRIA and OSU):
The Vietoris-Rips filtration
proved an efficient tool for doing topological data analysis in
high dimensions. Its main asset is that the only geometric
predicate involved in its construction are the comparison of
distances, which can be performed easily in all
dimensions. However, the construction time of the full Rips
filtration incurs an exponential blow-up with the size of the
input data, which makes it impossible in practice to compute its
full persistence diagram. Last year, a new line of work was initiated by
Donald Sheehy (now a post-doc in the Geometrica group), who showed how to
construct an O(n)-size filtered simplicial complex (called the
sparse Rips filtration) on an $n$-point metric space,
such that its persistence diagram is a good approximation to that
of the Vietoris-Rips filtration. This new filtration can be
constructed in $O(n\log n)$ time. The constant factors in both the
size and the running time depend only on the doubling dimension of
the metric space and the desired tightness of the
approximation. However, like the standard Rips construction,
Sheehy's filtration is not resilient to large noise and
outliers. The goal of this work is to take advantage of our
advances on the approximation of the distance to a measure, to
devise variants of the sparse Rips filtration that allows us to do
homology inference in the presence of noise and outliers.
Status: in preparation
In addition, the journal version of our paper on metric graphs
reconstruction from point cloud data was published in IJCGA.
@article{accggm-mgrnd-12
, author = "M. Aanjaneya and F. Chazal and D. Chen and M. Glisse and L. J. Guibas and D. Morozov"
, title = "Metric Graph Reconstruction from Noisy Data"
, journal = "International Journal of Computational Geometry and Applications (IJCGA)"
, volume = "22"
, number = "4"
, pages = "305--325"
, year = "2012"
}
2011
- Clustering point cloud data using topological persistence (joint
work involving INRIA and Stanford):
In this work we present a novel
clustering algorithm that combines a mode-seeking phase with a cluster
merging phase. While mode detection is performed by a standard
graph-based hill-climbing scheme, the novelty of our approach resides
in its use of topological persistence theory to guide the merges
between clusters. An interesting feature of our algorithm is to provide
additional feedback in the form of a finite set of points in the plane,
called a {\em persistence diagram}, which provably reflects the
prominence of each of the modes of the density. Such feedback is an
invaluable tool in practice, as it enables the user to determine a set
of parameter values that will make the algorithm compute a relevant
clustering on the next run. In terms of generality, our approach
requires the sole knowledge of (approximate) pairwise distances between
the data points, as well as of rough estimates of the density at these
points. It is therefore virtually applicable in any arbitrary metric
space. In the meantime, its complexity remains reasonable: although the
size of the input distance matrix may be up to quadratic in the number
of data points, a careful implementation only uses a linear amount of
main memory and barely takes more time to run than the one spent
reading the input. Taking advantage of recent advances in topological
persistence theory, we are able to give a theoretically sound notion of
what the "correct" number k of clusters is, and to prove that under
mild sampling conditions and a relevant choice of parameters (made
possible in practice by the persistence diagram) our clustering scheme
computes a set of k clusters whose spatial locations are bound to the
ones of the basins of attraction of the peaks of the density. These
guarantess hold in a large variety of contexts, including when data
points are distributed along some unknown Riemannian manifold.
Bibtex:
@inproceedings{cgos-pbc-11
, author = "F. Chazal and L. J. Guibas and S. Y. Oudot and P. Skraba"
, title = "Persistence-Based Clustering in Riemannian Manifolds"
, booktitle = "Proc. 27th Annual ACM Symposium on Computational Geometry"
, pages = "97--106"
, month = "June"
, year = 2011
}
- Reconstructing
metric
graphs
from point cloud data (joint work involving INRIA and
Stanford):
Many real-world data sets can be
viewed of as noisy samples of special types of metric spaces called
metric graphs [16]. Building on the notions of correspondence and
Gromov- Hausdorff distance in metric geometry, this work describes a
model for such data sets as an approximation of an underlying met- ric
graph. We present a novel algorithm that takes as an input such a data
set, and outputs the underlying metric graph with guarantees. We also
implement the algorithm, and evaluate its performance on a variety of
real world data sets.
Bibtex:
@inproceedings{accggm-mgrnd-11
, author = "M. Aanjaneya and F. Chazal and D. Chen and M. Glisse and L. J. Guibas and D. Morozov"
, title = "Metric Graph Reconstruction from Noisy Data"
, booktitle = "Proc. 27th Annual ACM Symposium on Computational Geometry"
, pages = "37--46"
, month = "June"
, year = 2011
}
- Smoothing
GPS
trajectories
using distances to measures (joint work
involving INRIA and Stanford):
Motivated by the increasing
availability of large collections of noisy GPS traces, this work
presents a new data-driven frame- work for smoothing trajectory data.
The framework, which can be viewed of as a generalization of the
classical moving average technique, naturally leads to efficient
algorithms for various smoothing objectives. We analyze an algorithm
based on this framework and provide connections to pre- vious smoothing
techniques. We implement a variation of the algorithm to smooth an
entire collection of trajectories and show that it performs well on
both synthetic data and massive collections of GPS traces.
Bibtex:
@inproceedings{ccgjs-dts-11
, author = "F. Chazal and D. Chen and L. J. Guibas and X. Jiang and C. Sommer"
, title = "Data-Driven Trajectory Smoothing"
, booktitle = "Proc. 19th ACM SIGSPATIAL International Conference and Advances in Geographic Information Systems"
, month = "November"
, year = 2011
}
Interactions
2013
- January: Fengtao Fan from OSU visited the Geometrica group
for 1 month, to work on scalar fields analysis in the presence of
noise and outliers..
- February: Don Sheehy from Inria visited the group of Leo
Guibas at Stanford for 2 weeks, to discuss about sparse filtrations
and their application to data anlysis.
- June: Yusu Wang visited Geometrica for a week, to discuss
about ongoing research on comparing Reeb graphs.
- July: Peter Huang, from the group of Leo Guibas at Stanford, visited Geometrica for a
week, to present his research in geometry processing, and to discuss
potential interactions with the research performed within
CoMeT.
- July-August: Maks Ovsjanikov visited the group of Leo Guibas at
Stanford for 1 month, to discuss about on-going common projects
pertaining to the exploration of spaces of shapes.
- August: Clément Maria from Geometrica spent 2 weeks
at Stanford, to discuss about his research on lightweight and
efficient algorithms for persistance calculation.
- October-November: Kyle Heath, from the group of Leo Guibas
at Stanford, is spending 2 months in the Geometrica group at Inria,
to work on the exploration of spaces of images and shapes.
2012
- April-June: Mickaël Buchet visited the group
of Yusu Wang at OSU for 2 months, to work on homology inference in
the presence of noise and outliers.
- August: Maks Ovsjanikov visited the group of Leo
Guibas at Stanford for a month, to work on the analysis of maps
between spaces.
- September: Yusu Wang and Misha Belkin visited the
Geometrica group at INRIA for a week, to discuss about homology
inference in the presence of noise and outliers.
- October: Clément Maria visited the group of Yusu
Wang at OSU for a week, and then visited Pr. Coutsias at University
of New Mexico for a week, to discuss about new topics related to
efficient data structures for simplicial complexes representation.
- October-November: Don Sheehy is visiting OSU for a week,
to discuss about sparse Rips filtrations for distances to measures.
2011
- July 23rd-31st: Steve
Oudot visited Leo Guibas at Stanford and
Google. The scientific discussion evolved around future directions for
our joint work on persistence-based clustering.
- November 14th-16th: Steve
Oudot, Marc Glisse and Donald Sheehy
will visit Yusu Wang and her group at Ohio-State University. The
purpose of this visit will be to touch base and start common scientific
projects on topics related to graph Laplacians and their applications.
Since Yusu's group is a leader in this area, a weekly seminar on
graph Laplacians is being held throughout the Fall in the geometrica
group to prepare for the visit.
- November 7th-11th: On his
way to Ohio-State University, Steve will make a stop at a workshop in
Toronto, where Fred Chazal, Leo Guibas and Yusu Wang will also be
present. This will be a good opportunity for them to start the
discussion that will continue on the next week at Ohio-State.
Contact: steve.oudot[at]inria.fr
Last update: 2012-10-30.