Direction des Relations et Internationales (DRI)

Programme INRIA "Equipes Associées"

 

I. DEFINITION

EQUIPE ASSOCIEE

TGDA
sélection
2008

Equipe-Projet INRIA : Géometrica Organisme étranger partenaire : Stanford University
Centre de recherche INRIA : Futurs / Saclay
Thème INRIA : SYM (systèmes symboliques)
Pays : USA
 
 
Coordinateur français
Coordinateur étranger
Nom, prénom OUDOT, Steve Y. GUIBAS, Leonidas J.
Grade/statut CR / Chargé de Recherches Full Professor
Organisme d'appartenance
(précisez le département et/ou le laboratoire)
INRIA Futurs Stanford University, Computer Science Department
Adresse postale Parc Orsay Université
ZAC des Vignes
4, rue Jacques Monod - Bât. P
91893 ORSAY Cedex, France
Computer Science Department
Gates Building
Stanford University
Stanford, CA 94305, USA
URL http://geometrica.saclay.inria.fr/collaborations/TGDA/ http://geometry.stanford.edu/member/guibas/
Téléphone (+33) 174 854 216 (+1) 650 723 0304
Télécopie (+33) 174 854 249 (+1) 650 723 0033
Courriel steve.oudot[at]inria.fr guibas[at]cs.stanford.edu


La proposition en bref / Overview of the Proposal

Collaboration title: Topological and Geometric Data Analysis / Analyse topologique et géométrique de données

Description (about 10 lines) :

TGDA is an associate team between the Géometrica group at INRIA and the Geometric Computation group at Stanford University, focused on the topological and geometric analysis of possibly high-dimensional point cloud data that are difficult to handle with standard linear or non-linear dimensionality reduction techniques. Such data appear in as diverse areas as machine learning, dynamical systems, structural biology, image processing, or sensor networks. They are typically obtained by sampling from highly-curved manifolds with non-trivial topology, embedded in high-dimensional Euclidean spaces. Our main goal will be to devise suitable methods to handle this type of data, based on recent advances in computational geometry and algebraic topology, including (but not restricted to) persistent homology, witness complexes, and distance functions. Both the Géometrica group at INRIA and the Geometric Computation group at Stanford University have actively contributed to these advances in the recent years, and we think their collaboration is now mature enough to take advantage of the associate team framework. To validate our approach, we intend to set our methods against real-life data sets coming from other scientific fields, thereby developing new collaborations with other groups at INRIA and at Stanford University.

 


II. BILAN 2009


 

Rapport scientifique de l'année 2009

The research program for 2009 is accessible here

Most of the work performed in 2008 and 2009 on the theoretical aspects of topological data analysis was turned into several papers that appeared in top conferences this year:

All these papers were written and co-signed by researchers from both Geometrica and Stanford University. In particular, the last paper in the list is the outcome of a collaboration between two PhD students, one from Geometrica (Quentin Mérigot) and the other from Stanford University (Maksim Ovsjanikov).

As planned a year ago, 2009 was mainly devoted to the study of the potential applications of the above theoretical advances:

All these results were presented at the workshop organized by TGDA in Paris in July 2009, which gathered about 40 scientists from the field of computational topology as well as from other areas of potential application like machine learning, computer graphics, and wireless sensor networks.

Rapport financier 2009

Avant de remplir les tableaux, consultez les règles au paragraphe "Financement" de la page d'accueil du programme.

1. Dépenses EA (effectuées sur les crédits de l'Equipe Associée)
Montant dépensé
Invitations des partenaires 3725
Missions INRIA12000
Logistique du workshop TGDA3000
Total
18725

Remarks:

2. Dépenses externes (effectuées sur des financements hors EA)
Montant dépensé
Région PACA:
Invitations des partenaires 0
Missions INRIA vers le partenaire 1800
Total
1800
SIAM:
Invitations des partenaires 0
Missions INRIA vers le partenaire 500
Total
500
France-Stanford Fund:
Invitations des partenaires 6500
Missions INRIA vers le partenaire 0
Total
6500
NSF:
Invitations des partenaires 0
Missions INRIA vers le partenaire 8000
Total
8000

Total des financements externes dépensés

16800

Total des financements EA et externes dépensés

35525

 

Bilan des échanges effectués en 2009


1. Chercheurs Seniors

Nom
statut (1)
provenance
destination
objet (2)
durée (3)
Coût (si financement EA)
Coût (si financement externe)
Steve Oudot CR Geometrica Stanford visit 1 month 0 (travel expenses covered by the 2008 budget of TGDA) 3000 (local expenses)
Leonidas Guibas Full Professor Stanford Paris TGDA workshop 4 days 0 2500 (travel + local expenses)
Frédéric Chazal DR Geometrica Stanford visit 2 weeks 1000 (travel expenses) 0 (local expenses covered by the 2010 budget)
Jean-Daniel Boissonnat DR Geometrica SODA'10 + Stanford conference + visit 2 weeks 4 days 2500 (travel expenses + local expenses at the conference + registration fee) 0 (local expenses for the visit covered by the 2010 budget)
Steve Oudot CR Geometrica SODA'10 conference 4 days 2500 (travel + local expenses + registration fee) 0
Frédéric Chazal DR Geometrica SODA'10 conference 4 days 2500 (travel + local expenses + registration fee) 0
Leonidas Guibas Full Professor Stanford SODA'10 conference 4 days 0 2000 (travel + local expenses + registration fee)

Total des durées
2 months 20 days
(1) DR / CR / professeur
(2) colloque, thèse, stage, visite....
(3)
précisez l'unité (mois, semaine..)


2. Juniors

Nom
statut (1)
provenance
destination
objet (2)
durée (3)
Coût (si financement EA)
Coût (si financement externe)
Quentin Mérigot PhD student Geometrica Stanford visit 1.5 months 1750 (travel + part of the local expenses) 900 (bourse région PACA)
Dmitriy Morozov post-doc Stanford Paris TGDA workshop + visit 20 days 2000 (travel + local expenses) 0
Facundo Mémoli post-doc Stanford Paris TGDA workshop 4 days 300 (local expenses) 1000 (travel expenses)
Mikael Vejdemo-Johansson post-doc Stanford Paris TGDA workshop 10 days 750 (local expenses) 1000 (travel expenses)
Maksim Ovsjanikov PhD student Stanford Paris TGDA workshop 4 days 375 (local expenses) 1000 (travel expenses)
Peter Huang PhD student Stanford Paris TGDA workshop 4 days 300 (local expenses) 1000 (travel expenses)
Pooran Memari PhD student Geometrica Stanford visit 1.5 months 750 (travel expenses) 900 bourse region PACA + 3000 NSF grant (local expenses)
Quentin Mérigot PhD student Geometrica SPM'09 conference 4 days 1000 (travel expenses) 500 SIAM student grant (registration fee + local expenses)

Total des durées
4 months 15 days
(1) post-doc / doctorant / stagiaire
(2) colloque, thèse, stage, visite....
(3) précisez l'unité (mois, semaine..)

III. PREVISIONS 2010

Programme de travail

Since 2009, the Geometrica group has been part of a joint effort with the Geometric Computation and Applied Topology groups at Stanford University to develop the successor of the PLEX library, now considered a standard in computational topology. The new library, named CULT ( Computational Unified Library for Topology), will offer a vast set of tools that reflect the state of the art in the field. It should be able to handle very large and high-dimensional data sets within reasonable amount of time and space, while keeping enough flexibility for people to be able to build their own algorithms upon it. Potential interactions with other libraries like CGAL will also be investigated. We are setting the development of the first release of the library among our primary goals for the upcoming years, and especially for 2010.

Algorithmic aspects

Making the library effective will require a lot of work on designing new data structures whose complexities scale up reasonably with the size and dimensionality of the input data. To give an idea of the challenge, each time the dimension of the Rips complex is increased by one, its size is multiplied roughly by ten. As a result, people commonly have to deal with complexes containing dozens of millions of simplices. Such complexes do not fit into main memory, at least with a naive data representation. On the other hand, brute-force compression of the data results in prohibitively slow access times. A trade-off between space and time has to be found, which will be one of the objectives of Marc Glisse, a new chargé de recherches joining Geometrica and TGDA this Fall.

Another way of dealing with massive data sets is to design out-of-core algorithms. Our clustering scheme is one example: even though the input is in the form of a distance matrix, the algorithm only has to scan the matrix once and column-wise, loading only one column at a time in main memory. As a result, the algorithm can deal with data sets containing millions of points within a few minutes, with a memory consumption that is only linear in the size of the point cloud. This property motivated our choice of making this code the very first one to be inserted in the CULT library, so as to set a standard for future code developments. Can other algorithms in the computational topology literature be made out-of-core?

Another question related to data structures is how fast they can be updated under point insertions and deletions. Constructions in topological data analysis mostly rely on comparisons between distances, simply because more elaborate geometric predicates are difficult (if at all possible) to implement in high dimensions. Thus, nearest neighbor queries and reverse nearest neighbor queries are by far the most common during updates. Alhough the former type of queries has been well studied in the past and can be performed reasonably fast, there is hardly any literature on the latter type of queries. We are currently collaborating with two Ph.D. students in the Theory group at Stanford University to devise a data structure based on Locality-Sensitive Hashing (LSH) that answers (approximate) reverse nearest neighbor queries in sublinear time, with a complexity that scales up polynomially with the ambient dimension. This data structure uses LSH as a black-box and reduces the approximate reverse nearest neighbor problem to the Point Location Among Equal Balls problem, in the same spirit as in the original LSH paper but with a completely different construction. A paper on the subject is currently being written and should be submitted in 2010. Then, the dynamic version of the problem, where new data points may be inserted or old ones deleted, will be considered.

Targeted applications

Many potential applications of our work might be considered, and our hope is that the CULT library eventually becomes a standard tool for developing such applications. In a near future we intend to focus primarily on the following ones:

Mathematical aspects

In the context of topological inference from point cloud data, people have been primarily interested in the early parts of the filtrations they built, as the rest of these ever-growing families of complexes quickly became too big to be stored. The bottomline was that inferring the homology of the space underlying the input point cloud only required to know the beginning of the barcode associated with the filtration. In fact, the very notion of the {\em space underlying the data} is subject to interpretation and depends on the scale at which the data is inspected. Take for instance a point cloud sampled from a helicoidal curve that is drawn on the Clifford torus in R^4. This data set admits at least three candidate underlying spaces: at a small scale, the curve; at a larger scale, the torus; and at an even larger scale, the 3-sphere of radius sqrt(2) on which the Clifford torus is naturally embedded. One might also add the point cloud itself and R^4 at either ends of the spectrum. It would be very interesting to see if these various possible answers appear in the barcodes of our filtrations, with transition effects between scales. For this we need to be able to compute the full filtrations, or at least to approximate their full barcodes. Two projects in this direction have been started:

These two approaches are complementary and showed encouraging preliminary results. They will be investigated further in 2010.

 

Programme d'échanges avec budget prévisionnel

1. Echanges

Our goal for 2010 will be to keep up the rate of exchanges to the same level as in 2008-2009, especially at junior level :

In addition to the above exchanges, let us mention that the post-doctoral internship of Primoz Skraba in the Geometrica group will be continued in 2010, funded by the ANR grant GIGA (started this Fall). Conversely, Quentin Mérigot will start a post-doctoral internship in the Geometric Computation group at Stanford University in January.

 1. ESTIMATION DES DÉPENSES EN MISSIONS INRIA VERS LE PARTENAIRE
Nombre de personnes
Coût estimé
Chercheurs confirmés 3 6000 (travel expenses covered by the 2009 budget)
Post-doctorants
   
Doctorants 1 4000 (travel expenses covered by the 2009 budget)

Stagiaires

   
Autre (précisez) :
   
   Total
4 10000

 

 2. ESTIMATION DES DÉPENSES EN INVITATIONS DES PARTENAIRES
Nombre de personnes
Coût estimé
Chercheurs confirmés 1 5000 (travel + local expenses)
Post-doctorants
   
Doctorants 2 12000 (travel + intern salary)

Stagiaires

   
Autre (précisez) :
   
   Total
3 17000

2. Cofinancement

Cette coopération bénéficie-t-elle déjà d'un soutien financier de la part de l'INRIA, de l'organisme étranger partenaire ou d'un organisme tiers (projet européen, NSF, ...) ?

The total additional funding amounts to 17000 euros.

3. Demande budgétaire

Indiquez, dans le tableau ci-dessous, le coût global estimé de la proposition et le budget demandé à la DRI dans le cadre de cette Equipe Associée.
(maximum 20 K€ pour une prolongation en 2e année et 10 K€ pour une 3e année).

Commentaires
Montant
A. Coût global de la proposition (total des tableaux 1 et 2 : invitations, missions, ...) 27000
B. Cofinancements utilisés (financements autres que Equipe Associée) 17000

Financement "Équipe Associée" demandé (A.-B.)
(maximum 20 K€ pour une 2e année et 10 K€ pour une 3e année)

10000

Remarques ou observations :

 

 

 

© INRIA - mise à jour le 14/10/2009