Direction des Relations et Internationales (DRI)
EQUIPE ASSOCIEE |
TGDA |
sélection |
2008 |
Equipe-Projet INRIA : Géometrica | Organisme étranger partenaire : Stanford University |
Centre de recherche INRIA : Futurs / Saclay Thème INRIA : SYM (systèmes symboliques) |
Pays : USA |
Coordinateur
français |
Coordinateur
étranger |
|
Nom, prénom | OUDOT, Steve Y. | GUIBAS, Leonidas J. |
Grade/statut | CR / Chargé de Recherches | Full Professor |
Organisme d'appartenance (précisez le département et/ou le laboratoire) |
INRIA Futurs | Stanford University, Computer Science Department |
Adresse postale | Parc Orsay Université ZAC des Vignes 4, rue Jacques Monod - Bât. P 91893 ORSAY Cedex, France |
Computer Science Department Gates Building Stanford University Stanford, CA 94305, USA |
URL | http://geometrica.saclay.inria.fr/collaborations/TGDA/ | http://geometry.stanford.edu/member/guibas/ |
Téléphone | (+33) 174 854 216 | (+1) 650 723 0304 |
Télécopie | (+33) 174 854 249 | (+1) 650 723 0033 |
Courriel | steve.oudot[at]inria.fr | guibas[at]cs.stanford.edu |
Collaboration title: Topological and Geometric Data Analysis / Analyse topologique et géométrique de données |
Description (about
10 lines) : TGDA is an associate team between the Géometrica group at INRIA and the Geometric Computation group at Stanford University, focused on the topological and geometric analysis of possibly high-dimensional point cloud data that are difficult to handle with standard linear or non-linear dimensionality reduction techniques. Such data appear in as diverse areas as machine learning, dynamical systems, structural biology, image processing, or sensor networks. They are typically obtained by sampling from highly-curved manifolds with non-trivial topology, embedded in high-dimensional Euclidean spaces. Our main goal will be to devise suitable methods to handle this type of data, based on recent advances in computational geometry and algebraic topology, including (but not restricted to) persistent homology, witness complexes, and distance functions. Both the Géometrica group at INRIA and the Geometric Computation group at Stanford University have actively contributed to these advances in the recent years, and we think their collaboration is now mature enough to take advantage of the associate team framework. To validate our approach, we intend to set our methods against real-life data sets coming from other scientific fields, thereby developing new collaborations with other groups at INRIA and at Stanford University. |
|
Most of the work performed in 2008 and 2009 on the theoretical aspects of topological data analysis was turned into several papers that appeared in top conferences this year:
As planned a year ago, 2009 was mainly devoted to the study of the potential applications of the above theoretical advances:
Avant de remplir les tableaux, consultez les règles au paragraphe "Financement" de la page d'accueil du programme.
1. Dépenses EA (effectuées sur les crédits de l'Equipe Associée) | Montant dépensé |
Invitations des partenaires | 3725 |
Missions INRIA | 12000 |
Logistique du workshop TGDA | 3000 |
Total | 18725 |
Remarks:
2. Dépenses externes (effectuées sur des financements hors EA) | Montant dépensé | |
Région PACA: | ||
Invitations des partenaires | 0 | |
Missions INRIA vers le partenaire | 1800 | |
Total | 1800 |
SIAM: | ||
Invitations des partenaires | 0 | |
Missions INRIA vers le partenaire | 500 | |
Total | 500 |
France-Stanford Fund: | ||
Invitations des partenaires | 6500 | |
Missions INRIA vers le partenaire | 0 | |
Total | 6500 |
NSF: | ||
Invitations des partenaires | 0 | |
Missions INRIA vers le partenaire | 8000 | |
Total | 8000 |
Total des financements externes dépensés |
16800 |
Total des financements EA et externes dépensés |
35525 |
1. Chercheurs Seniors
Nom |
statut (1) |
provenance | destination |
objet (2) |
durée (3) |
Coût (si financement EA) |
Coût (si financement externe) |
Steve Oudot | CR | Geometrica | Stanford | visit | 1 month | 0 (travel expenses covered by the 2008 budget of TGDA) | 3000 (local expenses) |
Leonidas Guibas | Full Professor | Stanford | Paris | TGDA workshop | 4 days | 0 | 2500 (travel + local expenses) |
Frédéric Chazal | DR | Geometrica | Stanford | visit | 2 weeks | 1000 (travel expenses) | 0 (local expenses covered by the 2010 budget) |
Jean-Daniel Boissonnat | DR | Geometrica | SODA'10 + Stanford | conference + visit | 2 weeks 4 days | 2500 (travel expenses + local expenses at the conference + registration fee) | 0 (local expenses for the visit covered by the 2010 budget) |
Steve Oudot | CR | Geometrica | SODA'10 | conference | 4 days | 2500 (travel + local expenses + registration fee) | 0 |
Frédéric Chazal | DR | Geometrica | SODA'10 | conference | 4 days | 2500 (travel + local expenses + registration fee) | 0 |
Leonidas Guibas | Full Professor | Stanford | SODA'10 | conference | 4 days | 0 | 2000 (travel + local expenses + registration fee) |
Total des durées |
2 months 20 days |
2. Juniors
Nom |
statut (1) |
provenance |
destination | objet (2) |
durée (3) |
Coût (si financement EA) |
Coût (si financement externe) |
Quentin Mérigot | PhD student | Geometrica | Stanford | visit | 1.5 months | 1750 (travel + part of the local expenses) | 900 (bourse région PACA) |
Dmitriy Morozov | post-doc | Stanford | Paris | TGDA workshop + visit | 20 days | 2000 (travel + local expenses) | 0 |
Facundo Mémoli | post-doc | Stanford | Paris | TGDA workshop | 4 days | 300 (local expenses) | 1000 (travel expenses) |
Mikael Vejdemo-Johansson | post-doc | Stanford | Paris | TGDA workshop | 10 days | 750 (local expenses) | 1000 (travel expenses) |
Maksim Ovsjanikov | PhD student | Stanford | Paris | TGDA workshop | 4 days | 375 (local expenses) | 1000 (travel expenses) |
Peter Huang | PhD student | Stanford | Paris | TGDA workshop | 4 days | 300 (local expenses) | 1000 (travel expenses) |
Pooran Memari | PhD student | Geometrica | Stanford | visit | 1.5 months | 750 (travel expenses) | 900 bourse region PACA + 3000 NSF grant (local expenses) |
Quentin Mérigot | PhD student | Geometrica | SPM'09 | conference | 4 days | 1000 (travel expenses) | 500 SIAM student grant (registration fee + local expenses) |
Total des durées |
4 months 15 days |
Since 2009, the Geometrica group has been part of a joint effort with the Geometric Computation and Applied Topology groups at Stanford University to develop the successor of the PLEX library, now considered a standard in computational topology. The new library, named CULT ( Computational Unified Library for Topology), will offer a vast set of tools that reflect the state of the art in the field. It should be able to handle very large and high-dimensional data sets within reasonable amount of time and space, while keeping enough flexibility for people to be able to build their own algorithms upon it. Potential interactions with other libraries like CGAL will also be investigated. We are setting the development of the first release of the library among our primary goals for the upcoming years, and especially for 2010.
Making the library effective will require a lot of work on designing new data structures whose complexities scale up reasonably with the size and dimensionality of the input data. To give an idea of the challenge, each time the dimension of the Rips complex is increased by one, its size is multiplied roughly by ten. As a result, people commonly have to deal with complexes containing dozens of millions of simplices. Such complexes do not fit into main memory, at least with a naive data representation. On the other hand, brute-force compression of the data results in prohibitively slow access times. A trade-off between space and time has to be found, which will be one of the objectives of Marc Glisse, a new chargé de recherches joining Geometrica and TGDA this Fall.
Another way of dealing with massive data sets is to design out-of-core algorithms. Our clustering scheme is one example: even though the input is in the form of a distance matrix, the algorithm only has to scan the matrix once and column-wise, loading only one column at a time in main memory. As a result, the algorithm can deal with data sets containing millions of points within a few minutes, with a memory consumption that is only linear in the size of the point cloud. This property motivated our choice of making this code the very first one to be inserted in the CULT library, so as to set a standard for future code developments. Can other algorithms in the computational topology literature be made out-of-core?
Another question related to data structures is how fast they can be updated under point insertions and deletions. Constructions in topological data analysis mostly rely on comparisons between distances, simply because more elaborate geometric predicates are difficult (if at all possible) to implement in high dimensions. Thus, nearest neighbor queries and reverse nearest neighbor queries are by far the most common during updates. Alhough the former type of queries has been well studied in the past and can be performed reasonably fast, there is hardly any literature on the latter type of queries. We are currently collaborating with two Ph.D. students in the Theory group at Stanford University to devise a data structure based on Locality-Sensitive Hashing (LSH) that answers (approximate) reverse nearest neighbor queries in sublinear time, with a complexity that scales up polynomially with the ambient dimension. This data structure uses LSH as a black-box and reduces the approximate reverse nearest neighbor problem to the Point Location Among Equal Balls problem, in the same spirit as in the original LSH paper but with a completely different construction. A paper on the subject is currently being written and should be submitted in 2010. Then, the dynamic version of the problem, where new data points may be inserted or old ones deleted, will be considered.
Many potential applications of our work might be considered, and our hope is that the CULT library eventually becomes a standard tool for developing such applications. In a near future we intend to focus primarily on the following ones:
In the context of topological inference from point cloud data, people have been primarily interested in the early parts of the filtrations they built, as the rest of these ever-growing families of complexes quickly became too big to be stored. The bottomline was that inferring the homology of the space underlying the input point cloud only required to know the beginning of the barcode associated with the filtration. In fact, the very notion of the {\em space underlying the data} is subject to interpretation and depends on the scale at which the data is inspected. Take for instance a point cloud sampled from a helicoidal curve that is drawn on the Clifford torus in R^4. This data set admits at least three candidate underlying spaces: at a small scale, the curve; at a larger scale, the torus; and at an even larger scale, the 3-sphere of radius sqrt(2) on which the Clifford torus is naturally embedded. One might also add the point cloud itself and R^4 at either ends of the spectrum. It would be very interesting to see if these various possible answers appear in the barcodes of our filtrations, with transition effects between scales. For this we need to be able to compute the full filtrations, or at least to approximate their full barcodes. Two projects in this direction have been started:
1. Echanges
Our goal for 2010 will be to keep up the rate of exchanges to the same level as in 2008-2009, especially at junior level :
In addition to the above exchanges, let us mention that the post-doctoral internship of Primoz Skraba in the Geometrica group will be continued in 2010, funded by the ANR grant GIGA (started this Fall). Conversely, Quentin Mérigot will start a post-doctoral internship in the Geometric Computation group at Stanford University in January.
1. ESTIMATION DES DÉPENSES EN MISSIONS INRIA VERS LE PARTENAIRE | Nombre de personnes
|
Coût estimé
|
Chercheurs confirmés | 3 | 6000 (travel expenses covered by the 2009 budget) |
Post-doctorants |
||
Doctorants | 1 | 4000 (travel expenses covered by the 2009 budget) |
Stagiaires |
||
Autre (précisez) : |
||
Total
|
4 | 10000 |
2. ESTIMATION DES DÉPENSES EN INVITATIONS DES PARTENAIRES | Nombre de personnes
|
Coût estimé
|
Chercheurs confirmés | 1 | 5000 (travel + local expenses) |
Post-doctorants |
||
Doctorants | 2 | 12000 (travel + intern salary) |
Stagiaires |
||
Autre (précisez) : |
||
Total
|
3 | 17000 |
2. Cofinancement
Cette coopération bénéficie-t-elle déjà d'un soutien financier de la part de l'INRIA, de l'organisme étranger partenaire ou d'un organisme tiers (projet européen, NSF, ...) ?
3. Demande budgétaire
Indiquez, dans le tableau ci-dessous, le coût global estimé de
la proposition et le budget demandé à la DRI dans le cadre de
cette Equipe Associée.
(maximum
20 K€ pour une prolongation en 2e année et 10 K€ pour une
3e année).
Commentaires
|
Montant
|
A. Coût global de la proposition (total des tableaux 1 et 2 : invitations, missions, ...) | 27000 |
B. Cofinancements utilisés (financements autres que Equipe Associée) | 17000 |
Financement "Équipe Associée" demandé (A.-B.) |
10000 |
Remarques ou observations :