Recent talks

January 23, 13:30.

Speaker: Nicolas Berkouk (INRIA).

Title : A derived isometry theorem for constructible sheaves on R

Persistent homology has recently been studied with the tools of sheaf theory in the derived setting by Kashiwara and Schapira, after J. Curry made the first link between persistent homology and sheaves. We prove the isometry theorem in this derived setting, expressing the convolution distance between sheaves as a matching distance between combinatorial objects associated to them, which we call graded barcodes. This allows us to consider sheaf-theoretic constructions as combinatorial, stable topological descriptors of data, and generalizes the situation of one-parameter persistence. In the second part of the talk, we relate sheaf-theoretic and persistence-theoretic constructions, and show how the derived isometry theorem gives a new, deeper interpretation of the stability of level-set persistence.
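As a toy illustration of a matching distance, the following sketch computes the classical one-parameter bottleneck distance between two small barcodes by brute force. Note this is the ordinary, non-graded notion, not the graded-barcode distance of the talk; it is only meant to show what "matching intervals, with deletion to the diagonal" means concretely.

```python
from itertools import permutations

def bottleneck(bars1, bars2):
    """Brute-force bottleneck distance between two small barcodes.

    Each barcode is a list of (birth, death) intervals.  An interval
    may be matched to one in the other barcode, at cost
    max(|b1-b2|, |d1-d2|), or deleted to the diagonal, at cost
    (death - birth) / 2.  Cost is factorial in barcode size, so this
    is for tiny inputs only.
    """
    # Pad both barcodes with diagonal "slots" (None) so every interval
    # can either be matched across or sent to the diagonal.
    n = len(bars1) + len(bars2)
    a = list(bars1) + [None] * len(bars2)
    b = list(bars2) + [None] * len(bars1)

    def cost(x, y):
        if x is None and y is None:
            return 0.0
        if x is None:                       # y deleted to the diagonal
            return (y[1] - y[0]) / 2
        if y is None:                       # x deleted to the diagonal
            return (x[1] - x[0]) / 2
        return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

    # Best matching = the permutation minimizing the largest edge cost.
    return min(
        max(cost(a[i], b[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )
```

For instance, matching the bar (0, 2) to (0, 2.5) costs 0.5, which beats deleting both to the diagonal (cost 1.25), so the bottleneck distance between those two one-bar barcodes is 0.5.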


December 3, 10:30.

Speaker: Fang-Pang Lin (National Center for High Performance Computing, NARL).

Title : The Path from Sensing to Understanding – Development of Cyberinfrastructure in NCHC.

In the early 2000s, the rise of optical fiber networks introduced wide geographical connectivity not only between computer nodes but also between scientific instruments. The development of this kind of distributed, orchestrated infrastructure is often referred to as cyberinfrastructure (CI). NCHC adopted a similar concept and applied it to various sensor networks, including distributed systems for fast medical response to epidemic outbreaks, island-wide river monitoring for flood alerts, and long-term comparative ecological research. This development has led to cross-discipline collaboration not only at the national scale but also at the global scale. After more than a decade of development, we found that our focus on large-scale sensor networks and their backend computer systems was not sufficient to answer the real needs of the applications, which require more data and models to explain and to predict, and hence to understand better. These experiences have led to our current CI for big data and a shift toward more algorithmic work in data science. In this talk, I will give an introduction to this development and to the current CI applications we have built. I will also discuss possible collaborations in TDA with our applications.


November 29, at 13:00.

Speaker: Jisu Kim (Inria, Saclay).

Title : Statistical inference for geometric data.

Geometric structures can aid statistics in several ways. In high-dimensional statistics, geometric structures can be used to reduce dimensionality: high-dimensional data suffers from the curse of dimensionality, which can be avoided if low-dimensional geometric structures are present. Geometric structures also provide useful information in their own right: they may carry scientific meaning about the data and can be used as features to enhance supervised or unsupervised learning.

In this talk, I will explore how statistical inference can be performed on geometric structures. First, I will present minimax rates for dimension and reach estimators. Second, I will investigate inference on cluster trees and on the persistent homology of the density filtration on the Rips complex. Third, I will present the R package TDA for topological data analysis.

November 21, at 13:00.

Speaker: Gilles Blanchard (Universität Potsdam).

Title : Construction of tight wavelet-like frames on graphs for denoising.

We construct a frame (redundant dictionary) for the space of real-valued functions defined on a neighborhood graph constructed from data points. This frame is adapted to the underlying geometrical structure (e.g., the points belong to an unknown low-dimensional manifold), has finitely many elements, and these elements are localized in frequency as well as in space. The construction follows the ideas of Hammond et al. (2011), with the key point that we construct a tight (or Parseval) frame. This means we have a very simple, explicit reconstruction formula for every function defined on the graph from the coefficients given by its scalar products with the frame elements. We use this representation in the setting of denoising, where we are given noisy observations of a function defined on the graph. By applying a thresholding method to the coefficients in the reconstruction formula, we define an estimate of the underlying signal whose risk satisfies a tight oracle inequality.
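The Parseval reconstruction formula mentioned in the abstract — f = Σᵢ ⟨f, φᵢ⟩ φᵢ, with no dual frame needed — can be illustrated with the classical "Mercedes-Benz" tight frame in R². This is a toy example, unrelated to the graph construction of the talk:

```python
import numpy as np

# Three equiangular unit vectors in R^2 (the "Mercedes-Benz" frame),
# rescaled by sqrt(2/3) so that the frame becomes Parseval (tight
# with frame constant 1): frame.T @ frame == identity.
angles = np.pi / 2 + np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
frame = np.sqrt(2 / 3) * np.stack([np.cos(angles), np.sin(angles)], axis=1)

f = np.array([1.3, -0.7])      # an arbitrary signal in R^2
coeffs = frame @ f             # analysis: scalar products with frame elements
f_rec = frame.T @ coeffs       # synthesis: the explicit reconstruction formula

assert np.allclose(f, f_rec)   # Parseval: reconstruction is exact
```

In the denoising setting one would soft-threshold the coefficients before synthesis, e.g. `np.sign(coeffs) * np.maximum(np.abs(coeffs) - tau, 0)` for some threshold `tau`, exactly because the reconstruction formula makes the estimate explicit.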


November 22, at 13:00.

Speaker: Frédéric Magniette (Ecole Polytechnique).

Title : Statistical Algorithms for Particle Trajectography.

The various algorithms used to extrapolate particle trajectories from measurements are often very time-consuming, with computational complexities that are typically quadratic. We propose a new algorithm, called GEM, with linear complexity and reasonable performance on linear tracks. It is an extension of the EM algorithm used to fit Gaussian mixtures, and it works in arbitrary dimension and with an arbitrary number of simultaneous particles. In a second part, we extend it to circular tracks (for charged particles) and even to a mix of linear and circular tracks. The algorithm is implemented in an open-source library called "libgem", and some applications are presented, based on data sets from different kinds of particle trackers.
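For illustration only, here is a generic EM sketch for a mixture of two linear tracks in the plane — a plain mixture-of-regressions EM showing the E-step/M-step structure that GEM extends, not the actual GEM algorithm or the libgem API:

```python
import numpy as np

def em_two_lines(x, y, iters=50, sigma=0.5, rng=None):
    """EM sketch for a mixture of two linear tracks y = a*x + b.

    Generic mixture-of-regressions EM (hypothetical illustration, not
    the GEM algorithm of the talk): alternate soft assignment of hits
    to tracks (E-step) and weighted least-squares refits (M-step).
    """
    rng = np.random.default_rng(rng)
    params = rng.normal(size=(2, 2))            # rows: (slope, intercept)
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix
    for _ in range(iters):
        # E-step: responsibilities from Gaussian residual likelihoods.
        pred = params[:, :1] * x[None, :] + params[:, 1:]
        logp = -0.5 * ((y[None, :] - pred) / sigma) ** 2
        resp = np.exp(logp - logp.max(axis=0))
        resp /= resp.sum(axis=0)
        # M-step: weighted least squares for each track
        # (rows scaled by sqrt of the responsibilities).
        for k in range(2):
            sw = np.sqrt(resp[k] + 1e-12)
            params[k] = np.linalg.lstsq(A * sw[:, None], y * sw,
                                        rcond=None)[0]
    return params
```

On hits that all lie exactly on one line, both components collapse onto that line after a single M-step, which is the degenerate sanity check for the update equations.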



November 15, at 13:00.

Speaker: Raphael Tinarrage (Inria, Saclay).

Title : DTM-filtrations.

Despite strong stability properties, the persistent homology of the filtrations classically used in Topological Data Analysis, such as the Čech or Vietoris–Rips filtrations, is very sensitive to the presence of outliers in the data from which it is computed. In this talk, we will introduce a new family of filtrations, the DTM-filtrations, built on top of point clouds in Euclidean space, which are more robust to noise and outliers. The approach adopted in this work relies on the notion of distance-to-measure functions.
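For reference, the standard distance-to-measure (DTM) function of an empirical measure, in its k-nearest-neighbor form with mass parameter m, can be computed as below. This is a minimal sketch of the DTM itself, not of the DTM-filtration construction:

```python
import numpy as np

def dtm(points, queries, m=0.1):
    """Distance-to-measure of the empirical measure on `points`.

    DTM(x) = sqrt( (1/k) * sum of squared distances from x to its
    k nearest sample points ), with k = ceil(m * n).  Averaging over
    k neighbors instead of taking the single nearest one is what
    makes the function robust to outliers.
    """
    n = len(points)
    k = max(1, int(np.ceil(m * n)))
    # Pairwise squared distances, shape (len(queries), n).
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    d2.sort(axis=1)                     # in-place sort along each row
    return np.sqrt(d2[:, :k].mean(axis=1))
```

With m small the DTM reduces to the ordinary nearest-neighbor distance; increasing m trades spatial resolution for robustness, which is the knob the DTM-filtrations exploit.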



October 18

Speaker: Theo Lacombe (Inria, Saclay).

Title : Optimal Partial Transport and its application to persistence measures and diagrams.

Optimal transport theory provides tools to compare probability measures. Modern approximation techniques, such as entropic smoothing, make it usable at large scale, and it can be adapted in various ways to deal with non-negative measures of arbitrary mass. We will present an approach introduced by Figalli and Gigli (2010) that appears especially well suited to persistence diagrams. We will show how this optimal-transport viewpoint helps formulate many problems by considering persistence diagrams as measures, and how it can be combined with entropic smoothing to scale numerically in applications.
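The entropic smoothing mentioned in the abstract can be sketched with standard balanced Sinkhorn iterations; the partial-transport setting of Figalli and Gigli, where mass may enter or leave through the diagonal, needs an extra boundary "sink" that is omitted here:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Entropic OT between histograms a and b with cost matrix C.

    Standard balanced Sinkhorn iterations on the Gibbs kernel
    K = exp(-C/eps); returns the transport plan.  Requires a and b
    to have equal total mass (not the Figalli-Gigli partial setting).
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)      # match the column marginals
        u = a / (K @ v)        # match the row marginals
    return u[:, None] * K * v[None, :]
```

Each iteration is just two matrix-vector products, which is what makes the entropic approach scale to large diagrams-as-measures.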



October 11

Speaker: W. Polonik (University of California, Davis).

Title : Extracting multiscale information from high-dimensional and non-Euclidean data.

We discuss a novel idea for investigating geometric aspects of high-dimensional Euclidean data and, more generally, Hilbert space-valued data. As with the kernel trick, the method consists in constructing feature functions, which are then used in further analysis. In contrast to the kernel trick, however, the construction of our feature functions is based on geometric considerations, which enhances interpretability. Moreover, our feature functions are real-valued functions defined on the interval [0,1], so they can be plotted, leading to various graphical tools. Theoretical investigations reveal that the corresponding estimation methods combat the curse of dimensionality, with some adaptation to sparsity. Connections to other statistical methods, such as local depth, random set theory and nonlinear multidimensional scaling, will also be indicated.


October 4

Speaker: Hariprasad Kannan (Inria, Saclay).

Title : Unsupervised learning, a perspective based on optimal transport and sparse regularization.

Unsupervised learning is an important topic in machine learning. We will discuss approaches to unsupervised learning based on optimal transport and sparse regularization. Optimal transport presents a challenge from an optimization point of view, with its simplex constraints on the rows and columns of the transport plan. We show one way to formulate efficient optimization problems inspired by optimal transport: imposing only one set of the simplex constraints and imposing structure on the transport plan through sparse regularization. We show how unsupervised learning algorithms such as exemplar clustering, center-based clustering and kernel PCA fit into this framework through different forms of regularization. In particular, we demonstrate a promising approach to the pre-image problem in kernel PCA. Several methods have been proposed over the years, which generally assume certain types of kernels, have too many hyper-parameters, or make restrictive approximations of the underlying geometry. We present a more general method, with only one hyper-parameter to tune and some interesting geometric properties. From an optimization point of view, we show how to compute the gradient of a smooth version of the Schatten p-norm and how it can be used within a majorization-minimization scheme.
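The gradient computation mentioned at the end can be sketched for one common smoothing, f(X) = Σᵢ (σᵢ² + ε)^(p/2), a differentiable surrogate for the Schatten p-norm raised to the p-th power. This is a hypothetical choice of smoothing for illustration, not necessarily the talk's exact formulation. For such a spectral function the gradient is obtained by applying the scalar derivative to the singular values:

```python
import numpy as np

def smooth_schatten(X, p=1.0, eps=1e-6):
    """Smoothed Schatten p-norm surrogate: sum_i (sigma_i^2 + eps)^(p/2)."""
    s = np.linalg.svd(X, compute_uv=False)
    return ((s ** 2 + eps) ** (p / 2)).sum()

def smooth_schatten_grad(X, p=1.0, eps=1e-6):
    """Gradient of the smoothed Schatten p-norm above.

    For a spectral function f(X) = sum_i h(sigma_i) the gradient is
    U diag(h'(sigma)) V^T, where X = U diag(sigma) V^T is the SVD.
    Here h(s) = (s^2 + eps)^(p/2), so h'(s) = p*s*(s^2 + eps)^(p/2 - 1).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    g = p * s * (s ** 2 + eps) ** (p / 2 - 1)
    return (U * g[None, :]) @ Vt
```

A finite-difference check in a random direction confirms the formula, and within a majorization-minimization scheme this gradient is what the surrogate's linearization at the current iterate uses.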