To download this notebook or its pdf version:
http://geometrica.saclay.inria.fr/team/Fred.Chazal/MVA2018.html
Documentation for the latest version of Gudhi:
Download the data at the following address: http://geometrica.saclay.inria.fr/team/Fred.Chazal/slides/data_acc.dat, save it as a file named data_acc.dat, and load it using the pickle module:
import numpy as np
import pickle as pickle
import gudhi as gd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import manifold
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.cluster import KMeans
f = open("data_acc.dat","rb")
data = pickle.load(f,encoding="latin1")
f.close()
data_A = data[0]
data_B = data[1]
data_C = data[2]
label = data[3]
%matplotlib inline
The walk of 3 persons A, B and C has been recorded using the accelerometer sensor of a smartphone in their pocket, giving rise to 3 multivariate time series in $\mathbb{R}^3$: each time series represents the 3 coordinates of the acceleration of the corresponding person in a coordinate system attached to the sensor (take care that, as the smartphone was carried in a possibly different position for each person, these time series cannot be compared coordinates by coordinates). Using a sliding window, each series has been split in a list of 100 time series made of 200 consecutive points, that are now stored in data_A, data_B and data_C.
data_A_sample = data_A[0]
plt.gca(projection='3d')
plt.plot(data_A_sample [:,0],data_A_sample [:,1],data_A_sample [:,2])
# B is the pairwise distance matrix between 0 or 1-dim dgms
#label_color contains the colors corresponding to the class of each dgm
mds = manifold.MDS(n_components=3, max_iter=3000, eps=1e-9, dissimilarity="precomputed", n_jobs=1)
pos1 = mds.fit(B1).embedding_
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(pos1[:,0], pos1[:, 1], pos1[:,2], marker = 'o', color=label_color)
def sliding_window_data(x,edim,delay=1):
"""time delay embedding of a d-dim times series into R^{d*edim}
the time series is assumed to be periodic
parameters:
+ x: a list of d lists of same length L or a dxL numpy array
+ edim: the number of points taken to build the embedding in R^{d*edim}
+ delay: embeeding given by (x[i],x[i+delay],...,x[i + (edim-1)*delay])
Default value for delay is 1
"""
ts = np.asarray(x)
if len(np.shape(ts)) == 1:
ts = np.reshape(ts,(1,ts.shape[0]))
ts_d = ts.shape[0]
ts_length = ts.shape[1]
#output = zeros((edim*ts_d,nb_pt))
output = ts
for i in range(edim-1):
output = np.concatenate((output,np.roll(ts,-(i+1)*delay,axis=1)),axis=0)
return output
Landscape construction is currently only available in the C++ version of Gudhi. Here is a simple python implementation you can use for this class.
def landscapes_approx(diag,p_dim,x_min,x_max,nb_nodes,nb_ld):
"""Compute a dicretization of the first nb_ld landscape of a
p_dim-dimensional persistence diagram on a regular grid on the
interval [x_min,x_max]. The output is a nb_ld x nb_nodes numpy
array
+ diag: a persistence diagram (in the Gudhi format)
+ p_dim: the dimension in homology to consider
"""
landscape = np.zeros((nb_ld,nb_nodes))
diag_dim = []
for pair in diag: #get persistence points for homology in dimension dim
if (pair[0] == p_dim):
diag_dim.append(pair[1])
step = (x_max - x_min) / (nb_nodes - 1)
#Warning: naive and not the most efficient way to proceed!!!!!
for i in range(nb_nodes):
x = x_min + i * step
t = x / np.sqrt(2)
event_list = []
for pair in diag_dim:
b = pair[0]
d = pair[1]
if b <= t <= d:
if t >= (d+b)/2:
event_list.append((d-t)*np.sqrt(2))
else:
event_list.append((t-b)*np.sqrt(2))
event_list.sort(reverse=True)
event_list = np.asarray(event_list)
for j in range(nb_ld):
if(j<len(event_list)):
landscape[j,i]=event_list[j]
return landscape
# Example of parameters, you don't have to use those
nb_ld = 5 # number of Landscapes
nb_nodes = 500
length_max = 1.0
The goal of this exercise is to implement the bootstrap algorithm below from [ F. Chazal, B.T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman. Stochastic Convergence of Persistence Landscapes and Silhouettes. in Journal of Computational Geometry, 6(2), 140-161, 2015] to compute confidence bands for landscapes. As an example compute confidence bands for the expected landscapes for each of the 3 classes in the accelerometer data set.
Input: landscapes $\lambda_1,\ldots,\lambda_n$; confidence level $1-\alpha$; number of bootstrap samples $B$
Output: confidence functions $\ell_n, u_n \colon \mathbb{R} \to \mathbb{R}$