Abstract of Matrix completion with prior information in reproducing kernel Hilbert spaces

Pedro Juan Giménez Febrer

In matrix completion, the objective is to recover an unknown matrix from a small subset of observed entries. Most successful recovery methods rest on the assumption that the unknown full matrix has low rank. When the matrix has low rank, each of its entries is a function of a small number of coefficients, which can be accurately estimated provided that enough observations are available. Hence, in low-rank matrix completion the estimate is given by the matrix of minimum rank that fits the observed entries.
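
As a worked example of this formulation (standard in the matrix completion literature; the notation below is mine, not quoted from the thesis), the estimate solves a rank minimization over the set of observed indices, usually relaxed to its convex nuclear-norm surrogate:

```latex
% Rank minimization over the observed index set \Omega
\min_{X} \ \operatorname{rank}(X)
\quad \text{subject to} \quad X_{ij} = M_{ij}, \ (i,j) \in \Omega.

% Rank minimization is NP-hard, so the nuclear norm (sum of singular
% values) serves as the usual convex surrogate:
\min_{X} \ \|X\|_{*}
\quad \text{subject to} \quad X_{ij} = M_{ij}, \ (i,j) \in \Omega.
```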

Besides low rankness, the unknown matrix might exhibit other structural properties that can be leveraged in the recovery process. In a smooth matrix, entries that are close in index distance can be expected to have similar values. Similarly, groups of rows or columns may be known to contain similarly valued entries according to certain relational structures. This relational information is conveyed through different means, such as covariance matrices or graphs, with the inconvenience that these cannot be derived from the data matrix itself since it is incomplete. Hence, any knowledge of how the matrix entries relate to one another must come from prior information.

This thesis deals with matrix completion with prior information and presents an outlook that generalizes to many situations. In the first part, the columns of the unknown matrix are cast as graph signals on a graph that is known beforehand. In this setting, the adjacency matrix of the graph is used to compute an initial point for a proximal gradient algorithm, reducing the number of iterations needed to converge to a solution. Then, under the assumption that the graph signals are smooth, the graph Laplacian is incorporated into the problem formulation in order to enforce smoothness on the solution. This results in an effective denoising of the observed matrix and a reduced error, which is shown through a theoretical analysis of the proximal gradient method coupled with Laplacian regularization, and through numerical tests.
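
As a minimal sketch of this type of proximal gradient iteration (the exact objective, parameter values, and function names below are my assumptions, not the thesis' algorithm; I pair a nuclear-norm term for low rank with a Laplacian trace term for smoothness):

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete_with_laplacian(M_obs, mask, L, lam=1e-2, gamma=1e-2, n_iter=200):
    """Minimize 0.5*||mask*(X - M_obs)||_F^2 + 0.5*gamma*tr(X' L X) + lam*||X||_*.

    mask is a 0/1 array marking observed entries; L is the graph Laplacian.
    The thesis warm-starts from a graph-derived initial point; zero is used
    here only to keep the sketch short.
    """
    X = np.zeros_like(M_obs)
    eta = 1.0 / (1.0 + gamma * np.linalg.norm(L, 2))  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = mask * (X - M_obs) + gamma * (L @ X)   # gradient of the smooth part
        X = svt(X - eta * grad, eta * lam)            # proximal (SVT) step
    return X
```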

The second part of the thesis introduces a framework to exploit prior information through reproducing kernel Hilbert spaces. Since a kernel measures similarity between two points in an input set, it enables the encoding of arbitrary prior information such as feature vectors, dictionaries or connectivity on a graph. By associating each column and row of the unknown matrix with an item in a set, and defining a pair of kernels that measure similarity between columns or between rows, the missing entries can be extrapolated by means of the kernel functions. A method based on kernel regression is presented, together with two additional variants aimed at reducing the computational cost, and an online implementation. These methods prove competitive with existing techniques, especially when the number of observations is very small.
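
As a minimal sketch of the extrapolation idea under a simplifying assumption (only a row kernel is used, whereas the thesis pairs a row kernel with a column kernel; the function name and ridge parameter are mine), one column can be completed by kernel ridge regression:

```python
import numpy as np

def kernel_complete_column(K, x, observed, lam=1e-3):
    """Fill the missing entries of one column via kernel ridge regression.

    K        : (n, n) row-kernel matrix encoding prior similarity between rows
    x        : (n,) column vector; entries at unobserved positions are ignored
    observed : (n,) boolean mask of observed entries
    """
    K_oo = K[np.ix_(observed, observed)]
    alpha = np.linalg.solve(K_oo + lam * np.eye(K_oo.shape[0]), x[observed])
    x_hat = x.copy()
    x_hat[~observed] = K[np.ix_(~observed, observed)] @ alpha  # extrapolate
    return x_hat
```

Repeating this over all columns, and symmetrically over rows with a column kernel, conveys the flavour of the two-kernel approach; the cost-reducing variants mentioned above would avoid solving a full system per column.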

Furthermore, mean-square error and generalization error analyses are carried out, shedding light on the factors that impact algorithm performance. For the generalization error analysis, the focus is on the transductive case, which measures the ability of an algorithm to transfer knowledge from a set of labelled inputs to an unlabelled set. Here, bounds are derived for the proposed and existing algorithms by means of the transductive Rademacher complexity, and numerical tests confirming the theoretical findings are presented.
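
For reference, a statement of the transductive Rademacher complexity (the standard definition of El-Yaniv and Pechyony; whether the thesis uses exactly this normalization is my assumption): for m labelled and u unlabelled points, with hypotheses viewed as vectors h in R^{m+u},

```latex
\mathfrak{R}_{m+u}(\mathcal{H})
  = \Big(\tfrac{1}{m} + \tfrac{1}{u}\Big)\,
    \mathbb{E}_{\boldsymbol{\sigma}}
    \Big[\sup_{h \in \mathcal{H}} \boldsymbol{\sigma}^{\top} h\Big],
\qquad
\sigma_i =
\begin{cases}
 +1 & \text{with probability } p,\\
 -1 & \text{with probability } p,\\
 \ 0 & \text{with probability } 1 - 2p,
\end{cases}
\quad p = \frac{mu}{(m+u)^2}.
```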

Finally, the thesis explores the question of how to choose the observed entries of a matrix so as to minimize the recovery error of the full matrix. A passive sampling approach is presented, meaning that no labelled inputs are needed to design the sampling distribution; only the input set and the kernel functions are required. The approach is based on building the best Nyström approximation to the kernel matrix by sampling its columns according to their leverage scores, a metric that arises naturally in the theoretical analysis when seeking an optimal sampling distribution.
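
As a minimal sketch of leverage-score column sampling for a Nyström approximation (the rank parameter, sampling without replacement, and function name are my choices; the thesis derives its distribution from the error analysis):

```python
import numpy as np

def nystrom_leverage(K, k, m, seed=0):
    """Nystrom approximation of a PSD kernel matrix K from m columns,
    sampled with probability proportional to their rank-k leverage scores."""
    rng = np.random.default_rng(seed)
    _, eigvecs = np.linalg.eigh(K)           # eigenvalues in ascending order
    U_k = eigvecs[:, -k:]                    # top-k eigenvectors
    scores = np.sum(U_k ** 2, axis=1)        # rank-k leverage scores
    idx = rng.choice(K.shape[0], size=m, replace=False, p=scores / scores.sum())
    C = K[:, idx]                            # sampled columns
    W = K[np.ix_(idx, idx)]                  # intersection block
    return C @ np.linalg.pinv(W) @ C.T       # K_hat = C W^+ C'
```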

