A study on structure recovery and the broadcasting problem

Vasiliki Velona

Ayuda

A study on structure recovery and the broadcasting problem

Autores: Vasiliki Velona
Directores de la Tesis: Gábor Lugosi (dir. tes.) , Juanjo Rué Perna (codir. tes.)
Lectura: En la Universitat Politècnica de Catalunya (UPC) ( España ) en 2021
Idioma: español
Texto completo no disponible (Saber más ...)
Resumen
- The thesis contains a study of two problems of combinatorial statistics. The first one is structure learning for partial correlation graphs and the second one is the broadcasting problem on certain families of random recursive trees. In a precise language, the structure recovery problem that we study is the following: given access to individual entries of a covariance matrix S, learn the support of the inverse of S (let us denote this matrix by K) using only a small fraction of S. We call this problem ‘structure recovery’ since the zero-entries of K define the adjacency matrix of a graph (called partial correlation graph). As an example of why this is an important graph to learn, consider that S corresponds to a Gaussian random vector (X1, ..., Xn). Then an entry sij is zero if and only if Xi and Xj are independent given the rest of the random variables. A series of algorithms is proposed to address the aforementioned question, assuming that our graphs satisfy certain sparsity conditions. The sparsity that is assumed is related to how much our graph resembles a tree; in particular we deal with trees, graphs of small 2 connected components, and graphs of small treewidth. The proposed algorithms can also be used to estimate K and not only learn the partial correlation graph. Moreover, they can be used to invert any symmetric positive definite matrix since the analysis can be detached from its statistical connection and impact.The motivation for the use of covariance entries is that S might be too large to even store, as it often happens in statistical settings. In fact, our goal is to learn the partial correlation graph using sub-quadratic number of queries, since quadratic time is needed just to write down and store the covariance matrix – this is the starting point for a big part of the literature. The desired complexity bounds are achieved through our analysis.
  
  Moving to the second problem under study, we consider a broadcasting process on a graph to be the propagation of a message (let us say a bit value in {0, 1}) from one node to all the rest, possibly corrupted. Our goal is to guess the initial message. We consider that our graph is a tree and is created dynamically in times 0, 1..., n, in a way that at time i the i-th vertex enters the system and attaches with an edge to an existing vertex j (we then write i ~ j). We are interested in the case where i attaches uniformly at random to an existing vertex (uniform attachment) or where i attaches to a vertex with probability proportional to the outdegree of an existing vertex, plus some parameter ß > 0. The broadcasting process we consider is one where vertex 0 (the root) has a bit value that is propagated correctly to its neighbours with probability 1 - q and incorrectly with probability q. The broadcasting problem under study can be formulated in this way: given access to a random tree produced by either uniform attachment or preferential attachment and the bit values of the vertices, but without observing the time labels of the vertices, recover the bit of vertex zero. In a more difficult variant, we answer the same question given only the bits of vertices with outdegree zero (the leaves). In both variants of the problem in both models, we characterize the values of q for which the optimal reconstruction method has a probability of error bounded away from 1/2. We also show that the probability of error is bounded by a constant times q. Two simple reconstruction rules are analyzed in detail. One of them is the simple majority vote, the other is the bit value of the centroid of the tree (or the closest leaf to the centroid). We also analyze a third reconstruction rule which is more complex but works for all q where reconstruction is theoretically possible.