Ir al contenido

Documat


Resumen de Algorithms and compressed data structures for information retrieval

Susana Ladra Árbol académico

  • In this thesis we address the problem of the efficiency in Information Retrieval by presenting new compressed data structures and algorithms that can be used in different application domains and achieve interesting space/time properties.

    We propose (i) a new variable-length encoding scheme for sequences of integers that enables fast direct access to the encoded sequence and outperforms other solutions used in practice, such as sampling methods that introduce an undesirable space and time penalty to the encoding; (ii) a new self-indexed representation of the compressed text obtained by any word-based, byte-oriented compression technique that allows for fast searches of words and phrases over the compressed text occupying the same space than the space achieved by the compressors of such type, and obtains better performance than classical inverted indexes when little space is used; and (iii) a new compact representation of Web graphs that supports efficient forward and reverse navigation over the graph using the smallest space reported in the literature, and in addition it also allows for extended functionality not usually considered in compressed graph representations.

    These data structures and algorithms can be used in several scenarios, and we experimentally show that they can successfully compete with other techniques commonly used in those domains.


Fundación Dialnet

Mi Documat