
Documat


Algorithms and compressed data structures for information retrieval

  • Author: Susana Ladra González
  • Thesis supervisors: Gonzalo Navarro Badino (supervisor), Nieves R. Brisaboa (supervisor)
  • Defended: at the Universidade da Coruña (Spain) in 2011
  • Language: English
  • Thesis examination committee: Isidro Ramos Salavert (chair), Ricardo Baeza Yates (secretary), Alejandro López Ortiz (member), Josep Díaz Cort (member), Paolo Ferragina (member)
  • Links
    • Open-access thesis at: RUC
  • Abstract
    • In this thesis we address the problem of efficiency in Information Retrieval by presenting new compressed data structures and algorithms that can be used in different application domains and that offer attractive space/time trade-offs.

      We propose (i) a new variable-length encoding scheme for sequences of integers that enables fast direct access to the encoded sequence and outperforms other solutions used in practice, such as sampling methods, which add an undesirable space and time penalty to the encoding; (ii) a new self-indexed representation of the compressed text obtained by any word-based, byte-oriented compression technique, which allows fast searches for words and phrases over the compressed text, occupies the same space as that achieved by compressors of this type, and performs better than classical inverted indexes when little space is used; and (iii) a new compact representation of Web graphs that supports efficient forward and reverse navigation over the graph using the smallest space reported in the literature, and that also provides extended functionality not usually considered in compressed graph representations.

      These data structures and algorithms can be used in several scenarios, and we experimentally show that they can successfully compete with other techniques commonly used in those domains.
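
As an illustration of item (i), the following is a minimal sketch of one way a variable-length code can give direct access to the i-th integer without storing sampled offsets. The abstract does not describe the scheme's internals, so everything here is an assumption made for illustration: the 7-bit chunk width, the level-by-level layout, the plain Python prefix-count list standing in for a succinct rank structure, and the names `encode` and `access`. It should not be read as the thesis's implementation.

```python
from itertools import accumulate

CHUNK_BITS = 7                      # assumed chunk width (hypothetical choice)
CHUNK_MASK = (1 << CHUNK_BITS) - 1


def encode(values):
    """Store the k-th chunk of every non-negative integer in level k.

    Each level holds (chunks, flags, prefix), where flags[i] == 1 means
    value i continues in the next level, and prefix[i] is the number of
    1-flags in flags[:i] (a naive stand-in for a rank structure).
    """
    levels = []
    current = list(values)
    while current:
        chunks = [v & CHUNK_MASK for v in current]
        rest = [v >> CHUNK_BITS for v in current]
        flags = [1 if r > 0 else 0 for r in rest]
        prefix = [0] + list(accumulate(flags))
        levels.append((chunks, flags, prefix))
        # Only values with remaining bits move on to the next level.
        current = [r for r in rest if r > 0]
    return levels


def access(levels, i):
    """Return the i-th original integer by following it down the levels."""
    value = 0
    shift = 0
    for chunks, flags, prefix in levels:
        value |= chunks[i] << shift
        if not flags[i]:
            break
        # Position in the next level = number of continuing values before i.
        i = prefix[i]
        shift += CHUNK_BITS
    return value


values = [5, 300, 70000, 0, 128]
levels = encode(values)
decoded = [access(levels, i) for i in range(len(values))]
```

In this sketch each lookup touches at most one chunk per level, so no prefix of the sequence has to be decoded to reach a given position; this is the property that sampling-based schemes obtain only by paying for extra stored offsets.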

