Abstract: Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be compressed more efficiently if the fact that the two texts are mutual translations is exploited. For example, a bitext can be seen as a sequence of biwords—pairs of parallel words with a high probability of co-occurrence—that can be used as an intermediate representation in the compression process.
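The biword idea can be sketched in a few lines: a word-aligned bitext is rewritten as a single stream of (source, target) symbols, which a general-purpose compressor can then exploit. The one-to-one alignment below is a toy assumption for illustration; real bitexts require a word aligner and must handle unaligned words.

```python
# Sketch of the biword intermediate representation (assumes a toy
# one-to-one word alignment; real alignments are more complex).

def to_biwords(source_words, target_words):
    """Pair parallel words into biword symbols."""
    return list(zip(source_words, target_words))

def biword_vocabulary(biwords):
    """Map each distinct biword to an integer ID, as a compressor might."""
    ids = {}
    for bw in biwords:
        ids.setdefault(bw, len(ids))
    return ids

src = "the house is red".split()
tgt = "la casa es roja".split()
biwords = to_biwords(src, tgt)        # [("the", "la"), ("house", "casa"), ...]
vocab = biword_vocabulary(biwords)    # biword -> ID
encoded = [vocab[bw] for bw in biwords]
```

The integer stream `encoded` is what a subsequent entropy coder would compress, in place of the two separate texts.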
2: Grupo Kybele, Departamento de Lenguajes y Sistemas Informáticos, ES Cc. Experimentales y Tecnología, Universidad Rey Juan Carlos, 28933 Móstoles, Madrid, Spain. Email: carlos.cuesta@urjc.es, web: http://www.kybele.es. Keywords: Adaptability, Component Architecture, Groupware, SOA, Pipegraph.
Abstract: The current Web of Data is producing increasingly large RDF datasets. Massive publication efforts driven by initiatives such as the Linked Open Data movement, together with the need to exchange large datasets, have revealed the drawbacks of traditional RDF representations, which were inspired and designed for a document-centric, human-readable Web. Among the main problems are high levels of verbosity/redundancy and weak machine-processability in the description of these datasets.
Abstract: The use of dictionaries is common practice among applications operating on huge RDF datasets. It allows the long terms occurring in RDF triples to be replaced by short IDs that reference them. This decision greatly compacts the dataset and thus mitigates its scalability issues. However, the dictionary size is not negligible, and the techniques used for its representation also suffer from scalability limitations.
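The term-to-ID replacement can be illustrated with a minimal sketch. The class name and the bidirectional catalog layout are illustrative choices, not the representation of any specific system; the point is only that each long term is stored once and every triple shrinks to three integers.

```python
# Minimal sketch of an RDF dictionary: long terms (URIs, literals) are
# mapped to integer IDs, and each triple is stored as a compact ID triple.

class RDFDictionary:
    def __init__(self):
        self._term_to_id = {}
        self._id_to_term = []

    def encode(self, term):
        """Return the ID for a term, assigning a fresh one if unseen."""
        if term not in self._term_to_id:
            self._term_to_id[term] = len(self._id_to_term)
            self._id_to_term.append(term)
        return self._term_to_id[term]

    def decode(self, term_id):
        """Recover the original term from its ID."""
        return self._id_to_term[term_id]

d = RDFDictionary()
triple = ("http://example.org/alice",
          "http://xmlns.com/foaf/0.1/knows",
          "http://example.org/bob")
id_triple = tuple(d.encode(t) for t in triple)   # three small integers
```

The scalability problem the abstract points to lives inside `_term_to_id` and `_id_to_term`: for huge datasets this string catalog itself must be compressed.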
Abstract: The use of dictionaries is common practice among applications operating on huge RDF datasets. It allows the long terms occurring in RDF triples to be replaced by short IDs that reference them. This decision greatly compacts the dataset and mitigates the scalability issues underlying its management. However, the dictionary size is not negligible, and the techniques used for its representation also suffer from scalability limitations.
Huge RDF datasets are currently exchanged in textual RDF formats, so consumers need to post-process them using RDF stores for local consumption tasks such as indexing and SPARQL querying. This is a painful task requiring great effort in terms of time and computational resources. A first approach to lightweight data exchange is HDT, a compact (binary) RDF serialization format.
The Web of Data is producing large RDF datasets from diverse fields. The increasing size of the data being published threatens to make these datasets hard to exchange, index and consume. This scalability problem greatly diminishes the potential of interconnected RDF graphs. The HDT format addresses these problems through a compact RDF representation that partitions the data into three efficiently represented components: Header (metadata), Dictionary (the strings occurring in the dataset), and Triples (the graph structure).
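The three-part layout can be sketched as follows. The dictionary separates strings from structure, so the Triples component is pure integers. Field names and the flat dictionary here are illustrative simplifications; the actual HDT binary format uses role-partitioned, compressed dictionaries and succinct triple encodings.

```python
# Sketch of an HDT-like split: Header (metadata), Dictionary
# (term -> ID), Triples (the graph as sorted ID triples).

def build_hdt_like(triples, metadata):
    terms = {}

    def tid(term):
        if term not in terms:
            terms[term] = len(terms) + 1  # IDs start at 1, HDT-style
        return terms[term]

    id_triples = [(tid(s), tid(p), tid(o)) for s, p, o in triples]
    return {
        "header": metadata,             # dataset metadata
        "dictionary": terms,            # string catalog
        "triples": sorted(id_triples),  # compact graph structure
    }

ds = build_hdt_like(
    [("ex:alice", "foaf:knows", "ex:bob"),
     ("ex:bob", "foaf:knows", "ex:alice")],
    {"publisher": "example"},
)
```

Because the Triples component is a sorted list of integer triples, it can be indexed and traversed without touching the string catalog, which is what makes the representation consumable directly after exchange.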
Abstract: The demand for information has multiplied in recent years, mainly thanks to the globalization of access to the WWW. This has led to a substantial increase in the size of the text collections available in electronic form, whose compression not only yields space savings but also improves the efficiency of input/output processes and network transmission.
Abstract: The word-codeword mapping technique allows words to be managed in PPM modelling when a natural-language text file is being compressed. The main idea behind managing words is to assign them codes in order to improve compression. Previous work focused on proposing and evaluating several adaptive mapping algorithms. In this paper, we propose a semi-static word-codeword mapping method that takes advantage of prior knowledge of some statistical properties of the vocabulary. We test ...
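The semi-static idea can be illustrated with a sketch: a first pass over the text gathers word frequencies, and smaller codewords are then assigned to more frequent words before the compression pass. This is an illustration of the general approach under that assumption, not the paper's exact scheme.

```python
# Sketch of a semi-static word-codeword mapping: frequencies are
# collected in a first pass, then ranks serve as codewords so that
# frequent words receive the smallest codes.

from collections import Counter

def build_mapping(words):
    """Map each word to its frequency rank (ties broken alphabetically)."""
    freq = Counter(words)
    ranked = sorted(freq, key=lambda w: (-freq[w], w))
    return {w: code for code, w in enumerate(ranked)}

text = "to be or not to be".split()
mapping = build_mapping(text)            # first (statistics) pass
encoded = [mapping[w] for w in text]     # second (encoding) pass
```

In contrast to the adaptive algorithms of the previous work, the mapping is fixed before encoding starts, so the decoder only needs the vocabulary table, not the adaptation history.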
Although e-book usage has a positive impact in educational environments, content representation is a complex issue given their audience. In this paper, we present a flexible and functional presentation that allows synchronized consultation of the literary editions integrated in an electronic work.
Starting from e-book features, this paper presents the concept of the electronic work as a medium for publishing classic literature in the different editions demanded by the Spanish educational system. The electronic work is an entity which, focused on its logical structure, provides a set of interaction services designed by means of Aqueducts, a processing model driven by XML data.