
Documat


Abstract of A structural and quantitative analysis of the web of linked data and its components to perform retrieval data

Alberto Nogales Moyano

  • Spanish (translated)

    This research consists of a quantitative and structural analysis of the Web of Linked Data, with the aim of improving data retrieval across different sources. Statistical techniques will be applied to obtain quantitative metrics of the Web of Linked Data. For the structural analysis, we will carry out a Social Network Analysis (SNA).

    To obtain a picture of the Web of Linked Data that can be analyzed, we rely on the Linking Open Data (LOD) cloud diagram. This is an online catalogue of datasets whose information has been published using Linked Data techniques. The datasets are published in a language called Resource Description Framework (RDF), which creates links between them so that the information can be reused.

    The aim of obtaining a quantitative and structural analysis of the Web of Linked Data is to improve data retrieval. For that purpose, we will take advantage of the Schema.org markup language and the Linked Open Vocabularies (LOV) project.

    Schema.org is a set of tags intended to let Webmasters mark up their own Web pages with microdata. Microdata helps search engines and other Web tools to better understand the information those pages contain. LOV is a catalogue that registers the vocabularies used by the datasets of the Web of Linked Data. Its aim is to provide easy access to those vocabularies.

    In this research, we develop a study on retrieving data from the Web of Linked Data using the sources mentioned above together with ontology matching techniques. In our case, we first map Schema.org to LOV, and then LOV to the Web of Linked Data. An SNA of LOV has also been carried out. The aim of that analysis is to obtain a quantitative and qualitative picture of LOV. With it, we can determine, for example, which vocabularies are the most used and whether they are specialized in a particular field. These findings can be used to filter datasets or to reuse information.

  • English

    This research consists of a quantitative and structural analysis of the Web of Linked Data to improve the prospects for data retrieval. The Web of Linked Data arose when companies and organizations started to publish data sources that could be openly accessed by Web users. These datasets had different access mechanisms and formats, so Tim Berners-Lee proposed four principles for publishing and interlinking structured data on the Web.

    To obtain quantitative metrics of the Web of Linked Data, statistical techniques are applied. For the structural analysis, Social Network Analysis (SNA) is used. SNA studies the relations within linked structures by applying graph and network theory. These structures are formed by nodes and edges: the nodes represent the actors and the edges represent the relations between them.
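
    As a minimal sketch of the kind of structural metrics SNA provides, the following Python code treats datasets as nodes and links as directed edges. The toy network and the function names are hypothetical and purely illustrative; they are not taken from the thesis.

```python
from collections import deque

# Hypothetical toy network: each key is a dataset (actor) and each value
# lists the datasets it links to (relations). Names are illustrative only.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}


def in_degrees(graph):
    """Count incoming edges per node (a simple popularity measure)."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def shortest_path_lengths(graph, source):
    """Breadth-first search: hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_distance(graph):
    """Mean shortest-path length over all ordered pairs that are connected."""
    total = pairs = 0
    for source in graph:
        for target, d in shortest_path_lengths(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    print(in_degrees(links))
    print(average_distance(links))
```

    A low average distance is one way to express that a network is compact, and a high in-degree marks a node that many others link to.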

    To obtain a snapshot of the Web of Linked Data on which to base the analysis, we started from the Linked Open Data (LOD) cloud diagram. This is an online catalogue of datasets whose information has been published using Linked Data techniques. These datasets have been created by companies, organizations and individuals of the Open Data Movement interested in opening up their own information so that regular users can work with it. The datasets are published in a language called Resource Description Framework (RDF), which creates links between them so that information can be reused.

    The aim of obtaining a quantitative and structural analysis of the Web of Linked Data is to improve data retrieval. With an in-depth view of the structure and characteristics of LOD, it is possible to enhance the use of its data. In future work, users' searches could be faster and more accurate. For that purpose, we take advantage of the Schema.org vocabulary and the LOV (Linked Open Vocabularies) project.

    Schema.org is a set of tags that lets Webmasters mark up their own Websites with microdata. Microdata helps search engines and other Web tools to better understand the information contained in the Websites. LOV is a catalogue that registers all the vocabularies used by the datasets of the Web of Linked Data. Its aim is to provide easy access to those vocabularies.

    In this research, we report a study on the mechanisms that may enhance data retrieval from the Web of Linked Data using the previous resources and ontology matching techniques. These techniques aim to map terms from two different sources and determine which of them are common to both. In our case, we first map Schema.org to LOV, and then LOV to the Web of Linked Data.
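
    The idea of finding terms common to two sources can be sketched with a very simple lexical matcher. This is only an illustration under stated assumptions: it compares normalized local names of term URIs, whereas real ontology matching techniques are usually richer, and the abstract does not say which ones the study uses. The function names and the sample term lists are hypothetical.

```python
import re


def normalize(term):
    """Take the local name of a term URI, lower-case it, keep only a-z and 0-9."""
    local = re.split(r"[/#:]", term)[-1]
    return re.sub(r"[^a-z0-9]", "", local.lower())


def match_terms(source_terms, target_terms):
    """Map each source term to the target terms sharing its normalized name."""
    index = {}
    for term in target_terms:
        key = normalize(term)
        if key:  # skip URIs with an empty local name
            index.setdefault(key, []).append(term)
    matches = {}
    for term in source_terms:
        key = normalize(term)
        if key in index:
            matches[term] = index[key]
    return matches


if __name__ == "__main__":
    # Illustrative inputs only; not the term lists used in the study.
    schema_terms = ["http://schema.org/Person", "http://schema.org/Recipe"]
    vocabulary_terms = [
        "http://xmlns.com/foaf/0.1/Person",
        "http://xmlns.com/foaf/0.1/Document",
    ]
    print(match_terms(schema_terms, vocabulary_terms))
```

    Applying the same function twice, first between Schema.org and a vocabulary catalogue and then between that catalogue and dataset terms, mirrors the two-step mapping described above.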

    A network analysis of LOV is also reported. The aim of this analysis is to obtain quantitative and structural insight into LOV. With it, we can determine which vocabularies are the most popular and whether they are specialized in a particular field. This can be used to filter datasets or to reuse information.

    The findings cover several points. Regarding the structure of the Web of Linked Data, we conclude that it is compact and that the distance between nodes is low. It has also been verified that it follows the bow-tie theory and that the most important datasets are WordNet 2.0 and DBpedia. From the analysis of LOV, the following conclusions are drawn. The vocabularies are not specialized in a particular field and there is no dominant scope. Also, the most popular vocabularies are Semantic Web standards or vocabularies used to model other vocabularies, such as RDF, OWL (Web Ontology Language) and SKOS (Simple Knowledge Organization System). Finally, with the mappings between Schema.org, LOV and the Web of Linked Data, we have developed two use cases in data retrieval. The first lets users enrich Websites with information obtained from the LOD datasets. The second consists of extending ontologies with new classes and properties from Schema.org.
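
    In the bow-tie model of a directed network, the largest strongly connected component forms a core, an IN part can reach the core, and an OUT part is reachable from it. The following sketch shows one way such a decomposition could be computed on a small graph. It is an assumption-laden illustration, not the procedure used in the thesis; the function names and the toy graph are hypothetical, and the quadratic search for the core is only suitable for small inputs.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from start, including start."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph):
    """Build the graph with every edge direction flipped."""
    rev = {node: [] for node in graph}
    for node, targets in graph.items():
        for target in targets:
            rev.setdefault(target, []).append(node)
    return rev


def bow_tie(graph):
    """Split nodes into core (largest SCC), in, out and other."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return {"core": set(), "in": set(), "out": set(), "other": set()}
    rev = reverse(graph)
    core = set()
    for node in nodes:
        # Nodes that node reaches and that reach node form its SCC.
        scc = reachable(graph, node) & reachable(rev, node)
        if len(scc) > len(core):
            core = scc
    seed = next(iter(core))
    out_part = reachable(graph, seed) - core
    in_part = reachable(rev, seed) - core
    other = nodes - core - in_part - out_part
    return {"core": core, "in": in_part, "out": out_part, "other": other}


if __name__ == "__main__":
    # Hypothetical toy graph, for illustration only.
    toy = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": [], "e": []}
    print(bow_tie(toy))
```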

    Another independent use case, presented in this research as an additional contribution, consists of retrieving information from Google Scholar and aggregating it into sources that store scientific knowledge, such as VIVO and CERIF.


Fundación Dialnet
