Knowledge-Graph-Based Semantic Labeling of Tabular Data

Alobaid, Ahmad ORCID: https://orcid.org/0000-0001-8637-6313 (2020). Knowledge-Graph-Based Semantic Labeling of Tabular Data. Doctoral thesis, E.T.S. de Ingenieros Informáticos (UPM). https://doi.org/10.20868/UPM.thesis.64068.

Description

Title: Knowledge-Graph-Based Semantic Labeling of Tabular Data
Author(s): Alobaid, Ahmad
Document type: Doctoral thesis
Defense date: February 2020
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Artificial Intelligence
Creative Commons license: Attribution - NonCommercial - NoDerivatives

Full text

AHMAD_ADEL_ALOBAID.pdf (PDF, 1MB)

Abstract

Large amounts of data are published on the Web in tabular formats (e.g., spreadsheets), especially in the open data portals of public and private institutions. However, one of the main obstacles to their effective (re)use is their general lack of semantics: column names are rarely standardized, and their meaning and content are not always clear. In parallel, some data providers have started to adopt knowledge graphs as a means of publishing large amounts of structured data. These graphs commonly use graph-based formats (e.g., RDF) and refer to lightweight ontologies. It is well understood that the reuse of such tabular data can be improved by annotating it with the classes and properties used by the data in knowledge graphs. Performing this semantic labeling raises several challenges, such as common or duplicated entity names, differences in measurement and rounding errors in numeric values, and noise in both the published tabular data and the knowledge graphs. In this work, we present a novel approach to automatically label columns in tabular data with the ontology classes and properties referenced by existing knowledge graphs. We evaluated the performance of our approach on entity columns and numeric columns separately. For the entity columns, we applied our approach to the annotated tables of the T2D gold standard. For the numeric columns, we manually annotated the numeric columns in the T2D gold standard and then applied our technique to this data. We report precision, recall, and F1 scores, the conventional metrics for semantic labeling in the literature. The experiments showed that our approach successfully labeled the majority of the entity and numeric columns in the dataset used.
In contrast with existing proposals in the state of the art, our approach does not require external linguistic resources, other sources of information, or a human in the loop.
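The abstract reports results as precision, recall, and F1. A minimal Python sketch of one common convention for scoring column-level semantic labels follows: precision counts correct labels over the columns the system actually labeled, recall counts them over all gold-standard columns, and F1 is their harmonic mean. The function name, the convention of marking an abstention as `None`, and the example column ids and DBpedia labels are illustrative assumptions, not taken from the thesis; its exact scoring (for instance, how partially correct or multiple labels are counted) may differ.

```python
from typing import Dict, Optional


def semantic_labeling_scores(
    gold: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Dict[str, float]:
    """Precision, recall and F1 for column-level semantic labels (hypothetical helper).

    gold maps a column id to its reference label; predicted maps a column id
    to the system's label, or to None when the system abstains.
    """
    # Only columns the system actually labeled count toward precision.
    labeled = {col: lab for col, lab in predicted.items() if lab is not None}
    # A label is correct only if it exactly matches the gold label for that column.
    correct = sum(1 for col, lab in labeled.items() if gold.get(col) == lab)

    precision = correct / len(labeled) if labeled else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative data only: two tables, two entity and two numeric columns.
    gold = {
        "t1:c0": "dbo:City",
        "t1:c2": "dbo:populationTotal",
        "t2:c0": "dbo:Country",
        "t2:c3": "dbo:areaTotal",
    }
    predicted = {
        "t1:c0": "dbo:City",
        "t1:c2": "dbo:populationTotal",
        "t2:c0": "dbo:Place",  # wrong (too general) class
        "t2:c3": None,         # system abstained
    }
    print(semantic_labeling_scores(gold, predicted))
```

With this convention, abstaining lowers recall but not precision, which is why semantic labeling results are usually reported with all three numbers rather than accuracy alone.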

More information

Record ID: 64068
DC identifier: https://oa.upm.es/64068/
OAI identifier: oai:oa.upm.es:64068
DOI: 10.20868/UPM.thesis.64068
Deposited by: Archivo Digital UPM 2
Deposited on: 28 Sep 2020 06:15
Last modified: 24 Mar 2021 23:30