
Documat


Summary of Semantics in Big Data Analytics

Antonio Benítez Hidalgo

  • In this thesis we address the challenges of managing and processing large-scale datasets in semantic environments and propose solutions to them.

    Through the development of the TITAN platform, we aim to provide a tool for managing the lifecycle of workflows that integrates semantics to make those workflows more intelligent and efficient. TITAN was built with a flexible architecture that allows new functionalities to be added.

    Building on this architecture, we developed NORA, a tool designed to reason over large ontologies. When NORA is used together with TITAN, efficient and scalable reasoning can be performed on semantically rich workflows, with NoSQL database technologies ensuring scalability and reliability. NORA uses Apache Spark as its computational engine to implement inference rules, which are evaluated iteratively until no new knowledge is inferred.
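    The iterative evaluation described above is a fixpoint computation: rules are re-applied to the growing set of facts until an iteration adds nothing new. The following is a minimal single-machine sketch of that idea, not NORA's implementation; it assumes plain Python sets in place of Spark and uses two standard RDFS rules (subclass transitivity and type propagation) as stand-ins for NORA's actual rule set. The class and instance names are hypothetical. In a Spark setting, each rule would typically become a join between collections of triples.

```python
"""Naive forward chaining to a fixpoint over RDF-style triples.

Hypothetical single-machine sketch; NORA itself runs its rules on Apache Spark.
"""
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def apply_rules(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS rules once and return every triple they produce."""
    sub = [(s, o) for (s, p, o) in triples if p == SUBCLASS]
    typ = [(s, o) for (s, p, o) in triples if p == TYPE]
    derived: Set[Triple] = set()
    # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    for a, b in sub:
        for b2, c in sub:
            if b == b2:
                derived.add((a, SUBCLASS, c))
    # rdfs9: (x type A), (A subClassOf B) => (x type B)
    for x, a in typ:
        for a2, b in sub:
            if a == a2:
                derived.add((x, TYPE, b))
    return derived


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Re-apply the rules until an iteration infers no new triple (fixpoint)."""
    closure = set(triples)
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure
        closure |= new


# Hypothetical toy knowledge base.
kb = {
    ("Protein", SUBCLASS, "Macromolecule"),
    ("Macromolecule", SUBCLASS, "Molecule"),
    ("ex:hemoglobin", TYPE, "Protein"),
}
closure = materialize(kb)
for triple in sorted(closure - kb):
    print(triple)
```

Here the loop needs two productive iterations: the first derives that the instance is a `Macromolecule` and that `Protein` is a subclass of `Molecule`; the second derives that the instance is a `Molecule`; the third finds nothing new and stops.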

    In the biological domain, we introduce SALON, an ontology that supports a consistent understanding and use of multiple sequence alignments. SALON facilitates the development of Linked Data repositories that offer uniform access to the diverse information bioinformatics researchers rely on. The ontology can also serve as a mediator schema for integrating data from various sources, and for validating sequence alignments through SWRL rules.

    Furthermore, we explore a methodology for injecting semantic knowledge, expressed through ontologies, into analysis algorithms by means of the META ontology. META allows algorithms to be enriched with domain-specific information so that they can make more informed and accurate decisions. Several use cases demonstrate its effectiveness in enhancing the analysis process, including mapping domain knowledge and constraints into machine learning models. In this way, algorithms can be guided by expert knowledge and domain-specific considerations.

    Lastly, we identify several promising directions for future work. These include enhancing the semantic capabilities of TITAN, extending NORA's functionalities, and developing intuitive interfaces for META to make semantics in Big Data more accessible and efficient for a broad range of users.

