Ir al contenido

Documat


Distributed computing solutions for high energy physics interactive data analysis

  • Autores: Vincenzo Eduardo Padulano
  • Directores de la Tesis: Enric Tejedor Saavedra (dir. tes.) Árbol académico, Pedro Alonso Jordá (dir. tes.) Árbol académico
  • Lectura: En la Universitat Politècnica de València ( España ) en 2023
  • Idioma: español
  • Tribunal Calificador de la Tesis: Ignacio Blanquer Espert (presid.) Árbol académico, Davide Valsecchi (secret.) Árbol académico, Roel Aaij (voc.) Árbol académico
  • Enlaces
    • Tesis en acceso abierto en: RiuNet
  • Resumen
    • The scientific research in High Energy Physics (HEP) is characterised by complex computational challenges, which over the decades had to be addressed by researching computing techniques in parallel to the advances in understanding physics. One of the main actors in the field, CERN, hosts both the Large Hadron Collider (LHC) and thousands of researchers yearly who are devoted to collecting and processing the huge amounts of data generated by the particle accelerator. This has historically provided a fertile ground for distributed computing techniques, which led to the creation of the Worldwide LHC Computing Grid (WLCG), a global network providing large computing power for all the experiments revolving around the LHC and the HEP field. Data generated by the LHC so far has already posed challenges for computing and storage. This is only going to increase with future hardware updates of the accelerator, which will bring a scenario that will require large amounts of coordinated resources to run the workflows of HEP analyses. The main strategy for such complex computations is, still to this day, submitting applications to batch queueing systems connected to the grid and wait for the final result to arrive. This has two great disadvantages from the user's perspective: no interactivity and unknown waiting times. In more recent years, other fields of research and industry have developed new techniques to address the task of analysing the ever increasing large amounts of human-generated data (a trend commonly mentioned as "Big Data"). Thus, new programming interfaces and models have arised that most often showcase interactivity as one key feature while also allowing the usage of large computational resources.

      In light of the scenario described above, this thesis aims at leveraging cutting-edge industry tools and architectures to speed up analysis workflows in High Energy Physics, while providing a programming interface that enables automatic parallelisation, both on a single machine and on a set of distributed resources. It focuses on modern programming models and on how to make best use of the available hardware resources while providing a seamless user experience. The thesis also proposes a modern distributed computing solution to the HEP data analysis, making use of the established software framework called ROOT and in particular of its data analysis layer implemented with the RDataFrame class. A few key research areas that revolved around this proposal are explored. From the user's point of view, this is detailed in the form of a new interface to data analysis that is able to run on a laptop or on thousands of computing nodes, with no change in the user application. This development opens the door to exploiting distributed resources via industry standard execution engines that can scale to multiple nodes on HPC or HTC clusters, or even on serverless offerings of commercial clouds. Since data analysis in this field is often I/O bound, a good comprehension of what are the possible caching mechanisms is needed. In this regard, a novel storage system based on object store technology was researched as a target for caching.

      In conclusion, the future of data analysis in High Energy Physics presents challenges from various perspectives, from the exploitation of distributed computing and storage resources to the design of ergonomic user interfaces. Software frameworks should aim at efficiency and ease of use, decoupling as much as possible the definition of the physics computations from the implementation details of their execution. This thesis is framed in the collective effort of the HEP community towards these goals, defining problems and possible solutions that can be adopted by future researchers.


Fundación Dialnet

Mi Documat

Opciones de tesis

Opciones de compartir

Opciones de entorno