Efficient Distance Join Query Processing in Distributed Spatial Data Management Systems

  • Francisco García-García [1] ; Antonio Corral [1] ; Luis Iribarne [1] ; Michael Vassilakopoulos ; Yannis Manolopoulos
    1. [1] Universidad de Almeria
  • Published in: Actas de las XXV Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2021): [Málaga, 22-24 September 2021] / coordinated by Rafael Capilla Sevilla, Maider Azanza Sese, Miguel Rodríguez Luaces, M. M. Roldán García, Dolores Burgueño Caballero, José Raúl Romero Salguero, José Antonio Parejo Maestre, José Francisco Chicano García, Marcela Genero, Óscar Díaz García, José González Enríquez, María Carmen Penades Gramage; Silvia Mara Abrahao Gonzales (col.), 2021
  • Language: English
  • Abstract
    • Due to the ubiquitous use of spatial data applications and the large amounts of such data these applications use, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Distance Join Queries (DJQs) are important and frequently used operations in numerous applications, including data mining, multimedia and spatial databases. DJQs (e.g., k Nearest Neighbor Join Query, k Closest Pair Query, ε Distance Join Query, etc.) are costly operations, since they involve both the join and distance-based search, and performing DJQs efficiently is a challenging task. Recent Big Data developments have motivated the emergence of novel technologies for distributed processing of large-scale spatial data in clusters of computers, leading to Distributed Spatial Data Management Systems (DSDMSs). Distributed cluster-based computing systems can be classified as Hadoop-based or Spark-based systems. Based on this classification, in this paper, we compare two of the most recent and leading DSDMSs, SpatialHadoop and LocationSpark, by evaluating the performance of several existing and newly proposed parallel and distributed DJQ algorithms under various settings with large spatial real-world datasets. A general conclusion arising from the execution of the distributed DJQ algorithms studied is that, while SpatialHadoop is a robust and efficient system when large spatial datasets are joined (since it is built on top of the mature Hadoop platform), LocationSpark is the clear winner in total execution time efficiency when medium spatial datasets are combined (due to in-memory processing provided by Spark). However, LocationSpark requires higher memory allocation when large spatial datasets are involved in DJQs (even more so when k and ε are large).
      Finally, this detailed performance study has demonstrated that the new distributed DJQ algorithms we have proposed are efficient, robust and scalable with respect to different parameters, such as dataset sizes, k, ε and number of computing nodes.
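
To make the three query types named in the abstract concrete, the following is a minimal single-machine sketch of their semantics in Python. It is a brute-force illustration only and is not the distributed algorithms evaluated in the paper; the function names (`epsilon_distance_join`, `k_closest_pairs`, `knn_join`) and the use of 2D Euclidean distance are assumptions made for this example.

```python
import heapq
import math
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # hypothetical 2D point representation


def dist(p: Point, q: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def epsilon_distance_join(P: List[Point], Q: List[Point], eps: float) -> List[Tuple[Point, Point]]:
    """All pairs (p, q) from P x Q whose distance is at most eps."""
    return [(p, q) for p, q in product(P, Q) if dist(p, q) <= eps]


def k_closest_pairs(P: List[Point], Q: List[Point], k: int) -> List[Tuple[Point, Point]]:
    """The k pairs (p, q) from P x Q with the smallest distances."""
    return heapq.nsmallest(k, product(P, Q), key=lambda pq: dist(pq[0], pq[1]))


def knn_join(P: List[Point], Q: List[Point], k: int) -> Dict[Point, List[Point]]:
    """For every p in P, its k nearest neighbors in Q."""
    return {p: heapq.nsmallest(k, Q, key=lambda q: dist(p, q)) for p in P}


if __name__ == "__main__":
    P = [(0.0, 0.0), (5.0, 5.0)]
    Q = [(1.0, 0.0), (0.0, 2.0), (5.0, 6.0)]
    print(epsilon_distance_join(P, Q, eps=1.0))
    print(k_closest_pairs(P, Q, k=2))
    print(knn_join(P, Q, k=1))
```

Each function scans the full Cartesian product P × Q, which is what makes these queries costly at scale: the result size of the ε join and the working set of the k-based joins grow with ε and k, in line with the abstract's remark about memory requirements for large k and ε.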

