Uso de GPUs en aplicaciones de tiempo real: Una revisión de técnicas para el análisis y optimización de parámetros temporales

Iosu Gomez; Unai Díaz de Cerio; Jorge Parra; Juan María Rivas Concepción; José Javier Gutiérrez García

Gomez, Iosu [2]; Díaz de Cerio, Unai [2]; Parra, Jorge [2]; Rivas, Juan M. [1]; Gutiérrez, J. Javier [1]
1. [1] Universidad de Cantabria, Santander, Spain
2. [2] Ikerlan
Published in: Revista Iberoamericana de Automática e Informática Industrial (RIAI), ISSN-e 1697-7920, Vol. 21, No. 1, 2024, pp. 1-16
Language: Spanish
DOI: 10.4995/riai.2023.20321
Parallel titles:
- Using GPUs in Real-Time Applications: A Review of Techniques for Analyzing and Optimizing the Timing Parameters
Abstract
  Autonomous driving is attracting increasing attention in industry, not only in the automotive sector but also in the road and rail transport of people and goods and in more controlled manufacturing environments. The cyber-physical systems proposed for these applications require high computing capacity (hardware architectures with several cores, GPUs, NPUs, etc.) to process and react to a large and complex set of sensors (cameras, radar, LiDAR, distance measurement, etc.). At the same time, such systems must meet both functional safety and real-time requirements. The latter poses challenges that are the subject of intensive work and for which many questions remain open. This work reviews the most recent literature on the use of heterogeneous architectures with GPUs in real-time applications. The reviewed works propose solutions for estimating bounds on execution times and response times, together with different optimization strategies, most notably the mitigation of memory interference.