Abstract of Functional testing techniques for new massive data processing paradigms

Jesús Morán Barbón

Big Data programs analyse information using new processing models that overcome the limitations of traditional technology with respect to the volume, velocity, and variety of the data. Among them, MapReduce stands out by allowing large datasets to be processed over a distributed infrastructure that can change at runtime owing to frequent infrastructure failures and optimizations. The developer only designs the program, while a distributed system manages the execution of its functionality, including the allocation of resources and the fault tolerance mechanism, among other concerns. As a consequence, a program can behave differently in each execution because it is automatically adapted to the resources available at each moment. This non-deterministic execution makes both software testing and debugging difficult, especially for MapReduce programs with a complex design. Although both performance and functionality are important, the majority of the research on the quality of MapReduce programs focuses on performance. In contrast, few studies address functionality, even though several MapReduce applications fail regularly because of functional faults. Testing and debugging these faults is important, especially when the MapReduce programs perform a critical task.
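
For readers unfamiliar with the programming model, the sketch below is the standard word-count example written against the Hadoop MapReduce API (the usual textbook program, not one taken from the thesis). Note the optional combiner: the infrastructure may run it zero or more times on partial data, which is one of the runtime decisions behind the non-determinism discussed above.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for each token of each input line.
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reducer: sums the counts received for each word.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) sum += val.get();
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The combiner is optional and may run any number of times:
        // whether it preserves the output is a design responsibility.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }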

This thesis aims to test and debug MapReduce programs with new approaches that detect and explain functional faults caused by a wrong design of the program. These design faults not only depend on the input data; they may be triggered in some executions and masked in others, because execution is non-deterministic over a distributed infrastructure with automatic optimizations. To detect these faults, the thesis proposes a testing technique that executes each test case under different configurations and then checks that all executions generate equivalent outputs. The technique generates the configurations with Random testing, and with Partition testing combined with Combinatorial testing, to simulate the non-deterministic executions that could happen in a production environment. The technique is also automated by a test execution engine that can detect these faults using only the test input data, without requiring the expected output.
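
The thesis automates this on real test infrastructure; the sketch below is only a minimal in-memory analogue of the differential oracle, under simplifying assumptions and with illustrative names. It treats the reduce function as pure code, simulates different infrastructure configurations by randomly re-grouping the intermediate values (as combiner executions would), and checks that every simulated configuration matches a baseline sequential execution. The example seeds a classic design fault, a mean function reused as its own combiner, which the oracle exposes with no expected output at all.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;

    public class DifferentialConfigTest {

      // Reduce function under test: the mean of the values of one key.
      // Reusing it as its own combiner is a classic MapReduce design fault:
      // the result then depends on how the infrastructure groups the values.
      static double reduce(List<Double> values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.size();
      }

      // Simulate one "configuration": the combiner runs once per group,
      // then the reducer aggregates the combined results.
      static double runConfiguration(List<List<Double>> groups) {
        List<Double> combined = new ArrayList<>();
        for (List<Double> group : groups) combined.add(reduce(group));
        return reduce(combined);
      }

      // Randomly split the values into contiguous non-empty groups,
      // standing in for the partitions the infrastructure might create.
      static List<List<Double>> randomGroups(List<Double> values, Random rnd) {
        List<List<Double>> groups = new ArrayList<>();
        List<Double> current = new ArrayList<>();
        for (double v : values) {
          current.add(v);
          if (rnd.nextBoolean()) { groups.add(current); current = new ArrayList<>(); }
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
      }

      public static void main(String[] args) {
        List<Double> input = Arrays.asList(1.0, 2.0, 3.0, 4.0, 10.0);
        double baseline = reduce(input);          // sequential execution, no combiner
        Random rnd = new Random(42);
        for (int i = 0; i < 100; i++) {           // Random testing over configurations
          double out = runConfiguration(randomGroups(input, rnd));
          if (Math.abs(out - baseline) > 1e-9) {  // differential oracle: outputs must match
            System.out.printf("Design fault exposed: got %.4f, baseline %.4f%n", out, baseline);
            return;
          }
        }
        System.out.println("No divergence in 100 simulated configurations");
      }
    }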

Once the design faults are detected, the thesis proposes an automatic debugging framework to locate the root cause of the fault and to isolate the data that trigger the failure. The root cause is located automatically through a spectrum-based technique that statistically contrasts the characteristics of the executions that trigger the fault against those of the executions that mask it. The test case data are then reduced, to make the fault easier to understand, through delta debugging and search-based techniques that iteratively shrink the data while the reduced data still trigger the failure. The debugging framework also allows the distributed execution to be inspected through common debugging utilities such as breakpoints and watchpoints.
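
The abstract does not name the suspiciousness metric or the exact reduction algorithm, so the following are assumptions for illustration. Spectrum-based techniques commonly rank characteristics with a score such as Ochiai, ef / sqrt((ef + nf) * (ef + ep)), where ef and nf count failing executions that do and do not exhibit a characteristic and ep counts passing executions that exhibit it. For the reduction step, the sketch below uses the classic ddmin delta debugging algorithm of Zeller and Hildebrandt as a stand-in: it repeatedly tries to drop chunks of the failing input while checking that the reduced input still triggers the failure. The failure predicate is deliberately trivial; in the proposed framework it would re-execute the MapReduce test and its differential oracle.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.function.Predicate;

    public class DeltaDebug {

      // Split a list into up to n roughly equal, non-empty chunks.
      static <T> List<List<T>> split(List<T> list, int n) {
        List<List<T>> chunks = new ArrayList<>();
        int size = list.size();
        for (int i = 0; i < n; i++) {
          int from = size * i / n, to = size * (i + 1) / n;
          if (from < to) chunks.add(new ArrayList<>(list.subList(from, to)));
        }
        return chunks;
      }

      // ddmin: shrink 'input' while 'failsTest' still holds on the result.
      static <T> List<T> ddmin(List<T> input, Predicate<List<T>> failsTest) {
        List<T> current = new ArrayList<>(input);
        int n = 2;
        while (current.size() >= 2) {
          List<List<T>> chunks = split(current, n);
          boolean reduced = false;
          for (List<T> chunk : chunks) {            // try each chunk alone
            if (failsTest.test(chunk)) { current = chunk; n = 2; reduced = true; break; }
          }
          if (!reduced) {
            for (List<T> chunk : chunks) {          // try each complement
              List<T> complement = new ArrayList<>();
              for (List<T> other : chunks) if (other != chunk) complement.addAll(other);
              if (failsTest.test(complement)) {
                current = complement; n = Math.max(n - 1, 2); reduced = true; break;
              }
            }
          }
          if (!reduced) {                           // refine granularity or stop
            if (n >= current.size()) break;
            n = Math.min(2 * n, current.size());
          }
        }
        return current;
      }

      public static void main(String[] args) {
        List<Integer> failingInput = Arrays.asList(3, 7, -1, 4, 9, 2, 8, 5);
        // Illustrative oracle: pretend the MapReduce test fails whenever
        // a negative record is present in the input.
        Predicate<List<Integer>> failsTest = in -> in.stream().anyMatch(v -> v < 0);
        System.out.println("Reduced failing input: " + ddmin(failingInput, failsTest));
      }
    }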

The previous testing and debugging techniques can also be used in operations. The thesis proposes an autonomous approach that detects design faults in MapReduce programs executed in a production environment, using the runtime data as test input data. The techniques are evaluated through controlled experiments with real-world MapReduce programs. The results show that the proposed techniques can test and debug MapReduce programs automatically in a few seconds. The testing technique detects the majority of the design faults of the programs. Once a fault is detected, the fault localization technique usually locates its root cause, and the reduction technique isolates the majority of the input data that trigger the failure, improving fault understanding. Within the reduction step, delta debugging reduces the data in a few seconds, whereas the search-based approach is more time-consuming but reduces more data.
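
The abstract does not say how the runtime data are captured, so the sketch below assumes one simple mechanism (my choice, not a claim about the thesis): keep a bounded uniform sample of the records observed in production with reservoir sampling and replay it as the test input of the configuration-differential test from the earlier sketch, whose oracle needs no expected output.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Random;

    public class RuntimeSampler {

      // Reservoir sampling: a uniform sample of k records from a stream
      // of unknown length, using O(k) memory.
      public static <T> List<T> reservoirSample(Iterator<T> stream, int k, Random rnd) {
        List<T> sample = new ArrayList<>(k);
        long seen = 0;
        while (stream.hasNext()) {
          T record = stream.next();
          seen++;
          if (sample.size() < k) sample.add(record);
          else {
            long j = (long) (rnd.nextDouble() * seen);  // position in [0, seen)
            if (j < k) sample.set((int) j, record);     // keep with probability k/seen
          }
        }
        return sample;
      }

      public static void main(String[] args) {
        // Stand-in for records flowing through the production job.
        Iterator<Double> productionStream = Arrays.asList(1.0, 2.0, 3.0, 4.0, 10.0).iterator();
        List<Double> testInput = reservoirSample(productionStream, 3, new Random(7));
        // The sample would then be fed to the differential test harness.
        System.out.println("Replaying as test input: " + testInput);
      }
    }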

In conclusion, traditional testing techniques are not able to detect these design faults, so MapReduce applications must be tested with new approaches such as those proposed in this thesis. Once a design fault is detected, debugging techniques help to understand it, but traditional debugging techniques largely focus on failures caused by the code rather than on those caused by a wrong design, as happens in MapReduce programs. From the point of view of functional design faults, MapReduce applications must therefore be both tested and debugged with new approaches such as those proposed in this thesis.

