Summary of Big data optimization: algorithmic framework for data analysis guided by semantics

Cristóbal Barba González

Over the past decade, the rapid growth of data generation in all domains of knowledge, such as traffic, medicine, social networks, and industry, has highlighted the need to enhance the process of analyzing large data volumes, in order to manage them more easily and, in addition, to discover new relationships hidden within them. Big Data is the approach taken when an extremely large volume of data is analyzed. One of the main reasons why Big Data has emerged is that classical algorithms are unable to manage this huge amount of data, since they were not designed for this purpose.

Optimization problems, which are commonly found in today's industry, are not unrelated to this trend; therefore, Multi-Objective Optimization Algorithms (MOAs) should bear this new scenario in mind. This means that MOAs have to deal with problems that have either several data sources (typically streaming) or a huge volume of data. These features, in particular, are found in Dynamic Multi-Objective Problems (DMOPs), which are related to Big Data optimization problems, mostly with regard to velocity and variability. When dealing with DMOPs, whenever there are changes in the environment that affect the solutions of the problem (i.e., the Pareto set, the Pareto front, or both), and therefore the fitness landscape, the optimization algorithm must react and adapt the search to the new features of the problem. This means that a dynamic multi-objective optimization metaheuristic ought to be able to detect when the problem changes and to apply a strategy to cope with those changes; in other words, it has to interact with its context.
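As an illustration of this detect-and-react behavior, the following Python sketch outlines the main loop of a generic dynamic multi-objective metaheuristic. It is not the framework proposed in the thesis: the `problem` interface (`sample`, `evaluate`, `mutate`, `select`) is hypothetical, and the reaction shown (replacing part of the population with fresh random solutions, often called "random immigrants") is just one strategy from the literature.

```python
import random

def dynamic_moea(problem, pop_size=100, n_sentinels=5, max_steps=1000):
    """Detect-and-react loop of a generic dynamic multi-objective
    metaheuristic. `problem` is a hypothetical object exposing
    sample(), evaluate(x), mutate(x) and select(candidates, k)."""
    population = [problem.sample() for _ in range(pop_size)]
    # Sentinel solutions kept aside: re-evaluating them against fresh
    # data reveals changes in the fitness landscape.
    sentinels = [problem.sample() for _ in range(n_sentinels)]
    known = [problem.evaluate(s) for s in sentinels]

    for _ in range(max_steps):
        # 1. Change detection: have the sentinels' objective values moved?
        fresh = [problem.evaluate(s) for s in sentinels]
        if fresh != known:
            # 2. Reaction ("random immigrants"): replace one fifth of the
            # population with new random solutions to recover diversity.
            n = pop_size // 5
            population[-n:] = [problem.sample() for _ in range(n)]
            known = fresh
        # 3. One regular iteration of the underlying MOEA: variation
        # followed by Pareto-based environmental selection.
        offspring = [problem.mutate(random.choice(population))
                     for _ in range(pop_size)]
        population = problem.select(population + offspring, pop_size)
    return population
```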

Big Data analytics are long and complex processes; therefore, with the aim of simplifying them, they are carried out as a series of steps. A typical analysis is composed of data collection, data manipulation, data analysis and, finally, result visualization.

In this sense, analytic workflows can be seen as a network of service operations connected by data links that describe how the outputs of some operations are fed into the inputs of others. Consequently, workflows are useful when a process is made up of a sequence of more or less complex tasks, as sketched below.
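A minimal Python sketch of this idea, instantiating the four steps from the previous paragraph. The `Workflow` class and the step functions are invented for illustration; real workflow engines offer the same model (named operations plus data links) with scheduling, distribution, and error handling on top.

```python
from collections import deque

class Workflow:
    """Minimal workflow model: operations are nodes, and data links
    state which operation's output feeds which operation's input.
    Assumes the links form a DAG (no cycles)."""
    def __init__(self):
        self.ops = {}      # name -> callable taking the upstream outputs
        self.links = {}    # name -> list of upstream operation names

    def add(self, name, func, inputs=()):
        self.ops[name] = func
        self.links[name] = list(inputs)

    def run(self):
        results, pending = {}, deque(self.ops)
        while pending:
            name = pending.popleft()
            if all(dep in results for dep in self.links[name]):
                args = [results[dep] for dep in self.links[name]]
                results[name] = self.ops[name](*args)
            else:
                pending.append(name)  # upstream outputs not ready yet
        return results

# The typical analysis from the text: collection -> manipulation ->
# analysis -> visualization (the step functions are mere placeholders).
wf = Workflow()
wf.add("collect", lambda: [3, 1, 2])
wf.add("clean", sorted, inputs=["collect"])
wf.add("analyze", lambda xs: sum(xs) / len(xs), inputs=["clean"])
wf.add("visualize", lambda m: print("mean =", m), inputs=["analyze"])
wf.run()
```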

In the process of creating a Big Data workflow, the analyst should bear in mind the semantics of the problem domain knowledge and of its data. What is more, the semantics of the algorithm used to solve the problem is a key issue when deciding which kind of algorithm is able to tackle the characteristics of the problem. To this end, ontologies are the standard way of describing knowledge about a domain.
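For instance, a tiny fragment of such a description could be built with the rdflib Python library, as sketched below. The vocabulary (the `ex:` namespace and its class and property names) is invented for illustration; a real ontology for this domain would define these terms properly.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Invented vocabulary for illustration; a real ontology describing
# optimization algorithms and problems would define these terms.
EX = Namespace("http://example.org/bigdata-opt#")

g = Graph()
g.bind("ex", EX)

# Describe the problem: a dynamic multi-objective problem fed by a stream.
g.add((EX.TrafficProblem, RDF.type, EX.DynamicMultiObjectiveProblem))
g.add((EX.TrafficProblem, EX.hasDataSource, EX.StreamingSource))

# Describe an algorithm and which problem features it can cope with;
# a workflow builder could query this graph to pick a suitable algorithm.
g.add((EX.MyMOA, RDF.type, EX.MultiObjectiveAlgorithm))
g.add((EX.MyMOA, EX.copesWith, EX.StreamingSource))
g.add((EX.MyMOA, RDFS.label, Literal("dynamic multi-objective metaheuristic")))

print(g.serialize(format="turtle"))
```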

As the overall goal of this PhD Thesis, we are interested in investigating the use of semantics in the process of Big Data analysis, focused not only on machine learning but also on optimization.

