Data science and big data processing in R: representations and software

  • Author: Lala Septem Riza
  • Thesis supervisors: Francisco Herrera Triguero, José Manuel Benítez Sánchez
  • Defended: at the Universidad de Granada (Spain) in 2015
  • Language: English
  • Thesis examination committee: Antonio González Muñoz (chair), Manuel Gómez Olmedo (secretary), Luciano Sánchez Ramos (member), Antonio Peregrín Rubio (member), Matías Gámez Martínez (member)
  • Links
    • Open-access thesis at: DIGIBUG
  • Abstract
    • The main objective of this thesis is the development of high-quality, easy-to-use software modules for representing, creating, and managing system models and data analysis. Since it has become a de facto standard, R is the platform of choice. The packages developed cover techniques based on fuzzy systems, rough sets, and fuzzy rough sets. In addition, a universal representation framework for fuzzy rule-based systems is introduced. Finally, the implementation of random forests and random ferns for tackling Big Data is discussed. In line with these objectives, the research produced the following results:

      1. The "frbs" package: It is an R package implementing the most relevant types of fuzzy rule-based systems along with a selection of machine-learning algorithms to build them. The package focuses on classification and regression tasks. It also includes a mechanism that allows human experts to construct a model. It is available on CRAN: http://cran.r-project.org/package=frbs and on the project website: http://sci2s.ugr.es/dicits/software/FRBS.

      2. The "RoughSets" package: It is an R package implementing algorithms based on rough set theory and fuzzy rough set theory for knowledge representation and data analysis. It includes tools for managing missing values, discretization, feature selection, and instance selection, for both classification and regression tasks. It is available on CRAN: http://cran.r-project.org/package=RoughSets and on the project website: http://sci2s.ugr.es/dicits/software/RoughSets.

      3. frbsPMML: It is a universal representation framework for fuzzy rule-based systems, built on the Predictive Model Markup Language. In addition, two software libraries for managing the representation are implemented: an extension of the "frbs" package and the Java package "frbsJpmml".

      4. The "SparkFernTreeR" package: It is an R package implementing random forests and random ferns for Big Data processing. The package is built on top of the Big Data frameworks Apache Hadoop and Apache Spark.

