Ir al contenido

Documat


Exact and heuristic methods for statistical tabular data protection

  • Autores: Daniel Baena Mirabete
  • Directores de la Tesis: Jordi Castro Pérez (dir. tes.) Árbol académico
  • Lectura: En la Universitat Politècnica de Catalunya (UPC) ( España ) en 2017
  • Idioma: español
  • Tribunal Calificador de la Tesis: Laureano Fernando Escudero Bueno (presid.) Árbol académico, Francisco Javier Heredia Cervera (secret.) Árbol académico, Vicenç Torra Reventós (voc.) Árbol académico
  • Enlaces
    • Tesis en acceso abierto en: TDX
  • Resumen
    • One of the main purposes of National Statistical Agencies (NSAs) is to provide citizens or researchers with a large amount of trustful and high quality statistical information. NSAs must guarantee that no confidential individual information can be obtained from the released statistical outputs. The discipline of Statistical disclosure control (SDC) aims to avoid that confidential information is derived from data released while, at the same time, maintaining as much as possible the data utility. NSAs work with two types of data: microdata and tabular data. Microdata files contain records of individuals or respondents (persons or enterprises) with attributes. For instance, a national census might collect attributes such as age, address, salary, etc. Tabular data contains aggregated information obtained by crossing one or more categorical variables from those microdata files. Several SDC methods are available to avoid that no confidential individual information can be obtained from the released microdata or tabular data. This thesis focus on tabular data protection, although the research carried out can be applied to other classes of problems. Controlled Tabular Adjustment(CTA) and Cell Suppression Problem (CSP) have concentrated most of the recent research in the tabular data protection field. Both methods formulate Mixed Integer Linear Programming problems (MILPs) which are challenging for tables of moderate size. Even finding a feasible initial solution may be a challenging task for large instances. Due to the fact that many end users give priority to fast executions and are thus satisfied, in practice, with suboptimal solutions, as a first result of this thesis we present an improvement of a known and successful heuristic for finding feasible solutions of MILPs, called feasibility pump. The new approach, based on the computation of analytic centers, is named the Analytic Center Feasbility Pump.The second contribution consists in the application of the fix-and-relax heuristic (FR) to the CTA method. FR (alone or in combination with other heuristics) is shown to be competitive compared to CPLEX branch-and-cut in terms of quickly finding either a feasible solution or a good upper bound. The last contribution of this thesis deals with general Benders decomposition, which is improved with the application of stabilization techniques. A stabilized Benders decomposition is presented,which focus on finding new solutions in the neighborhood of "good'' points. This approach is efficiently applied to the solution of realistic and real-world CSP instances, outperforming alternative approaches.The first two contributions are already published in indexed journals (Operations Research Letters and Computers and Operations Research). The third contribution is a working paper to be submitted soon.


Fundación Dialnet

Mi Documat

Opciones de tesis

Opciones de compartir

Opciones de entorno