
Documat


Abstract of Ontology based semantic anonymisation of microdata

S. Martínez Luis

  • The exploitation of microdata compiled by statistical agencies is of great interest to the data mining community. However, such data often include sensitive information that can be directly or indirectly related to individuals. Hence, an appropriate anonymisation process is needed to minimise the risk of disclosing identities and/or confidential data. In the past, many anonymisation methods have been developed to deal with numerical data, but approaches tackling the anonymisation of non-numerical values (e.g. categorical or textual data) are scarce and shallow. Since the utility of this kind of information is closely related to the preservation of its meaning, this work uses the notion of semantic similarity to enable a semantically coherent interpretation of the data. Ontologies are the basic pillar of the proposed semantic framework, which enables the management and transformation of categorical attributes by defining operators that take into account the underlying semantics of the data values. Applying these operators to the data anonymisation task yields three anonymisation methods specifically tailored to categorical attributes: Semantic Recoding, Semantic and Adaptive Microaggregation, and Semantic Resampling. In addition, a new Semantic Record Linkage method is proposed, which considers data semantics in order to evaluate more accurately the disclosure risk of anonymised non-numerical data. The proposed methods have been extensively evaluated on real datasets with encouraging results. Experimental results show that a semantic-based treatment of categorical attributes significantly improves the semantic interpretability and utility of the anonymised data.
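The abstract does not say which similarity measure or operators the thesis uses, so the following is only a minimal sketch of the general idea under stated assumptions: a hypothetical toy taxonomy (`TAXONOMY`), the classic Wu-Palmer measure swapped in as the ontology-based similarity, a "semantic centroid" in the spirit of microaggregation (the taxonomy concept that minimises the summed distance to a group of values), and a simple semantic record linkage score that re-links masked values to originals by semantic distance rather than exact equality. All function names are hypothetical, and the 1/k credit for ties is an assumption of this sketch, not necessarily the author's rule.

```python
import math

# Hypothetical toy taxonomy (child -> parent); the root's parent is None.
# A real system would rely on a large domain ontology instead.
TAXONOMY = {
    "disease": None,
    "infectious": "disease",
    "chronic": "disease",
    "flu": "infectious",
    "hepatitis": "infectious",
    "measles": "infectious",
    "diabetes": "chronic",
    "asthma": "chronic",
}


def ancestors(concept):
    """Path from a concept up to the root, the concept itself first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = TAXONOMY[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def least_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_b = set(ancestors(b))
    for c in ancestors(a):
        if c in ancestors_b:
            return c
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def similarity(a, b):
    """Wu-Palmer similarity: 2*depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(least_common_subsumer(a, b)) / (depth(a) + depth(b))


def distance(a, b):
    return 1.0 - similarity(a, b)


def semantic_centroid(values):
    """Taxonomy concept minimising the summed distance to `values`.

    Every concept is a candidate, not only the observed values, so the
    centroid may be a generalisation such as 'infectious'."""
    return min(sorted(TAXONOMY),
               key=lambda c: sum(distance(c, v) for v in values))


def semantic_record_linkage(original, masked):
    """Fraction of masked values re-linked to their own original record.

    Assumes masked[i] was produced from original[i]. Each masked value is
    linked to the semantically closest original value(s); when k records
    tie, a correct link counts as 1/k (an assumption of this sketch)."""
    score = 0.0
    for i, m in enumerate(masked):
        dists = [distance(m, o) for o in original]
        best = min(dists)
        ties = [j for j, d in enumerate(dists)
                if math.isclose(d, best, abs_tol=1e-12)]
        if i in ties:
            score += 1 / len(ties)
    return score / len(masked)


if __name__ == "__main__":
    print(similarity("flu", "hepatitis"))  # siblings: 2*2 / (3+3)
    print(semantic_centroid(["flu", "hepatitis", "measles"]))  # prints infectious
    print(semantic_record_linkage(["flu", "diabetes", "asthma"],
                                  ["infectious", "chronic", "chronic"]))
```

In this toy run, the centroid of three infectious diseases is their unobserved parent concept, which is the kind of meaning-preserving generalisation a semantic method aims for; and recoding `diabetes` and `asthma` to `chronic` lowers the re-identification score below 1, which is the quantity a record linkage attack estimates. A full microaggregation method would additionally partition records into groups of at least k before computing each group's centroid; that step is omitted here.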

