Automatic error localisation for categorical, continuous and integer data

  • Author: Ton de Waal
  • Published in: Sort: Statistics and Operations Research Transactions, ISSN 1696-2281, Vol. 29, No. 1, 2005, pp. 57-99
  • Language: English
  • Parallel titles:
    • Localización automática de errores para datos categóricos, continuos y enteros.
  • Abstract
    • Data collected by statistical offices generally contain errors, which have to be corrected before reliable data can be published. This correction process is referred to as statistical data editing. At statistical offices, certain rules, so-called edits, are often used during the editing process to determine whether a record is consistent or not. Inconsistent records are considered to contain errors, while consistent records are considered error-free. In this article we focus on automatic error localisation based on the Fellegi-Holt paradigm, which says that the data should be made to satisfy all edits by changing the fewest possible number of fields. Adoption of this paradigm leads to a mathematical optimisation problem. We propose an algorithm to solve this optimisation problem for a mix of categorical, continuous and integer-valued data. We also propose a heuristic procedure based on the exact algorithm. For five realistic data sets involving only integer-valued variables we evaluate the performance of this heuristic procedure.
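
The Fellegi-Holt paradigm described in the abstract can be illustrated with a small brute-force sketch. This is not the algorithm proposed in the article; it only shows the optimisation problem itself on finite domains, so continuous variables are out of scope here. The function name `localise_errors`, the example fields (`age`, `marital`, `income`), their domains and the two edits are all hypothetical, chosen purely for illustration.

```python
from itertools import combinations, product


def localise_errors(record, domains, edits):
    """Return every minimum-size set of fields that can be changed
    so that the record satisfies all edits (Fellegi-Holt paradigm).

    Brute force: try subsets of fields in order of increasing size and,
    for each subset, try every combination of replacement values.
    """
    fields = list(record)
    for size in range(len(fields) + 1):
        solutions = []
        for subset in combinations(fields, size):
            for values in product(*(domains[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if all(edit(candidate) for edit in edits):
                    solutions.append(set(subset))
                    break  # one consistent completion is enough
        if solutions:
            return solutions  # smallest size reached, stop searching
    return []  # the edits cannot be satisfied within the given domains


# Hypothetical record mixing categorical and integer-valued fields.
record = {"age": 8, "marital": "married", "income": 0}
domains = {
    "age": list(range(0, 121)),
    "marital": ["single", "married"],
    "income": [0, 1000, 20000],
}
# Hypothetical edits: each returns True when the record is consistent.
edits = [
    lambda r: r["marital"] != "married" or r["age"] >= 16,
    lambda r: r["age"] >= 16 or r["income"] == 0,
]

result = localise_errors(record, domains, edits)
print(result)
```

In this example the record fails the first edit, and changing either `age` or `marital` alone is enough to make it consistent, whereas changing `income` alone is not. The search cost grows exponentially with the number of fields, which is why exact algorithms and heuristics such as those discussed in the article are needed in practice.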

  • References
    • Barcaroli, G., C. Ceccarelli, O. Luzi, A. Manzari, E. Riccini and F. Silvestri (1995). The methodology of editing and imputation of qualitative...
    • Boskovitz, A., R. Goré and M. Hegland (2003). A logical formalisation of the Fellegi-Holt method of data cleaning. Report, Research School...
    • Bruni, R., A. Reale, and R. Torelli (2001). Optimization techniques for edit validation and data imputation. Proceedings of Statistics Canada...
    • Bruni, R. and A. Sassano (2001). Logic and optimization techniques for an error free data collecting. Report, University of Rome “La Sapienza”.
    • Chambers, R. (2004). Methods Investigated in the EUREDIT Project. In: Methods and Experimental Results from the EUREDIT Project, J.R.H. Charlton...
    • Central Statistical Office (2000). Editing and calibration in survey processing. Report SMD-37, Ireland.
    • Chvátal, V. (1983). Linear Programming. W.H. Freeman and Company: New York.
    • Dantzig, G.B. and B. Curtis Eaves (1973). Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory (A) 14, 288-297.
    • De Jong, A. (2002). Uni-Edit: standardized processing of structural business statistics in the Netherlands. UN/ECE Work Session on Statistical...
    • De Waal, T. (1996). CherryPi: a computer program for automatic edit and imputation. UN/ECE Work Session on Statistical Data Editing, Voorburg.
    • De Waal, T. (2001). SLICE: generalised software for statistical data editing. In Proceedings in Computational Statistics, J.G. Bethlehem and...
    • De Waal, T. (2003a). Processing of Erroneous and Unsafe Data. Ph.D. Thesis, Erasmus University, Rotterdam.
    • De Waal, T. (2003b). Solving the error localization problem by means of vertex generation. Survey Methodology, 29, 71-79.
    • De Waal, T. and W. Coutinho (2005). Automatic editing for business surveys: an assessment for selected algorithms. International Statistical...
    • De Waal, T. and R. Quere (2003). A fast and simple algorithm for automatic editing of mixed data. Journal of Official Statistics, 19, 383-402.
    • Duffin, R.J. (1974). On Fourier’s analysis of linear inequality systems. Mathematical Programming Studies, 1, 71-95.
    • Fellegi, I.P. and D. Holt (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association,...
    • Garfinkel, R.S., A.S. Kunnathur and G.E. Liepins (1986). Optimal imputation of erroneous data: categorical data, general edits. Operations...
    • Garfinkel, R.S., A.S. Kunnathur and G.E. Liepins (1988). Error localization for erroneous data: continuous data, linear constraints. SIAM...
    • Granquist, L. (1990). A review of some macro-editing methods for rationalizing the editing process. Proceedings of the Statistics Canada Symposium,...
    • Granquist, L. (1995). Improving the traditional editing process. In Business Survey Methods, Cox, Binder, Chinnappa, Christianson & Kott...
    • Granquist, L. (1997). The new view on editing. International Statistical Review, 65, 381-387.
    • Granquist, L. and J. Kovar (1997). Editing of survey data: how much is enough?. In Survey Measurement and Process Quality, Lyberg, Biemer,...
    • Hedlin, D. (2003). Score functions to reduce business survey editing at the U.K. office for national statistics. Journal of Official Statistics,...
    • Hoogland, J. (2002). Selective editing by means of plausibility indicators. UN/ECE Work Session on Statistical Data Editing, Helsinki.
    • Hoogland, J. and E. Van der Pijll (2003). Evaluation of automatic versus manual editing of production statistics 2000 trade & transport....
    • ILOG CPLEX 7.5 Reference Manual (2001). ILOG, France.
    • Kalton, G. and D. Kasprzyk (1986). The treatment of missing survey data. Survey Methodology, 12, 1-16.
    • Kovar, J. and P. Whitridge (1990). Generalized edit and imputation system: overview and applications. Revista Brasileira de Estadistica, 51,...
    • Kovar, J. and P. Whitridge (1995). Imputation of business survey data. In Business Survey Methods, Cox, Binder, Chinnappa, Christianson &...
    • Liepins, G.E., R.S. Garfinkel and A.S. Kunnathur (1982). Error localization for erroneous data: A survey. TIMS/Studies in the Management Sciences,...
    • McKeown, P.G. (1984). A mathematical programming approach to editing of continuous survey data. SIAM Journal on Scientific and Statistical...
    • Nemhauser, G.L. and L.A. Wolsey (1988). Integer and Combinatorial Optimisation. John Wiley & Sons, New York.
    • Pannekoek, J. and T. De Waal (2005). Automatic editing and imputation for business surveys: the dutch contribution to the EUREDIT project....
    • Pugh, W. (1992). The Omega test: a fast and practical integer programming algorithm for data dependence analysis. Communications of the ACM...
    • Pugh, W. and D. Wonnacott (1994). Experiences with constraint-based array dependence analysis. In Principles and Practice of Constraint Programming,...
    • Ragsdale, C.T. and P.G. McKeown (1996). On solving the continuous data editing problem. Computers & Operations Research, 23, 263-273.
    • Riera-Ledesma, J. and J.J. Salazar-González (2003). New algorithms for the editing and imputation problem. UN/ECE Work Session on Statistical...
    • Sande, G. (1978). An algorithm for the fields to impute problems of numerical and coded data. Technical report, Statistics Canada.
    • Schaffer, J. (1987). Procedure for solving the data-editing problem with both continuous and discrete data types. Naval Research Logistics,...
    • Schrijver, A. (1986). Theory of Linear and Integer Programming. New York: John Wiley & Sons.
    • Stoop, J.R. (2003). The best piece of CherryPie (in Dutch). Internal report (BPA number: 2098-03-TMO), Voorburg: Statistics Netherlands.
    • Todaro, T.A. (1999). Overview and evaluation of the AGGIES automated edit and imputation system. UN/ECE Work Session on Statistical Data Editing,...
    • Williams, H.P. (1976). Fourier-Motzkin elimination extension to integer programming. Journal of Combinatorial Theory (A), 21, 118-123.
    • Williams, H.P. (1986). Fourier’s method of linear programming and its dual. American Mathematical Monthly, 93, 681-695.
    • Winkler, W.E. (1996). State of statistical data editing and current research problems. UN/ECE Work Session on Statistical Data Editing, Rome.
    • Winkler, W.E. (1998). Set-covering and editing discrete data. Statistical Research Division Report 98/01, US Bureau of the Census, Washington,...
    • Winkler, W.E. and L.A. Draper (1997). The SPEER edit system. Statistical Data Editing (Volume 2); Methods and Techniques, United Nations,...
    • Winkler, W.E. and T.F. Petkunas (1997). The DISCRETE edit system. Statistical Data Editing (Volume 2); Methods and Techniques. United Nations,...
