Automatic regrouping of strata in the goodness-of-fit chi-square test

  • Authors: Vicente Alfredo Núñez Antón, Juan Manuel Pérez-Salamero González, Marta Regúlez Castillo, Manuel Ventura, Carlos Vidal Meliá
  • Published in: Sort: Statistics and Operations Research Transactions, ISSN 1696-2281, Vol. 43, No. 1, 2019, pp. 113-142
  • Language: English
  • DOI: 10.2436/20.8080.02.83
  • Abstract
    • Pearson’s chi-square test is widely employed in the social and health sciences to analyse categorical data and contingency tables. For the test to be valid, the sample size must be large enough to provide a minimum number of expected elements per category. This paper develops functions for regrouping strata automatically, thus enabling the goodness-of-fit test to be performed within an iterative procedure. The usefulness and performance of these functions are illustrated by means of a simulation study and applications to different datasets. Finally, the iterative use of the functions is applied to the Continuous Sample of Working Lives, a dataset that has been used in a considerable number of studies, especially in labour economics and on the Spanish public pension system.

  • References
    • Agresti, A. (2002). Categorical Data Analysis (2nd edition). Wiley, New York.
    • Bartholomew, D.J. and Tzamourani, P. (1999). The goodness-of-fit of latent trait models in attitude measurement. Sociological Methods and...
    • Bartholomew, D.J., Knott, M. and Moustaki, I. (2011). Latent Variable Models and Factor Analysis (3rd edition). Wiley, New York.
    • Bishop, Y.M.M., Fienberg, S.E. and Holland, P.W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge.
    • Bosgiraud, J. (2006). Sur le regroupement des classes dans le test du Khi-2. Revue Romaine de Mathématiques Pures et Appliquées, 51, 167–172.
    • Cai, L., Maydeu-Olivares, A., Coffman, D.L. and Thissen, D. (2006). Limited-information goodness-of-fit testing of item response theory models...
    • Campbell, I. (2007). Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations. Statistics in Medicine, 26,...
    • Cochran, W.G. (1952). The χ² test of goodness-of-fit. The Annals of Mathematical Statistics, 23, 315–345.
    • Collins, L.M., Fidler, P.L., Wugalter, S.E. and Long, J. (1993). Goodness-of-fit testing for latent class models. Multivariate Behavioral...
    • Delucchi, K.L. (1983). The use and misuse of chi-square: Lewis and Burke revisited. Psychological Bulletin, 94, 166–176.
    • DGOSS (2014). Muestra Continua de Vidas Laborales 2013. Secretaría de Estado de la Seguridad Social. Dirección General de Ordenación (DGOSS). Ministerio de Trabajo e Inmigración. Madrid, Spain.
    • Fienberg, S.E. (2006). Log-linear models in contingency tables. In Encyclopedia of Statistical Sciences. Wiley, New York.
    • Fisher, R.A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98, 39–54.
    • García Pérez, M.A. and Núñez-Antón, V. (2009). Accuracy of power-divergence statistics for testing independence and homogeneity...
    • Goodman, L.A. (1974). Exploratory latent structures analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
    • Grafström, A. and Schelin, L. (2014). How to select representative samples. Scandinavian Journal of Statistics, 41, 277–290.
    • Haviland, M.G. (1990). Yates's correction for continuity and the analysis of 2×2 contingency tables. Statistics in Medicine, 9, 363–367.
    • Hirji, K.F. (2006). Exact Analysis of Discrete Data. Chapman and Hall, Boca Raton.
    • Hosmer, D.W., Hosmer, T., Le Cessie, S. and Lemeshow, S. (1997). A comparison of goodness-of-fit tests for the logistic regression model....
    • Hosmer, D.W. and Lemeshow, S. (2000). Applied Logistic Regression. Wiley, New York.
    • INSS (2014). Informe Estadístico 2013. Secretaría de Estado de Seguridad Social. Ministerio de Empleo y Seguridad Social, MESS. Madrid,...
    • Keeling, K.B. and Pavur, R.J. (2011). Statistical accuracy of spreadsheet software. The American Statistician, 65, 265–273.
    • Khan, H.A. (2003). A visual basic software for computing Fisher's exact probability. Journal of Statistical Software, 8, 1–7.
    • Kroonenberg, P.M. and Verbeek, A. (2018). The tale of Cochran's rule: my contingency table has so many expected values smaller than 5, what...
    • Kruskal, W. and Mosteller, F. (1979a). Representative sampling, I. International Statistical Review, 47, 13–24.
    • Kruskal, W. and Mosteller, F. (1979b). Representative sampling, II: scientific literature, excluding statistics. International Statistical...
    • Kruskal, W. and Mosteller, F. (1979c). Representative sampling, III: the current statistical literature. International Statistical Review,...
    • Kruskal, W. and Mosteller, F. (1980). Representative sampling, IV: The History of the Concept in Statistics, 1895-1939. International Statistical...
    • Larose, D.T. and Larose, C.D. (2014). Discovering Knowledge in Data: An Introduction to Data Mining. Wiley, New York.
    • Lazarsfeld, P.F. and Henry, N.W. (1968). Latent Structure Analysis. Houghton Mifflin, Boston.
    • Lewis, D. and Burke, C.J. (1949). The use and misuse of chi-square. Psychological Bulletin, 46, 433–489.
    • Lin, J.J., Chang, C.H. and Pal, N. (2015). A revisit to contingency table and tests of Independence: bootstrap is preferred to chi-square...
    • Lydersen, S., Fagerland, M.W. and Laake, P. (2009). Tutorial in biostatistics. Recommended tests for association in 2x2 tables. Statistics...
    • Marsaglia, G. (2003). Random number generators. Journal of Modern Applied Statistical Methods, 2, 2–13.
    • McCullough, B.D. (2000). The accuracy of Mathematica 4 as a statistical package. Computational Statistics, 15, 279–299.
    • McCullough, B.D. (2008). Special section on Microsoft Excel 2007. Computational Statistics and Data Analysis, 52, 4568–4569.
    • Mehta, C.R. and Patel, N.R. (1983). A network algorithm for performing Fisher’s exact test in r×c contingency tables. Journal of the American...
    • MESS (2017). La Muestra Continua de Vidas Laborales. Guía del contenido. Estadísticas, Presupuestos y Estudios. Estadísticas. Secretaría...
    • Moore, D.S. (1986). Tests of chi-squared type. In Goodness-of-fit Techniques (R. D’Agostino and M. Stephens, eds.). Marcel Dekker, New York,...
    • Okeniyi, J.O. and Okeniyi, E.T. (2012). Implementation of Kolmogorov Smirnov p-value computation in Visual Basic: implication for Microsoft...
    • Omair, A. (2014). Sample size estimation and sampling techniques for selecting a representative sample. Journal of Health Specialties, 2,...
    • Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is...
    • Pérez-Salamero González, J.M. (2015). La Muestra Continua de Vidas Laborales (MCVL) como fuente generadora de datos para el estudio del...
    • Pérez-Salamero González, J.M., Regúlez-Castillo, M. and Vidal-Meliá, C. (2016). Análisis de la representatividad de la MCVL: el...
    • Pérez-Salamero González, J.M., Regúlez-Castillo, M. and Vidal-Meliá, C. (2017). The continuous sample of working lives: improving...
    • Quintela-del-Río, A. and Francisco-Fernández, M. (2017). Excel templates: a helpful tool for teaching statistics. The American Statistician,...
    • Ramsey, C.A. and Hewitt, A.D. (2005). A methodology for assessing sample representativeness. Environmental Forensics, 6, 71–75.
    • Ripley, B.D. (2002). Statistical methods need software: a view of statistical computing. Opening lecture Royal Statistical Society, Plymouth.
    • Ross, A. (2015). Probability or statistics-performing a chi-square goodness-of-fit test. Mathematical Stack Exchange.
    • Tollenaar, N. and Mooijaart, A. (2003). Type I errors and power of the parametric bootstrap goodness-of-fit test: Full and limited information....
    • Tsang, W.W. and Cheng, K.H. (2006). The chi-square test when the expected frequencies are less than 5. In COMPSTAT 2006 Proceedings in Computational...
    • Wickens, T.D. (1989). Multiway Contingency Tables Analysis for the Social Sciences. Hillsdale, NJ: Erlbaum.
    • Wilkinson, L. (1994). Practical guidelines for testing statistical software. In Computational Statistics: Papers Collected on the Occasion...
    • Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Supplement to the Journal of the Royal Statistical Society,...
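The abstract describes regrouping strata until every category reaches a minimum expected count, and then running Pearson's goodness-of-fit test on the regrouped table. The following is a minimal Python sketch of that idea. It assumes ordered strata (for example, age bands) that may only be merged with an adjacent neighbour, and it uses a threshold of 5 expected elements per category, the common rule of thumb. The helpers `regroup_adjacent` and `pearson_chi_square` are hypothetical names chosen for illustration. They are not the functions developed in the paper, whose merging criterion may differ, and the counts in the usage lines are toy numbers.

```python
def regroup_adjacent(observed, expected, labels=None, min_expected=5.0):
    """Greedily merge adjacent (ordered) strata until every expected count
    is at least `min_expected`, or only one stratum is left.

    Hypothetical helper for illustration; not the paper's own function.
    Returns (observed, expected, labels) for the regrouped strata.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    obs = [float(o) for o in observed]
    exp = [float(e) for e in expected]
    labs = [str(lab) for lab in (labels if labels is not None else range(len(obs)))]

    while len(exp) > 1 and min(exp) < min_expected:
        i = exp.index(min(exp))  # first stratum with the smallest expected count
        # At either end there is only one neighbour; otherwise merge into
        # the neighbour with the smaller expected count.
        if i == 0:
            j = 1
        elif i == len(exp) - 1:
            j = i - 1
        else:
            j = i - 1 if exp[i - 1] <= exp[i + 1] else i + 1
        lo, hi = min(i, j), max(i, j)
        # Replace the two adjacent strata with a single summed stratum.
        obs[lo:hi + 1] = [obs[lo] + obs[hi]]
        exp[lo:hi + 1] = [exp[lo] + exp[hi]]
        labs[lo:hi + 1] = [labs[lo] + "+" + labs[hi]]
    return obs, exp, labs


def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum over categories of (O - E)^2 / E (requires E > 0)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Usage with toy counts: seven ordered strata, the last two with expected counts below 5.
obs, exp, labs = regroup_adjacent(
    [30, 25, 20, 15, 6, 3, 1], [28, 27, 20, 14, 7, 3, 1], labels=range(7)
)
stat = pearson_chi_square(obs, exp)
df = len(obs) - 1  # no parameters are estimated from the data in this example
print(labs, round(stat, 4), df)  # ['0', '1', '2', '3', '4+5+6'] 0.4533 4
```

Two merges are needed here. Strata 5 and 6 are combined first, and then stratum 4 is added to that group. The degrees of freedom equal the number of regrouped categories minus one, minus any parameters estimated from the data. If SciPy is available, a p-value could then be obtained with `scipy.stats.chi2.sf(stat, df)`. The rule of merging the smallest stratum into its smaller neighbour is one simple choice that tends to keep many categories. Other rules are possible, such as always merging toward the tail of the distribution.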
