
Documat


Learning to build statistical indicators from open data sources

  • Pilar Rey-del-Castillo [1]
    [1] Institute of Fiscal Studies
  • Published in: BEIO, Boletín de Estadística e Investigación Operativa, ISSN 1889-3805, Vol. 39, No. 3, 2023
  • Language: English
  • Abstract
    • One of the biggest challenges facing official statistics today is how to use the massive amount of data generated on the web or by sensors and other electronic devices to produce statistical figures. This paper presents the construction of several statistical indicators from different Open Data sources. All the indicators have been built using a common methodological approach to estimate changes across time. The purpose of the paper is to show the problems that must be addressed when using these data sources and to learn about the different ways of coping with them. The first Open Data source is traffic sensor data, where information on the geographical location of the sensors makes it possible to compute traffic intensity indicators at a detailed geographical level. Apart from serving as proxies or leading indicators of economic activity, these figures can be used to measure the impact of different traffic arrangements in specific areas. For the next source, call records from a multichannel citizen attention service, the data were first analyzed with Natural Language Processing tools to identify several topic categories for the requests received, and the indicators were constructed afterwards. The remaining Open Data sources, Twitter messages and data scraped from a digital newspaper library website, are both studied using similar tools. Twitter messages provide a rough picture of the evolution of general sentiment in Spain; from the scraped data, the evolution of the average opinions and sentiments expressed in the country's newspapers is computed in the same way. It is generally accepted that the ideas expressed in newspapers help shape public opinion. However, an interesting result of our research is that individuals react more strongly and more quickly than newspapers to some social, political or economic events.

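The abstract does not spell out the "common methodological approach to estimate changes across time". As a hedged illustration only, the sketch below assumes one standard way of measuring change when the set of active traffic sensors varies from period to period: a chain-linked index in which each link compares total counts over the sensors observed in both consecutive periods (a matched sample), computed separately for each area. The function names, the sensor-to-area mapping and the toy counts are hypothetical; this is not presented as the author's method.

```python
from __future__ import annotations

import math
from collections import defaultdict


def link_relative(prev: dict[str, float], curr: dict[str, float]) -> float:
    """Change between two consecutive periods, using only sensors present in both.

    Returns NaN when there is no matched sensor or the matched base total is zero,
    i.e. when the chain cannot be linked.
    """
    common = prev.keys() & curr.keys()
    base = sum(prev[s] for s in common)
    if not common or base == 0:
        return math.nan
    return sum(curr[s] for s in common) / base


def chain_index(periods: list[dict[str, float]], base_value: float = 100.0) -> list[float]:
    """Chain-linked index: the first period equals base_value; each later value is
    the previous one times the matched-sample link relative. A NaN link
    propagates to every later period."""
    index = [base_value]
    for prev, curr in zip(periods, periods[1:]):
        index.append(index[-1] * link_relative(prev, curr))
    return index


def index_by_area(periods: list[dict[str, float]],
                  sensor_area: dict[str, str]) -> dict[str, list[float]]:
    """Split each period's counts by the (hypothetical) sensor -> area mapping
    and compute a separate chain index for each area."""
    by_area: dict[str, list[dict[str, float]]] = defaultdict(lambda: [{} for _ in periods])
    for t, counts in enumerate(periods):
        for sensor, value in counts.items():
            by_area[sensor_area[sensor]][t][sensor] = value
    return {area: chain_index(series) for area, series in by_area.items()}


if __name__ == "__main__":
    # Toy counts: S3 appears in the second period and S2 disappears in the third.
    periods = [
        {"S1": 100, "S2": 200},
        {"S1": 110, "S2": 190, "S3": 50},
        {"S1": 121, "S3": 55},
    ]
    print(chain_index(periods))
    print(index_by_area(periods, {"S1": "north", "S2": "south", "S3": "north"}))
```

Restricting each link to matched sensors keeps sensors that are switched on or off from showing up as spurious changes in traffic; when an area has no sensor in common between two periods, the sketch returns NaN rather than guessing a value.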

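In the same hedged spirit, a minimal sketch of a daily sentiment indicator from tweets, assuming a lexicon-based approach: each tweet is scored as the mean valence of the words it contains that appear in a valence lexicon, each day is scored as the mean over its tweets, and the daily values are expressed as an index relative to a base day. The lexicon `TOY_VALENCE` is a hypothetical handful of Spanish words with made-up scores on a 1 to 9 scale, not a published norm, and the paper's own NLP pipeline may differ.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon: made-up valence scores on a 1 (very negative)
# to 9 (very positive) scale. Not taken from any published norms.
TOY_VALENCE = {"feliz": 8.5, "bueno": 7.5, "triste": 2.0, "crisis": 2.5, "miedo": 2.0}

# In str patterns \w matches accented letters, so "día" stays one token.
TOKEN_RE = re.compile(r"\w+")


def tweet_valence(text, lexicon=TOY_VALENCE):
    """Mean valence of the lexicon words in a tweet, or None if it has none."""
    scores = [lexicon[tok] for tok in TOKEN_RE.findall(text.lower()) if tok in lexicon]
    return mean(scores) if scores else None


def daily_sentiment_index(tweets, lexicon=TOY_VALENCE, base_day=None):
    """Average tweet valence per day, expressed as an index (base day = 100).

    tweets is an iterable of (day, text) pairs; days are ISO date strings so that
    sorting them gives chronological order. Tweets with no lexicon word are skipped.
    """
    by_day = defaultdict(list)
    for day, text in tweets:
        score = tweet_valence(text, lexicon)
        if score is not None:
            by_day[day].append(score)
    if not by_day:
        return {}
    daily = {day: mean(scores) for day, scores in sorted(by_day.items())}
    base = daily[base_day if base_day is not None else next(iter(daily))]
    return {day: 100.0 * value / base for day, value in daily.items()}


if __name__ == "__main__":
    tweets = [
        ("2020-03-01", "Un día feliz y bueno"),
        ("2020-03-01", "Nada que decir"),
        ("2020-03-02", "Crisis y miedo"),
        ("2020-03-02", "Pero hoy estoy feliz"),
    ]
    print(daily_sentiment_index(tweets))
```

Dividing by the base-day value is only one simple choice; on a bounded 1 to 9 scale, reporting the difference from the base day would be an equally reasonable alternative. A real application would also need a full lexicon and more careful tokenization.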