Multivariate functional data modeling with time-varying clustering

Philip A. White [1]; Alan E. Gelfand [2]
[1] Brigham Young University, United States
[2] Duke University, Durham, United States
Source: Test: An Official Journal of the Spanish Society of Statistics and Operations Research, ISSN-e 1863-8260, ISSN 1133-0686, Vol. 30, No. 3, 2021, pp. 586-602
Language: English
DOI: 10.1007/s11749-020-00733-z
Abstract
- We consider the setting of multivariate functional data collected over time at each of a set of sites. Our objective is to implement model-based clustering of the functions across the sites where we allow such clustering to vary over time. Anticipating dependence between the functions within a site as well as across sites, we model the collection of functions using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a computationally manageable stochastic process specification. To jointly cluster the functions, we use the Dirichlet process which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise over continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ partitioning of the timescale to capture time-varying clustering. Our illustrative setting is bivariate, monitoring ozone and PM10 levels over time for one year at a set of monitoring sites. The data we work with come from 24 monitoring sites in Mexico City for 2017 which record hourly ozone and PM10 levels. Hence, we have 48 functions to work with across 8760 hours. We provide a Gaussian process model for each function using continuous-time meteorological variables as regressors along with adjustment for daily periodicity. We interpret the similarity of functions in terms of their shape, captured through site-specific coefficients, and use these coefficients to develop the clustering.
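The steps outlined in the abstract (site-specific regression coefficients with a daily-periodicity adjustment, a partition of the timescale, and Dirichlet process clustering with labels shared across all functions) can be illustrated with a minimal Python sketch. It is not the authors' implementation: ordinary least squares stands in for the multivariate Gaussian process and its dimension reduction, and clustering uses a standard collapsed (Chinese restaurant process) Gibbs sampler for a Dirichlet process mixture of isotropic Gaussians with known variance, run separately within each time block. The function names, the single exogenous covariate, the one sine/cosine pair with a 24-hour period, the number of blocks, the hyperparameters `alpha`, `sigma2` and `tau2`, and the simulated data are all hypothetical. In the application, `y` would have 24 × 2 = 48 rows (sites × pollutants) and 365 × 24 = 8760 columns (hours).

```python
import numpy as np

# Hypothetical sketch: OLS replaces the paper's multivariate Gaussian process model.


def harmonic_design(hours, covariate, period=24.0):
    """Design matrix with an intercept, one exogenous covariate, and one
    sine/cosine pair capturing daily periodicity (period in hours)."""
    w = 2.0 * np.pi * np.asarray(hours, dtype=float) / period
    return np.column_stack([np.ones_like(w), covariate, np.sin(w), np.cos(w)])


def block_coefficients(y, X, n_blocks):
    """Per-function least-squares coefficients within each time block.

    y: (n_functions, n_hours) responses; X: (n_hours, p) shared design.
    Returns an array of shape (n_blocks, n_functions, p).
    """
    out = []
    for idx in np.array_split(np.arange(y.shape[1]), n_blocks):
        beta, *_ = np.linalg.lstsq(X[idx], y[:, idx].T, rcond=None)
        out.append(beta.T)
    return np.stack(out)


def log_normal_iso(x, mean, var):
    """Log density of N(mean, var * I) evaluated at x."""
    d = x.size
    return -0.5 * (d * np.log(2.0 * np.pi * var) + np.sum((x - mean) ** 2) / var)


def dp_gibbs(B, alpha=1.0, sigma2=0.1, tau2=100.0, n_iter=50, rng=None):
    """Collapsed Gibbs sampler (Chinese restaurant process) for a Dirichlet
    process mixture of isotropic Gaussians with known variance sigma2 and
    base measure N(0, tau2 * I). Returns the labels after the final sweep."""
    rng = np.random.default_rng(rng)
    n, p = B.shape
    z = np.zeros(n, dtype=int)  # start with all functions in one cluster
    for _ in range(n_iter):
        for i in range(n):
            z[i] = -1  # take function i out of its current cluster
            labels = np.unique(z[z >= 0])
            logp = []
            for k in labels:
                members = B[z == k]
                nk = len(members)
                v = 1.0 / (1.0 / tau2 + nk / sigma2)  # posterior variance of cluster mean
                m = v * members.sum(axis=0) / sigma2  # posterior mean of cluster mean
                logp.append(np.log(nk) + log_normal_iso(B[i], m, v + sigma2))
            # Opening a new cluster has prior weight proportional to alpha.
            logp.append(np.log(alpha) + log_normal_iso(B[i], np.zeros(p), tau2 + sigma2))
            logp = np.array(logp)
            prob = np.exp(logp - logp.max())
            choice = rng.choice(len(prob), p=prob / prob.sum())
            z[i] = labels[choice] if choice < len(labels) else z.max() + 1
        z = np.unique(z, return_inverse=True)[1]  # relabel as 0..K-1
    return z


def time_varying_clusters(y, X, n_blocks, seed=0, **kw):
    """Cluster all functions jointly, separately within each time block."""
    rng = np.random.default_rng(seed)
    betas = block_coefficients(y, X, n_blocks)
    return np.array([dp_gibbs(b, rng=rng, **kw) for b in betas])


# Usage on simulated data: 6 functions, two 120-hour blocks.
rng = np.random.default_rng(1)
hours = np.arange(240)
temp = rng.normal(size=hours.size)  # hypothetical meteorological covariate
X = harmonic_design(hours, temp)
a = np.array([10.0, 2.0, 3.0, 0.0])
b = np.array([10.0, -2.0, 0.0, 3.0])
# The function in row 2 follows pattern a in block 0 and pattern b in block 1.
groups = [[a, a, a, b, b, b], [a, a, b, b, b, b]]
y = np.zeros((6, hours.size))
for blk, idx in enumerate(np.array_split(np.arange(hours.size), 2)):
    coefs = np.array(groups[blk])  # (6, 4) true coefficients in this block
    y[:, idx] = coefs @ X[idx].T + 0.1 * rng.normal(size=(6, idx.size))
labels = time_varying_clusters(y, X, n_blocks=2)
print(labels)
```

Because each block is clustered independently, label values are not comparable across blocks; the co-clustering of functions within a block (here, the function in row 2 moving from the first group to the second) is what carries the time-varying structure.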