Stable Subsampling of Big Data under Covariate Shift

Published: 2025-07-01 | Source: ShanghaiTech University | Editor: System Administrator

Data shift between training and test datasets, combined with model misspecification, can make regression predictions unstable across diverse datasets. In this talk, we present a novel subsampling algorithm for stable prediction that employs uniform design and confounder balancing methods. Theoretical analyses show that the uniform measure minimizes the maximum integrated mean squared error (MIMSE), and that the global stability loss assesses the independence among variables in each candidate MIMSE-optimal subsample. Numerical experiments on synthetic and real-world datasets demonstrate the superiority of the proposed method over baseline approaches under model misspecification and covariate shift.
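
The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the two ideas it names, not the speaker's method. It assumes (hypothetically) that a "uniform" subsample can be approximated by matching data points to a Halton low-discrepancy design on the min-max-scaled covariate cube, and that dependence among variables can be crudely scored by the sum of squared pairwise Pearson correlations. That score only detects linear dependence and stands in for the talk's global stability loss, whose actual definition is not given here. All function names and the toy data are illustrative assumptions.

```python
import math
import random

PRIMES = [2, 3, 5, 7, 11, 13]  # one base per covariate (supports up to 6 dimensions)

def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv

def halton(n, d):
    """First n points of a d-dimensional Halton sequence in [0, 1)^d."""
    return [[radical_inverse(i, PRIMES[j]) for j in range(d)]
            for i in range(1, n + 1)]

def uniform_subsample(X, n):
    """Return n distinct row indices of X whose min-max-scaled covariates
    lie closest to the Halton design points (greedy nearest neighbour)."""
    d = len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    scaled = [[(row[j] - lo[j]) / ((hi[j] - lo[j]) or 1.0) for j in range(d)]
              for row in X]
    chosen, used = [], set()
    for p in halton(n, d):
        best = min((i for i in range(len(X)) if i not in used),
                   key=lambda i: sum((scaled[i][j] - p[j]) ** 2 for j in range(d)))
        used.add(best)
        chosen.append(best)
    return chosen

def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def dependence_loss(rows):
    """Assumed proxy for a stability loss: sum of squared pairwise
    correlations between columns (0 means linearly uncorrelated)."""
    cols = list(zip(*rows))
    return sum(pearson(cols[j], cols[k]) ** 2
               for j in range(len(cols)) for k in range(j + 1, len(cols)))

# Toy data (an assumption for illustration): 90% of the points sit near the
# diagonal x2 = x1, 10% are spread uniformly over the unit square.
rng = random.Random(0)
X = []
for _ in range(2000):
    x1 = rng.random()
    if rng.random() < 0.9:
        x2 = min(1.0, max(0.0, x1 + rng.gauss(0.0, 0.02)))
    else:
        x2 = rng.random()
    X.append([x1, x2])

idx = uniform_subsample(X, 50)
loss_full = dependence_loss(X)
loss_sub = dependence_loss([X[i] for i in idx])
print(f"dependence loss, full data: {loss_full:.3f}")
print(f"dependence loss, subsample: {loss_sub:.3f}")
```

On this toy data the space-filling subsample draws mostly from the uniform background rather than from the dense diagonal band, so the covariates in the subsample are much less correlated than in the full data. A regression fitted on such a subsample would rely less on the spurious dependence between `x1` and `x2`, which is the intuition behind stable prediction under covariate shift.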