Time: 15:00-16:00, May 16, 2025
Venue: Room 106, School of Statistics and Data Science
Title: Best Subset Selection EM Algorithm: Statistical Analysis and Applications
Abstract: The mixed linear regression (MLR) model is widely used to model data from heterogeneous populations. When the data are ultra-high-dimensional, heterogeneity and high dimensionality together pose serious challenges for parameter estimation. Although several works have addressed these challenges, they have various limitations, such as the absence of a statistical analysis of the algorithm's iterate sequence or a lack of theoretical guarantees for variable selection consistency. In this work, we develop an $L_{2,0}$-constrained expectation-maximization (EM) algorithm and propose an efficient algorithm, based on best subset selection, for solving the $L_{2,0}$-penalized optimization problem. We also introduce an information criterion for selecting the sparsity level and establish its consistency. Theoretically, we establish a non-asymptotic error bound for the algorithm's iterate sequence and prove that the proposed procedure accurately recovers the important variables. Numerically, the theoretical findings are supported by extensive studies on both synthetic data and real data from the Cancer Cell Line Encyclopedia (CCLE).
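The abstract does not spell out the algorithm, so what follows is only a minimal Python/NumPy sketch of what an $L_{2,0}$-constrained EM iteration for a K-component mixed linear regression could look like. It assumes Gaussian noise with a variance shared across components and reads the $L_{2,0}$ constraint as row sparsity of the p x K coefficient matrix: at most s variables are active across all components. The M-step uses a simple heuristic (weighted ridge fits, keep the s rows with the largest norms, refit on that support) in place of the speaker's best-subset-selection solver. The sparsity level s is fixed rather than chosen by the information criterion mentioned above. The names `sparse_mlr_em`, `top_rows`, `l20_project`, `weighted_ridge`, the warm start `B_init`, and the simulated data are all hypothetical.

```python
import numpy as np


def top_rows(B, s):
    """Indices of the s rows of B with the largest Euclidean norms (sorted)."""
    return np.sort(np.argsort(np.linalg.norm(B, axis=1))[::-1][:s])


def l20_project(B, s):
    """Zero all but the s largest-norm rows of B (p x K).

    Row j holds variable j's coefficients in all K components, so the result
    satisfies ||B||_{2,0} <= s: at most s variables are active overall.
    """
    out = np.zeros_like(B)
    keep = top_rows(B, s)
    out[keep] = B[keep]
    return out


def weighted_ridge(X, y, w, lam):
    """Solve (X^T diag(w) X + lam * I) b = X^T diag(w) y."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw + lam * np.eye(X.shape[1]), Xw.T @ y)


def sparse_mlr_em(X, y, s, B_init, n_iter=50, lam=1e-6):
    """Hypothetical L_{2,0}-constrained EM for y_i = x_i^T beta_k + N(0, sigma^2),
    where component k is drawn with probability pi_k (noise variance shared)."""
    n, p = X.shape
    B = l20_project(np.asarray(B_init, dtype=float), s)
    K = B.shape[1]
    pi = np.full(K, 1.0 / K)
    sigma2 = float(np.var(y))
    for _ in range(n_iter):
        # E-step: responsibilities W[i, k] = P(observation i came from component k).
        resid = y[:, None] - X @ B                     # shape (n, K)
        logw = np.log(pi) - resid**2 / (2.0 * sigma2)
        logw -= logw.max(axis=1, keepdims=True)        # stabilise before exp
        W = np.exp(logw)
        W /= W.sum(axis=1, keepdims=True)
        # M-step (heuristic): unconstrained weighted fits pick the shared support S ...
        B_full = np.column_stack([weighted_ridge(X, y, W[:, k], lam) for k in range(K)])
        S = top_rows(B_full, s)
        # ... then every component is refit on S only, so ||B||_{2,0} <= s.
        B = np.zeros((p, K))
        for k in range(K):
            B[S, k] = weighted_ridge(X[:, S], y, W[:, k], lam)
        # Mixing proportions and shared noise variance.
        pi = W.mean(axis=0)
        resid = y[:, None] - X @ B
        sigma2 = max(float(np.sum(W * resid**2) / n), 1e-8)
    return B, pi, sigma2, W


# Simulated example: two components with opposite signs on variables 0, 1, 2.
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 3.0
B_true = np.column_stack([beta, -beta])
z = rng.integers(0, 2, size=n)                       # latent component labels
y = np.sum(X * B_true[:, z].T, axis=1) + 0.5 * rng.normal(size=n)
B_init = B_true + 0.5 * rng.normal(size=(p, 2))      # assumed warm start near the truth
B_hat, pi_hat, sigma2_hat, W_hat = sparse_mlr_em(X, y, s=3, B_init=B_init)
print("selected variables:", np.flatnonzero(np.linalg.norm(B_hat, axis=1)))
```

The sketch relies on a reasonably good starting value, as EM convergence analyses typically do; from a poor start it gives no guarantee of recovering the true support.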
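Speaker bio: Jing Zeng (曾靖) is a Special-Appointment Associate Professor in the School of Management at the University of Science and Technology of China (USTC). Zeng received a bachelor's degree from USTC in 2017 and a Ph.D. from Florida State University in 2022. Zeng's current research interests include dimension reduction, high-dimensional data analysis, tensor data analysis, robust statistics, mixture models, and transfer learning, with several papers published in journals such as the Journal of the American Statistical Association and Statistica Sinica. Zeng currently leads a Young Scientists Fund project of the National Natural Science Foundation of China.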