Deep neural networks have become the cornerstone of modern machine learning, yet their multi-layer structure, nonlinearities, and intricate optimization dynamics pose considerable theoretical challenges. In the first part of the talk, I will review recent advances in random matrix analysis that shed new light on these complex ML models. Starting from the foundational case of linear regression, I will demonstrate how the proposed analysis extends naturally to shallow nonlinear and, ultimately, deep nonlinear network models. I will also discuss practical implications of these theoretical insights, such as compressing neural network models or designing "equivalent" ones. This part is based on a recent review paper (https://arxiv.org/abs/2506.13139), joint work with Michael W. Mahoney (University of California, Berkeley).
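To give a concrete feel for the linear-to-nonlinear extension mentioned above, here is a minimal numerical sketch (illustrative only, not taken from the review paper): it compares the eigenvalue spectrum of a shallow nonlinear random-features Gram matrix with that of a Gaussian "linear equivalent" surrogate, the kind of object that the random matrix analysis characterizes. The dimensions, the tanh activation, and the moment coefficients mu1 and mu_star are assumptions made for illustration.

```python
# Minimal sketch: the Gram matrix of a shallow nonlinear random-features model
# has (approximately) the same spectrum as an explicit linear-plus-noise model.
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 2000, 1500, 1800           # samples, input dim, hidden features (all large, comparable)
X = rng.standard_normal((d, n))      # assumed i.i.d. Gaussian data
W = rng.standard_normal((N, d)) / np.sqrt(d)
sigma = np.tanh                      # odd activation, so E[sigma(xi)] = 0 for xi ~ N(0, 1)

# Gaussian moments of the activation (estimated by Monte Carlo)
xi = rng.standard_normal(10**6)
mu1 = np.mean(xi * sigma(xi))                        # "linear" coefficient
mu_star = np.sqrt(np.mean(sigma(xi)**2) - mu1**2)    # residual "noise" coefficient

# Nonlinear random-features Gram matrix ...
F = sigma(W @ X)
K_nonlin = F.T @ F / N

# ... and its Gaussian linear equivalent
Z = rng.standard_normal((N, n))
G = mu1 * (W @ X) + mu_star * Z
K_equiv = G.T @ G / N

# The two empirical spectra should be close when n, d, N are all large
eig_nonlin = np.linalg.eigvalsh(K_nonlin)
eig_equiv = np.linalg.eigvalsh(K_equiv)
print("nonlinear model spectrum percentiles:", np.percentile(eig_nonlin, [5, 50, 95]))
print("linear equivalent spectrum percentiles:", np.percentile(eig_equiv, [5, 50, 95]))
```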
Gaussian universality is a pervasive concept in statistics, information/data science, and machine learning (ML). It has been both empirically observed and mathematically proven that, in high-dimensional settings, many ML methods can only exploit the first- and second-order moments of the data distribution, behaving as if the data were Gaussian or Gaussian mixtures. In the second part of the talk, we will discuss examples and counterexamples of Gaussian universality in the classification of high-dimensional Gaussian mixture and linear factor mixture models, the latter potentially including non-Gaussian components. Using a flexible "leave-one-out" analysis, we derive precise expressions for the generalization performance of empirical risk minimization (ERM) classifiers on data drawn from these two models. We also specify the conditions, as a function of the model's nonlinearity, under which Gaussian universality holds or fails. This part is based on joint work with Xiaoyi Mai (IMT, France).
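As a purely illustrative check of the universality claim (my own sketch, not from the talk), the snippet below trains a square-loss ERM classifier on a two-class mixture whose "noise" is Rademacher rather than Gaussian, a simple instance of a linear factor model with identity loadings, and compares its test error to that obtained on the matched Gaussian mixture. The dimensions, class separation, and regularization level are assumptions chosen for illustration.

```python
# Minimal sketch of Gaussian universality: a square-loss ERM (ridge) classifier
# reaches essentially the same test error on a non-Gaussian (Rademacher) mixture
# as on the Gaussian mixture with the same means and covariances.
import numpy as np

rng = np.random.default_rng(1)
d, n, n_test = 800, 1600, 4000
mu = np.full(d, 1.5 / np.sqrt(d))            # class means +/- mu, separation of order 1

def sample(m, gaussian):
    y = rng.choice([-1.0, 1.0], size=m)
    if gaussian:
        z = rng.standard_normal((m, d))             # Gaussian noise
    else:
        z = rng.choice([-1.0, 1.0], size=(m, d))    # Rademacher factors: same first two moments
    return y[:, None] * mu + z, y

def ridge_classifier(X, y, lam=1e-1):
    # square-loss ERM: ridge regression on +/-1 labels
    return np.linalg.solve(X.T @ X / len(y) + lam * np.eye(d), X.T @ y / len(y))

def test_error(w, gaussian):
    Xt, yt = sample(n_test, gaussian)
    return np.mean(np.sign(Xt @ w) != yt)

for gaussian in (True, False):
    X, y = sample(n, gaussian)
    w = ridge_classifier(X, y)
    label = "Gaussian mixture" if gaussian else "Rademacher factor mixture"
    print(f"{label}: test error = {test_error(w, gaussian):.3f}")
```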
