Statistical learning theory has long served as a foundational framework for understanding machine learning and deep learning models, offering key insights into generalization and optimization. However, the pretraining–alignment paradigm of Large Language Models (LLMs) introduces new challenges. Specifically, (a) their error rates do not fall into conventional parametric or nonparametric regimes and instead depend on dataset size, and (b) the training and testing tasks can differ substantially, complicating generalization. In this talk, we propose new learning frameworks to address these challenges. Our analysis highlights three key insights: the necessity of data-dependent generalization analysis, the role of sparse sequential dependence in language learning, and the importance of autoregressive compositionality in enabling LLMs to generalize to unseen tasks.