Abstract: The training of many modern deep learning models, such as generative models, residual networks, and transformers, can be naturally formulated as an optimal control problem, in which the dynamics of learning and architecture design are governed by control objectives in high-dimensional spaces. In this setting, Hamilton–Jacobi (HJ) equations and Mean-Field Games (MFGs) provide a mathematical framework for analyzing and improving training dynamics and model architectures. We first show how fundamental classes of generative flows, including continuous-time normalizing flows and score-based diffusion models, emerge intrinsically from MFG formulations with varying particle dynamics, cost functionals, divergences, and probability metrics. The forward–backward PDE structure of MFGs yields analytical insight and guides the design of robust, data-efficient training algorithms. Moreover, proximal optimal transport divergences serve as natural regularizers within the MFG/optimal-control formulation, stabilizing the forward–backward dynamics and enabling faster, more robust learning. The regularity theory of HJ equations, combined with model-uncertainty quantification, provides provable performance and robustness guarantees for both generative models and complex neural architectures such as transformers. Our theoretical analysis is supported by extensive numerical experiments, including applications to likelihood-free inference, foundation models for PDEs, and high-dimensional control problems, together with comprehensive validation on widely used ML benchmarks.