Speaker
Description
Recent empirical and theoretical results suggest that deep networks possess an implicit low-rank bias: their weight matrices naturally evolve toward approximately low-rank structure, and structured pruning of the small singular values can often reduce model size with little or no loss in accuracy. While this phenomenon is well understood in simplified settings, a complete theory accounting for the effects of nonlinearities is still missing.
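As a rough illustration of the kind of pruning referred to above, the following minimal NumPy sketch truncates the small singular values of a weight matrix; the energy-based rank criterion and the example matrix are illustrative assumptions, not the speaker's method.

```python
import numpy as np

def truncate_small_singular_values(W, energy=0.99):
    """Return the best low-rank approximation of W that retains the given
    fraction of its squared singular-value 'energy', plus the chosen rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    rank = int(np.searchsorted(cumulative, energy)) + 1
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :], rank

# Hypothetical example: an approximately rank-16 matrix plus small noise.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 16)) @ rng.normal(size=(16, 256)) \
    + 0.01 * rng.normal(size=(256, 256))
W_low, r = truncate_small_singular_values(W)
print(r, np.linalg.norm(W - W_low) / np.linalg.norm(W))  # small relative error
```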
In this talk, we will present a framework that connects deep neural collapse to the emergence of low-rank structure in a broad class of nonlinear feedforward networks. For both nonlinear feedforward and residual architectures, we prove the global optimality of collapsed solutions and show that interpolating minima are connected to these global optima by effectively barrier-free paths, offering a possible explanation for the ubiquity of collapse in practice.
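For readers unfamiliar with neural collapse, the sketch below computes a standard NC1-style collapse metric (trace of within-class feature covariance relative to between-class covariance); the function name and the assumption of precomputed last-layer features are hypothetical conveniences, not part of the talk's framework.

```python
import numpy as np

def within_between_collapse_ratio(features, labels):
    """NC1-style metric: trace of the within-class covariance divided by the
    trace of the between-class covariance of last-layer features.
    Values near zero indicate features have collapsed onto their class means."""
    global_mean = features.mean(axis=0)
    sw, sb = 0.0, 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        mu_c = class_feats.mean(axis=0)
        sw += ((class_feats - mu_c) ** 2).sum()
        sb += len(class_feats) * ((mu_c - global_mean) ** 2).sum()
    return sw / sb
```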