DFG project "Multilevel Architectures and Algorithms in Deep Learning"
start of the project: 2021, end of the project: 2024
contract number: SCHI 1379/8-1
funding institution: DFG (Research Grants)
within the DFG Priority Research Programme 2298 "Theoretical Foundations of Deep Learning"
Prof. Dr. Anton Schiela
M.Sc. Frederik Köhne
External project members
Prof. Dr. Roland Herzog (Ruprecht-Karls-Universität Heidelberg, Interdisciplinary Center for Scientific Computing)
AIMS OF THE PROJECT
The design of deep neural networks (DNNs) and their training is a central issue in machine learning, and progress in these areas is one of the driving forces behind the success of these technologies. Nevertheless, tedious experimentation and human interaction are often still needed during the learning process to find an appropriate network structure and corresponding hyperparameters that yield the desired behavior of a DNN. The strategic goal of the proposed project is to provide algorithmic means to improve this situation. Our methodical approach relies on well-established mathematical techniques: identify fundamental algorithmic quantities and construct a posteriori estimates for them; identify and consistently exploit an appropriate topological framework for the given problem class; and establish a multilevel structure for DNNs to account for the fact that a DNN only realizes a discrete approximation of a continuous nonlinear mapping relating input to output data. Combining these ideas with novel algorithmic control strategies and preconditioning, we will establish a new class of adaptive multilevel algorithms for deep learning, which not only optimize a fixed DNN but also adaptively refine and extend the DNN architecture during the optimization loop. This concept is not restricted to a particular network architecture, and we will study feedforward neural networks, ResNets, and physics-informed neural networks (PINNs) as relevant examples. Our integrated approach will thus be able to replace many of the current manual tuning techniques with algorithmic strategies based on a posteriori estimates. Moreover, our algorithms will reduce the computational effort for training as well as the size of the resulting DNN compared to a manually designed counterpart, making the use of deep learning more efficient in many respects. Finally, in the long run, our algorithmic approach has the potential to enhance the reliability and interpretability of the resulting trained DNN.
See also the GEPRIS information on the project, the website of the DFG Priority Programme 2298 "Theoretical Foundations of Deep Learning", and the GEPRIS information on SPP 2298.