DFG project “Curse-of-dimensionality-free nonlinear optimal feedback control with deep neural networks. Spatially decreasing sensitivity and nonsmooth problems”
start of the project: 2024, end of the project: 2027
contract number: GR 1569/23-2
funding institution: DFG (Research Grants)
within the DFG Priority Research Programme 2298 “Theoretical Foundations of Deep Learning”
PROJECT MEMBERS
Principal investigator
Project member
AIMS OF THE PROJECT
Optimal feedback control is one of the areas in which deep learning has had an enormous impact. Deep reinforcement learning, one of the methods for obtaining optimal feedback laws and arguably one of the most successful algorithms in artificial intelligence, stands behind the spectacular performance of artificial intelligence in games such as chess and Go, and it also has manifold applications in science, technology, and the economy.
This project explores the mathematical foundations of this success. We focus on identifying conditions under which the high-dimensional functions that need to be computed in optimal control can be approximated efficiently (i.e., avoiding the curse of dimensionality) by deep neural networks (DNNs). In particular, on the one hand we look at optimal value functions, which are characterized as unique viscosity solutions of Hamilton-Jacobi-Bellman PDEs. On the other hand, we consider control Lyapunov functions (clfs), which replace the optimal value functions when the state of a system is to be asymptotically stabilized at a desired set or set point, not necessarily in an optimal way. These functions can be characterized as viscosity supersolutions of Hamilton-Jacobi-Bellman PDEs and provide a somewhat simplified setting in which we can develop our methods. From both types of functions the optimal or asymptotically stabilizing control can be computed in feedback form, which is the ultimate goal when solving control problems on long or even infinite time horizons.
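For orientation, the objects named above can be written out in one common textbook setting. The formulation below is an assumption made for illustration and is not taken from the project description: dynamics \dot x = f(x,u) with control values in a set U, a running cost \ell, a discount rate \delta \ge 0, and a positive definite function h for the clf. The second line is the Hamilton-Jacobi-Bellman equation for the optimal value function V, the third is the feedback law obtained from V wherever V is differentiable, and the last is the supersolution inequality for a clf W, which for smooth W reads \inf_u DW(x) f(x,u) \le -h(x).

```latex
% Assumed setting (not specified in the text): dynamics \dot x = f(x,u),
% running cost \ell(x,u) \ge 0, discount rate \delta \ge 0, h positive definite.
\begin{align*}
V(x) &= \inf_{u(\cdot)} \int_0^\infty e^{-\delta t}\, \ell\bigl(x(t),u(t)\bigr)\,dt,
  \qquad \dot x(t) = f\bigl(x(t),u(t)\bigr),\ x(0)=x, \\
0 &= \delta V(x) + \sup_{u \in U}\bigl\{ -DV(x)\, f(x,u) - \ell(x,u) \bigr\}, \\
u^*(x) &\in \operatorname*{arg\,min}_{u \in U}\bigl\{ DV(x)\, f(x,u) + \ell(x,u) \bigr\}, \\
0 &\le \sup_{u \in U}\bigl\{ -DW(x)\, f(x,u) \bigr\} - h(x).
\end{align*}
```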
In the first funding period we identified various conditions on the problem data, i.e., on the dynamics and the cost function, under which the resulting functions can be approximated by compositional or separable functions, which are efficiently representable by DNNs in high dimensions. Probably the most important finding of the first funding period is that a spatially decaying sensitivity property is the key to constructing an overlapping separable approximation to an optimal value function. This property has recently been studied by a number of authors (in a temporal form also by the proposer of this project), and understanding its impact on DNN approximations will be one of the focal points of the second funding period.
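To make the idea of an overlapping separable approximation concrete, here is a minimal sketch in Python with NumPy. It is not the project's construction: the chain-shaped neighborhoods, the one-hidden-layer tanh subnetworks, the random untrained weights, and all function names are assumptions chosen for illustration. The point is only structural: the approximant is a sum of small subnetworks, each of which sees a low-dimensional, overlapping block of the state, so the number of parameters grows linearly in the state dimension n.

```python
import numpy as np

def neighborhoods(n, radius):
    """Overlapping index sets: block j holds the coordinates within `radius` of j on a chain."""
    return [list(range(max(0, j - radius), min(n, j + radius + 1))) for j in range(n)]

def init_subnet(d_in, width, rng):
    """One-hidden-layer network R^d_in -> R with a smooth (tanh) activation; untrained."""
    return {
        "W1": rng.normal(size=(width, d_in)) / np.sqrt(d_in),
        "b1": np.zeros(width),
        "w2": rng.normal(size=width) / np.sqrt(width),
    }

def subnet(p, z):
    """Evaluate one subnetwork on its low-dimensional input z."""
    return p["w2"] @ np.tanh(p["W1"] @ z + p["b1"])

def separable_value(params, blocks, x):
    """Sum of subnetwork outputs; each subnetwork sees only its own block of x."""
    return sum(subnet(p, x[idx]) for p, idx in zip(params, blocks))

def n_params(params):
    """Total number of scalar weights over all subnetworks."""
    return sum(a.size for p in params for a in p.values())

def build(n, radius=1, width=16, seed=0):
    """Create the blocks and one subnetwork per block (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    blocks = neighborhoods(n, radius)
    params = [init_subnet(len(idx), width, rng) for idx in blocks]
    return params, blocks

if __name__ == "__main__":
    for n in (10, 20, 40):
        params, blocks = build(n)
        x = np.ones(n)
        print(n, n_params(params), float(separable_value(params, blocks, x)))
```

With a fixed radius, each additional state dimension adds one subnetwork of fixed size, which is the sense in which such a representation avoids exponential growth. Whether a given value function admits an accurate approximation of this form is exactly what conditions such as decaying sensitivity are meant to guarantee; the sketch does not address that question.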
A limitation of the results from the first funding period is that they currently apply only to smooth optimal value functions or clfs and to DNNs with smooth activation functions. The latter excludes the popular and computationally efficient ReLU DNNs, while the former excludes all control problems in which no smooth solution exists, such as asymptotic stabilization problems with obstacles. For problems of this kind it is known that only non-smooth approximants allow for the computation of meaningful feedback laws. The second focal point of the second funding period will therefore be the development of approximating ReLU DNNs for problems with non-smooth solutions, building upon the results for smooth problems from the first funding period.
See also the GEPRIS information on the project, the website of the DFG priority program 2298 “Theoretical Foundations of Deep Learning”, and the GEPRIS information on SPP 2298.