**Keywords:** game theory, continuous games, generative adversarial networks, theory, gradient descent-ascent, equilibrium, convergence

**Abstract:** We study the role that a finite timescale separation parameter $\tau$ has on gradient descent-ascent in non-convex, non-concave zero-sum games where the learning rate of player 1 is denoted by $\gamma_1$ and the learning rate of player 2 is defined to be $\gamma_2=\tau\gamma_1$. We provide a non-asymptotic construction of the finite timescale separation parameter $\tau^{\ast}$ such that gradient descent-ascent locally converges to $x^{\ast}$ for all $\tau \in (\tau^{\ast}, \infty)$ if and only if it is a strict local minmax equilibrium. Moreover, we provide explicit local convergence rates given the finite timescale separation. The convergence results we present are complemented by a non-convergence result: given a critical point $x^{\ast}$ that is not a strict local minmax equilibrium, we present a non-asymptotic construction of a finite timescale separation $\tau_{0}$ such that gradient descent-ascent with timescale separation $\tau\in (\tau_0, \infty)$ does not converge to $x^{\ast}$. Finally, we extend the results to gradient penalty regularization methods for generative adversarial networks and empirically demonstrate on CIFAR-10 and CelebA the significant impact timescale separation has on training performance.

**One-sentence Summary:** We show that there exists a range of finite learning ratios which we construct such that gradient descent-ascent converges to a critical point if and only if it is a strict local minmax equilibrium.

**Code Of Ethics:** I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

**Supplementary Material:** zip
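To make the role of $\tau$ concrete, the following is a minimal sketch on a toy quadratic game chosen for illustration; it is not taken from the paper and does not implement its construction of $\tau^{\ast}$. The assumed objective is $f(x, y) = -\tfrac{1}{2}x^2 + 2xy - \tfrac{1}{2}y^2$, with player 1 minimizing over $x$ and player 2 maximizing over $y$. The origin is a strict local minmax equilibrium here ($\partial_{yy} f = -1 < 0$ and the Schur complement is $-1 + 4 = 3 > 0$), yet it is not a local Nash equilibrium because $\partial_{xx} f < 0$. Linearizing the descent-ascent dynamics for this particular game gives a matrix with trace $1 - \tau$ and determinant $3\tau$, so the continuous-time dynamics are stable exactly when $\tau > 1$. The function name `gda` and the step size and iteration count are hypothetical choices.

```python
import math


def gda(tau, gamma1=0.01, steps=2000, x=1.0, y=1.0):
    """Simultaneous gradient descent-ascent on the assumed toy game
    f(x, y) = -0.5*x**2 + 2*x*y - 0.5*y**2.

    Player 1 descends in x with learning rate gamma1; player 2 ascends
    in y with learning rate gamma2 = tau * gamma1.
    Returns the distance of the final iterate from the critical point (0, 0).
    """
    for _ in range(steps):
        grad_x = -x + 2.0 * y  # partial derivative of f with respect to x
        grad_y = 2.0 * x - y   # partial derivative of f with respect to y
        # Both players update simultaneously from the same iterate.
        x, y = x - gamma1 * grad_x, y + tau * gamma1 * grad_y
    return math.hypot(x, y)


# Below the threshold for this toy game (tau < 1): iterates spiral outward.
slow = gda(tau=0.5)
# Above the threshold (tau > 1) with a small step size: iterates spiral inward.
fast = gda(tau=4.0)
print(f"tau=0.5: distance from origin = {slow:.3e}")
print(f"tau=4.0: distance from origin = {fast:.3e}")
```

In this sketch the same critical point repels or attracts the iterates depending only on the learning-rate ratio, which is the qualitative behavior the abstract describes. The threshold of 1 is specific to this toy objective.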

