In this paper, we give a simple and efficient implementation of reverse-mode automatic differentiation, which both extends easily to higher-order functions, and has run time and memory consumption linear in the run time of the original program. In addition to a formal description of the translation, we also describe an implementation of this algorithm, and prove its correctness by means of a logical relations argument.

This keynote by Simon Peyton Jones was recorded at Haskell eXchange 2021 on 16 November 2021: Automatic differentiation is all the rage these days, largely because it is a key enabler for machine learning. But reverse-mode AD (the important kind) is a bit mind bending, and becomes much more so if you want reverse-mode AD for higher order programs (i.e. the kind we love). In this talk Simon Peyton Jones explains what AD is, and how we can do it for higher order programs, using a series of simple steps that take us a simple-and-obviously-correct version to a weird-but-very-efficient one. At the end of the road we’ll find the Kmett/Pearlmuttter/Siskind ‘ad’ library in Hackage… but hopefully we’ll see it with new eyes.