The 2007 Edsger W. Dijkstra Prize in Distributed Computing goes to the paper

 

``Consensus in the presence of partial synchrony''

by Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer,

 

which appeared in the Journal of the ACM, Vol. 35, No. 2, April, 1988. pages 288--323.

A preliminary version appeared in PODC 1984.

 

This paper introduces a number of practically motivated partial synchrony models that lie between the completely synchronous and the completely asynchronous models, and in which consensus is solvable. It gave practitioners the right tool for building fault tolerant systems, and contributed to the understanding that safety can be maintained at all times, despite the impossibility of consensus and progress is facilitated during periods of stability. These are the pillars on which every fault tolerant system has been built for two decades. This includes academic projects such as Petal, Frangipani, and Boxwood, as well as real life data centers, such as the Google file system.

 

In distributed systems, balancing the pragmatics of building software that works against the need for rigor is particularly difficult because of impossibility results such as the FLP theorem. The publication by Dwork, Lynch, and Stockmeyer was in many respects the first to suggest a path through this thicket, and has been enormously influential. It presents consensus algorithms for a number of partial synchrony models with different timing requirements and failure assumptions: crash, authenticated Byzantine, and Byzantine failures. It also proves tight lower bounds on the resilience of such algorithms.

 

The eventual synchrony approach introduced in this paper is used to model algorithms that provide safety at all times, even in completely asynchronous runs, and guarantee liveness once the system stabilizes. This has since been established as the leading approach for circumventing the FLP impossibility result and solving asynchronous consensus, atomic broadcast, and state-machine replication.

 

In particular, the distributed systems engineering community has been increasingly drawn towards systems architectures that reflect the basic split between safety and liveness cited above.  Dwork, Lynch, and Stockmeyer thus planted the seed for a profound rethinking of the ways that we should build, and reason about, this class of systems. Following this direction are many foundational solutions. First, these include state machine replication methods such as Lamport's seminal Paxos algorithm and many group communication methods. Another important branch of research that directly follows this work is given by Chandra and Toueg's unreliable failure detector abstraction, which is realized in the eventual synchrony model of this paper. As Chandra and Toueg write:``we argue that partial synchrony assumptions can be encapsulated in the unreliability of failure detectors. For example, in the models of partial synchrony considered in Dwork et al. it is easy to implement a failure detector that satisfies the properties of <>W.'' Finally, the insight by Dwork, Lynch, and Stockmeyer also led to various timed-based models of partial synchrony, such as Cristian and Fetzer's Timed-Asynchronous model and others.

           

The award committee would like to acknowledge the sincere efforts by the nominators of this work, as well as all other (worthy!) nominations which came short of winning.

 

The committee wishes to pay a special tribute via this award to Larry Stockmeyer, who passed away on July 31, 2004. Larry's impact on the field through this paper and many others will always be remembered.

 

The Committee of the 2007 Edsger W. Dijkstra Prize in Distributed Computing

 

Hagit Attiya

Dahlia Malkhi

Keith Marzullo

Marios Mavronicolas

Andrzej Pelc

Roger Wattenhofer (Chair)