Failure detectors (2021.7.22) | Monologue by Dr. Geek

Lonely Researcher, Prof. Dr. Geek
He has been taking on advanced technology research in academia, education for young people, and technology funding, using unique IT technologies for the bright future of mankind.

The design of fault-tolerant distributed systems. 

 

It is widely known that designing and verifying fault-tolerant distributed systems is a difficult problem.

 

Consensus and atomic broadcast are two important paradigms in the design of fault-tolerant distributed systems, and both are widely applied.

 

Consensus allows a set of processes to agree on a common decision value that depends on the initial values of the processes, regardless of failures.
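To make this definition concrete, here is a minimal sketch in Python that checks one finished run against the three properties usually used to specify consensus: termination, agreement, and validity. These property names are the standard textbook terms rather than wording from this text, and the function name `check_consensus`, the process names, and the use of `None` for "has not decided" are assumptions made only for this illustration.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(
    initial: Dict[str, Hashable],
    decisions: Dict[str, Optional[Hashable]],
    crashed: Set[str],
) -> Dict[str, bool]:
    """Check one finished run against the usual consensus properties.

    Assumption: None means "has not decided", so None is never proposed.
    """
    correct = [p for p in initial if p not in crashed]
    decided = [decisions.get(p) for p in correct]

    # Termination: every correct (non-crashed) process has decided.
    termination = all(v is not None for v in decided)

    # Agreement: no two correct processes decided different values.
    values = {v for v in decided if v is not None}
    agreement = len(values) <= 1

    # Validity: every decided value was the initial value of some process.
    validity = values <= set(initial.values())

    return {
        "termination": termination,
        "agreement": agreement,
        "validity": validity,
    }
```

For example, with initial values `{"p1": 0, "p2": 1, "p3": 1}`, a run in which `p3` crashes and both `p1` and `p2` decide `1` satisfies all three properties, while a run in which `p1` decides `0` and `p2` decides `1` violates agreement.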

 

In atomic broadcast, processes reliably broadcast messages such that they agree on the set of messages delivered and the order of message deliveries. 
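The two requirements in this definition, the same set of delivered messages and the same delivery order, can be checked on recorded delivery logs. The sketch below is only an illustration: the function name `check_atomic_broadcast`, the log format (one list of message identifiers per process), and the assumption that no process delivers the same message twice are not taken from this text.

```python
from itertools import combinations
from typing import Dict, List


def check_atomic_broadcast(logs: Dict[str, List[str]]) -> Dict[str, bool]:
    """Check delivery logs, given as process -> messages in delivery order.

    Assumption: each process delivers a given message at most once.
    """
    sequences = list(logs.values())

    # Same set: every process delivered exactly the same messages.
    same_set = all(set(s) == set(sequences[0]) for s in sequences)

    # Same order: any two processes deliver the messages they have in
    # common in the same relative order.
    same_order = True
    for a, b in combinations(sequences, 2):
        common = set(a) & set(b)
        if [m for m in a if m in common] != [m for m in b if m in common]:
            same_order = False

    return {"same_set": same_set, "same_order": same_order}
```

Keeping the two checks separate shows which requirement a faulty run breaks: logs `["a", "b"]` and `["b", "a"]` agree on the set but not on the order, while logs `["a", "b"]` and `["a"]` agree on the order of their common messages but not on the set.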

 

This chapter focuses on solutions to consensus and atomic broadcast problems in asynchronous distributed systems. 

 

In asynchronous distributed systems, there is no bound on the time it takes for a process to execute a computation step or for a message to go from its sender to its receiver. 

 

In other words, relative processor speeds, execution times, clock drifts, and message transmission delays are all finite, but none of them has an upper bound.

 

This asynchrony is mainly caused by unpredictable loads on the system, so one cannot make timing assumptions of any kind.

 

On the other hand, synchronous systems are characterized by strict bounds on the execution times and message transmission delays.
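The practical consequence of this difference shows up when one process tries to detect the crash of another with a timeout. The following sketch is a simplified illustration under stated assumptions, not a method described in this text: a live sender emits heartbeat number i at time i * period, each heartbeat arrives after its own delay, and a monitor suspects the sender whenever the gap between two consecutive arrivals exceeds the timeout. The name `falsely_suspected` and all numeric values are hypothetical.

```python
from typing import List


def falsely_suspected(delays: List[float], period: float, timeout: float) -> bool:
    """Return True if a sender that never crashes is suspected anyway.

    Heartbeat i is sent at time i * period and arrives delays[i] later.
    The monitor suspects the sender when more than `timeout` passes
    between two consecutive heartbeat arrivals.
    """
    arrivals = sorted(i * period + d for i, d in enumerate(delays))
    gaps = (later - earlier for earlier, later in zip(arrivals, arrivals[1:]))
    return any(gap > timeout for gap in gaps)


# Synchronous case: delays are known to stay within max_delay, so a
# timeout of period + max_delay never suspects a live sender.
period, max_delay = 1.0, 0.5
bounded = falsely_suspected([0.0, 0.5, 0.0, 0.5], period, period + max_delay)

# Asynchronous case: a delay is finite but unbounded, so the same
# timeout can wrongly suspect a sender that is merely slow.
unbounded = falsely_suspected([0.0, 0.0, 4.0], period, period + max_delay)
```

Here `bounded` is `False` and `unbounded` is `True`. In the synchronous setting the timeout can be derived from the known bounds, whereas in the asynchronous setting any fixed timeout can be exceeded by a slow but correct process.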