
Dezyne Model of the Raft Consensus Algorithm

This Dezyne modelling example is Eric Dortmans’s submission for the Dezyne Challenge 2018.

The application

Consensus is a fundamental problem in distributed computing and multi-agent systems. The objective of a consensus algorithm is to achieve overall system reliability in the presence of a number of faulty processes. Consensus algorithms allow a collection of machines to work as a coherent group that can survive the failure of some of its members.

Raft is a proven and reasonably understandable consensus algorithm, used in modern cloud server clusters. It is an implementation of the state machine replication concept.

[Figure: the replicated state machine concept]

The Dezyne model in this repository is based on the paper In Search of an Understandable Consensus Algorithm (Extended Version) by Diego Ongaro and John Ousterhout.

The Raft algorithm decomposes consensus within a cluster of servers into three sub-problems:

  1. Leader Election: A leader is elected by majority vote among the servers in the cluster. If the leader fails, a new leader is automatically elected. The algorithm ensures that in each term (a virtual time period) there can be at most one leader. The other servers in the cluster follow the leader’s orders.

  2. Log replication: The leader receives commands from clients, logs them and replicates them to the followers. Once a log entry has been replicated to a majority of servers, it is considered committed. The command is then executed and its result is returned to the client.

  3. Guarding Consistency: The rules of the algorithm ensure that commands are committed and executed by all servers in the same order in which they were requested. If one of the servers has committed a log entry at a particular index, no other server can apply a different log entry for that index.

Each server can be in one of three states: leader, follower, or candidate. A leader leads the pack. A follower follows the leader. A candidate wants to become leader.

[Figure: server states and their transitions]

Time is divided into virtual time periods, called terms. In a term there can be at most one leader. Each server maintains a currentTerm variable representing its view of time. The currentTerm is increased monotonically by a candidate each time it (re)starts a leader election.

Initially all servers start as followers. A follower that does not hear from a leader within a (randomly selected) election timeout period becomes a candidate. A candidate becomes leader when it has collected a majority of votes from the other servers in the cluster. A leader confirms its leadership using regular heartbeat messages and services the requests of clients. A candidate or leader steps down and becomes a follower when it hears (via the messages it receives) from a leader with a higher term than its own.
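These transitions can be summarized in a small state machine. The following fragment is a minimal sketch in Dezyne notation, not a copy of the actual model; the interface and event names (IRole, electionTimeout, majorityVotes, higherTerm) are ours, chosen for illustration:

interface IRole
{
  in void electionTimeout(); // no message from a leader within the timeout
  in void majorityVotes();   // a majority of the cluster granted its vote
  in void higherTerm();      // a message with a higher term was received

  behaviour
  {
    enum State { Follower, Candidate, Leader };
    State state = State.Follower;

    [state.Follower]  on electionTimeout: state = State.Candidate;
    [state.Candidate] on electionTimeout: state = State.Candidate; // split vote: new term, retry
    [state.Candidate] on majorityVotes:   state = State.Leader;
    [state.Candidate] on higherTerm:      state = State.Follower;
    [state.Leader]    on higherTerm:      state = State.Follower;
    [state.Follower]  on higherTerm:      state = State.Follower;  // stay follower, adopt the term
  }
}

Events not listed for a state (e.g. majorityVotes while Leader) are implicitly illegal in Dezyne, which matches the protocol: they simply cannot occur in that state.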

Raft basically needs just three kinds of Remote Procedure Calls (RPCs):

  • ClientRequest RPC: used by clients to request the execution of a command

  • RequestVote RPC: used by candidates to collect votes to become leader

  • AppendEntries RPC: used by the leader to replicate log entries and as heartbeat to confirm its leadership
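In a Dezyne model such a message set maps naturally onto interface events. The following is a minimal, illustrative sketch of a provided RPC interface; the interface name, the reply event names and the one-call-at-a-time behaviour are our simplifications, not the actual model's IRpcServer:

interface IRpc
{
  in  void ClientRequest();  // client asks the cluster to execute a command
  in  void RequestVote();    // candidate asks for a vote
  in  void AppendEntries();  // leader replicates log entries / heartbeats

  out void Accepted();       // positive reply (illustrative name)
  out void Refused();        // negative reply (illustrative name)

  behaviour
  {
    bool pending = false;    // sketch simplification: one outstanding RPC at a time
    [!pending] on ClientRequest: pending = true;
    [!pending] on RequestVote:   pending = true;
    [!pending] on AppendEntries: pending = true;
    [pending]  on inevitable: { Accepted; pending = false; }
    [pending]  on inevitable: { Refused;  pending = false; }
  }
}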

Architecture

The following System View (generated using the Dezyne IDE) gives an overview of our Dezyne model.

[Figure: System View of the Dezyne Raft model]

In what follows we will describe each component of the model.

Server

The Server component represents what we wanted to model: a Server that implements the Raft consensus protocol. The Server component is our system component. It contains all other components and their interconnections.

A Server Cluster can be built using multiple instances of this Server. The Raft protocol takes care of synchronizing all Servers with the (elected) leader of the Cluster.

Clients can request the whole Cluster to execute a command. They must send a request (ClientRequest) to the Server that is the (elected) leader of the Cluster. The leader will take care of replicating the requested command to its followers.

Raft is based on the concept of consensus, i.e. all decisions in the Cluster (in particular "leader election" and "data replication") are committed when a majority of the Servers agrees. Therefore a Cluster must contain an odd number of Servers: a Cluster of five, for example, can tolerate two failed Servers, since the remaining three still form a majority.
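In Dezyne terms the Server is a system component: it declares no behaviour of its own, only sub-components and their bindings. The sketch below shows what that composition could look like; the port names (ctrl, api) are illustrative, only the component names follow the System View:

component Server
{
  provides IServer ctrl;  // Online/Offline control (cf. the ctrl.OnLine event used in simulation)

  system
  {
    ConsensusModule consensus;
    Rules           rules;
    Timer           electionTimer;
    Timer           heartbeatTimer;
    RpcClient       rpcclient;
    RpcServer       rpcserver;
    Config          config;

    ctrl <=> consensus.ctrl;
    consensus.rules          <=> rules.api;
    consensus.electionTimer  <=> electionTimer.api;
    consensus.heartbeatTimer <=> heartbeatTimer.api;
    consensus.rpcclient      <=> rpcclient.api;
    consensus.rpcserver      <=> rpcserver.api;
    consensus.config         <=> config.api;
  }
}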

ConsensusModule

The ConsensusModule is the central component of a Server. It manages the state of the Server. To this end it communicates with its peers using the Raft protocol.

The Server can be Offline or Online. When Online it can be in one of three Raft states:

  • Follower

  • Candidate

  • Leader

Initially all Servers in the Cluster start in the Follower state and each starts an election timer (with a random timeout).

When the election timer times out without a message from the Leader, a Follower assumes there is no leader, switches to the Candidate state and starts an election procedure to get enough votes to become the new leader. It sends a message (RequestVote) to all its peers to ask for their votes. When a majority of votes is received, the Candidate becomes the Leader.

A Leader is responsible for receiving, logging and replicating Client commands to its Followers. It has a heartbeat timer set to a much smaller time interval than the election timer. On each heartbeat timeout it sends updates (AppendEntries) to its Followers, both to confirm its leadership and to replicate client commands.

For every message received from Clients or its peer Servers it consults the Rules component.
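The fragment below sketches how the role transitions could look inside the ConsensusModule's behaviour. It reuses event names that appear in the simulation scenarios later in this article (electionTimer.timeout, rules.leadershipWon, rules.leadershipLost); the statements themselves are illustrative, not copied from the actual model:

behaviour
{
  enum State { Offline, Follower, Candidate, Leader };
  State state = State.Offline;

  [state.Follower] on electionTimer.timeout():
  {
    rpcclient.RequestVote();  // ask all peers for their vote (illustrative call)
    state = State.Candidate;
  }

  [state.Candidate] on rules.leadershipWon():
  {
    state = State.Leader;     // majority of votes received; heartbeats take over from here
  }

  [state.Leader] on rules.leadershipLost():
  {
    state = State.Follower;   // a higher term was seen; step down
  }
}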

Rules

The Raft protocol contains a lot of rules to guarantee that:

  1. The Servers in the Cluster maintain a consistent state.

  2. Commands from Clients are logged and executed by all Servers in the same order as requested.

The Rules component maintains a log of the commands requested by Clients. The log and some important variables have to be regularly committed to permanent storage, and reloaded when the Server boots.

The detailed Raft rules and the command log are not well suited to specification in Dezyne. They have to be implemented in handwritten code, as sketched below.
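In Dezyne, a component declared without a behaviour or system block is implemented by handwritten (native) code, while the verifier still checks all clients of the component against its interface. A minimal sketch, assuming the interface is called IRules and the port rules:

// No behaviour and no system block: Dezyne treats this as a
// hand-written component whose implementation lives outside the model.
component Rules
{
  provides IRules rules;
}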

Timer

The Timer component contains a straightforward timer. Timeouts are specified in milliseconds.

As described above, two timers are used in Raft (a minimal interface sketch follows the list):

  • the election timer

  • the heartbeat timer
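Such a timer could be specified as follows. This is a minimal sketch: the event names and the absence of a timeout parameter are our simplifications, not necessarily the model's actual ITimer:

interface ITimer
{
  in  void start();   // the timeout value itself is configured elsewhere in this sketch
  in  void cancel();
  out void timeout();

  behaviour
  {
    bool running = false;
    on start:  running = true;
    on cancel: running = false;
    [running] on inevitable: { timeout; running = false; }  // a running timer eventually fires
  }
}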

RpcClient and RpcServer

Each Raft Server both requires and provides Remote Procedure Calls (RPCs). Therefore each Server has an RpcClient component to issue RPC requests and an RpcServer component to service RPC requests.

The Raft protocol uses a very small set of Remote Procedure Calls:

  • ClientRequest, used by a Client to send requests to the Leader

  • AppendEntries, used by a Leader to replicate Client commands to its Followers and to confirm its leadership

  • RequestVote, used by a Candidate to request votes from other Servers in order to become the new Leader

Each of these RPCs has both a positive and a negative reply message.

Config

The Config component is a simple configuration database, used to specify various parameters.
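As a purely illustrative sketch, such a component could expose its parameters through a small interface like the following; all names, the value range and the constants are assumptions:

interface IConfig
{
  subint Millis {50..1000};       // illustrative bounded value type

  in Millis electionTimeout();    // (randomized) timeout for the election timer
  in Millis heartbeatInterval();  // much smaller than the election timeout

  behaviour
  {
    on electionTimeout:   reply(300);  // illustrative constant
    on heartbeatInterval: reply(50);   // illustrative constant
  }
}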

Project Results

We have applied both Verification and Validation.

Verification

Verification of the model, in particular the ConsensusModule and its interfaces, was performed with the Dezyne Model Verification facility in the Dezyne IDE. This facility is based on the mCRL2 model checker.

The actual verification log of the ConsensusModule:

dzn$ dzn --session=49 -v verify --model=ConsensusModule -I raft-dezyne-model raft-dezyne-model\ConsensusModule.dzn --version=2.7.0
working directory: D:\OneDrive\Documents\GIT\GitHub\dortmans
......................................................
verify: IServer: check: deadlock: ok
....
verify: IServer: check: livelock: ok
...
verify: ITimer: check: deadlock: ok
....
verify: ITimer: check: livelock: ok
......
verify: IRules: check: deadlock: ok
.....
verify: IRules: check: livelock: ok
...
verify: IConfig: check: deadlock: ok
....
verify: IConfig: check: livelock: ok
...
verify: IRpcClient: check: deadlock: ok
....
verify: IRpcClient: check: livelock: ok
...
verify: IRpcServer: check: deadlock: ok
....
verify: IRpcServer: check: livelock: ok
.....
verify: ConsensusModule: check: deterministic: ok
.......
verify: ConsensusModule: check: illegal: ok
........
verify: ConsensusModule: check: deadlock: ok
........
verify: ConsensusModule: check: livelock: ok
...........
verify: ConsensusModule: check: compliance: ok

As the verification log above shows, the latest version of our model has no verification errors.

Validation

We have also used the Dezyne Model Simulation service to validate the states and state transitions as described in the Raft protocol.

Typical scenarios can be executed in the Dezyne IDE by starting Dezyne Model Simulation on the ConsensusModule and pressing the event buttons as specified below.

Bring Server online:

ctrl.OnLine +
rules.return

Leader election after election timeout:

electionTimer.timeout +
rpcclient.RequestVoteGranted +
rpcclient.RequestVoteGranted +
rules.leadershipWon

Leader handles a new request from a Client, replicates it and replies with the result:

rpcserver.ClientRequest

Leader replicates its logged client commands on every heartbeat:

heartbeatTimer.timeout +
heartbeatTimer.timeout +
...

Leader executes the command after replication and replies the result to the client:

rules.ClientRequestSucess

Leader loses leadership and becomes Follower:

heartbeatTimer.timeout +
rpcclient.AppendEntriesRefused +
rules.leadershipLost

Documentation

You can download the example in a zip-file here.

Note that all of the diagrams included in this document have been produced in and exported from Dezyne. The System View and State Charts are part of the Dezyne editor. The Sequence Trace has been produced using the Dezyne simulation engine.
