|
|
@@ -1,59 +1,87 @@
|
|
|
# Ruaft: Raft implemented in Rust
|
|
|
|
|
|
-Ruaft is a Rust version of [the Raft consensus protocol](https://raft.github.io/). The name is clearly made up.
|
|
|
+Ruaft is a Rust version of [the Raft consensus protocol](https://raft.github.io/).
|
|
|
|
|
|
-Ruaft exposes 4 APIs, to create a Raft instance (`new()`), start a new contract (`start()`), get the current status
|
|
|
-(`get_state()`) and shut it down (`kill()`). When a consensus is reached, it notifies the clients via an
|
|
|
-`apply_command` callback. It saves the internal state to local disks via a `persister`, which is supplied at creation.
|
|
|
-Periodically, it asks the application to take a snapshot of the current state, and archives contracts included in the
|
|
|
-snapshot to save memory and disk space.
|
|
|
+## Raft
|
|
|
|
|
|
-There are also three internal RPC handlers: `process_append_entries()`, `process_request_vote()` and
|
|
|
-`process_install_snapshot`, serving requests from other Raft instances of the same setup.
|
|
|
+Raft is the algorithm behind the famous key-value store [etcd](https://github.com/etcd-io/etcd). It helps replicas of a
|
|
|
+distributed application to reach consensus on internal state. Raft maintains a log that consists of entries. Each entry
|
|
|
+carries one piece of data (commonly referred to as a "command") for the application. Once an entry is committed to the
|
|
|
+log, it will never be changed or removed across all replicas. Entries are appended to the end of the log linearly, which
|
|
|
+guarantees that the log never diverges.
|
|
|
+
|
|
|
+## APIs
|
|
|
+
|
|
|
+Each application replica has a Raft instance that runs side by side with it. To add a log entry to the Raft log, the
|
|
|
+application calls `start()` with the data it wants to store. The log entry is not committed immediately. Instead, when
|
|
|
+the Raft instance is sure that the log entry is agreed upon, it calls the application back via an `apply_command`
|
|
|
+callback, which is supplied by the application.
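
The `start()` / `apply_command` flow described above can be sketched with a toy, single-process stand-in. This is only an illustration under stated assumptions: `ToyRaft`, `mark_agreed()` and `demo()` are hypothetical names invented here, and none of the signatures are taken from Ruaft. The point it shows is that `start()` only appends an entry, and the callback fires later, in log order, once entries are agreed upon.

```rust
use std::sync::mpsc;

/// Hypothetical stand-in for a Raft instance (not Ruaft's real type).
struct ToyRaft<C: Clone> {
    log: Vec<C>,
    /// Number of entries already handed to the application.
    committed: usize,
    /// Callback supplied by the application at creation time.
    apply_command: Box<dyn FnMut(usize, C)>,
}

impl<C: Clone> ToyRaft<C> {
    fn new(apply_command: impl FnMut(usize, C) + 'static) -> Self {
        ToyRaft {
            log: Vec::new(),
            committed: 0,
            apply_command: Box::new(apply_command),
        }
    }

    /// Appends a command to the log and returns its index.
    /// The entry is NOT committed at this point.
    fn start(&mut self, command: C) -> usize {
        self.log.push(command);
        self.log.len() - 1
    }

    /// Assumption: simulates the moment when replicas have agreed on all
    /// entries up to and including `index`. Applies them in log order.
    fn mark_agreed(&mut self, index: usize) {
        let end = (index + 1).min(self.log.len());
        while self.committed < end {
            let i = self.committed;
            (self.apply_command)(i, self.log[i].clone());
            self.committed += 1;
        }
    }
}

/// Drives the toy instance and returns everything the callback received.
fn demo() -> Vec<(usize, String)> {
    let (tx, rx) = mpsc::channel();
    let mut raft = ToyRaft::new(move |index, command: String| {
        tx.send((index, command)).unwrap();
    });

    raft.start("put x=1".to_string());
    raft.start("put y=2".to_string());
    // Nothing has been applied yet: start() does not commit.
    assert!(rx.try_recv().is_err());

    raft.mark_agreed(0);
    raft.start("append x=3".to_string());
    raft.mark_agreed(2);

    // Dropping the instance drops the sender, which ends the iteration below.
    drop(raft);
    rx.into_iter().collect()
}
```

Calling `demo()` collects the applied entries in the order they were started, even though agreement arrived in two separate steps.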
|
|
|
+
|
|
|
+Internally, Raft talks to other replicas of the same application via an RPC interface. From time to time, Raft saves its
|
|
|
+log (and other data) to permanent storage via a `persister`. The log can grow without bound. To save storage space,
|
|
|
+Raft periodically asks the application to take a snapshot of its state. Log entries contained in the snapshot will be
|
|
|
+discarded.
|
|
|
+
|
|
|
+An application creates a Raft instance with `new()`, with RPC interfaces for communication, a `persister`,
|
|
|
+an `apply_command`
|
|
|
+callback, and a `request_snapshot` callback. More details about those callbacks can be found in the module comments.
|
|
|
+
|
|
|
+### Raft RPCs
|
|
|
+
|
|
|
+Ruaft has three RPC handlers: `process_append_entries()`, `process_request_vote()` and
|
|
|
+`process_install_snapshot()`, serving requests from other Raft instances of the same application. These RPC handlers
|
|
|
+should be added to the RPC system that the application uses.
|
|
|
+
|
|
|
+### Graceful shutdown
|
|
|
+
|
|
|
+The `kill()` method provides a clean way to gracefully shut down a Ruaft instance. It notifies all threads and waits for
|
|
|
+all tasks to complete. `kill()` then checks whether any panics or assertion failures occurred during execution. It
|
|
|
+panics the main thread if there is any error. Otherwise `kill()` is guaranteed to return, assuming there is no thread
|
|
|
+starvation.
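
A minimal sketch of this shutdown pattern, using only the standard library, might look like the following. `Workers`, `spawn()` and the shared stop flag are hypothetical and are not Ruaft's actual internals; the sketch only mirrors the three steps described above: notify every thread, wait for all of them, then re-raise a panic if any worker panicked.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical group of background threads sharing one stop flag.
struct Workers {
    keep_running: Arc<AtomicBool>,
    handles: Vec<JoinHandle<()>>,
}

impl Workers {
    fn spawn(count: usize) -> Self {
        let keep_running = Arc::new(AtomicBool::new(true));
        let handles: Vec<JoinHandle<()>> = (0..count)
            .map(|_| {
                let keep_running = Arc::clone(&keep_running);
                thread::spawn(move || {
                    // Placeholder for real work: poll the flag until told to stop.
                    while keep_running.load(Ordering::Acquire) {
                        thread::sleep(Duration::from_millis(1));
                    }
                })
            })
            .collect();
        Workers { keep_running, handles }
    }

    /// Notifies all threads, waits for every one of them, then panics the
    /// caller if any worker panicked. Returns the number of clean exits.
    fn kill(self) -> usize {
        self.keep_running.store(false, Ordering::Release);

        let mut joined = 0;
        let mut first_panic = None;
        for handle in self.handles {
            match handle.join() {
                Ok(()) => joined += 1,
                // Keep joining the remaining threads before reporting the error.
                Err(payload) => {
                    first_panic.get_or_insert(payload);
                }
            }
        }
        if let Some(payload) = first_panic {
            std::panic::resume_unwind(payload);
        }
        joined
    }
}
```

Joining every handle before re-raising means no thread is left running when the caller observes the failure, which is one way to read the "wait for all tasks, then check for panics" ordering in the paragraph above.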
|
|
|
+
|
|
|
+## Code Quality
|
|
|
|
|
|
-## Testing
|
|
|
The implementation is thoroughly tested. I copied (and translated) the tests from an obvious source. To avoid being
|
|
|
indexed by a search engine, I will not name the source. The testing framework from the same source is also translated
|
|
|
from the original Go version. The code can be found at the [`labrpc`](https://github.com/ditsing/labrpc) repo.
|
|
|
|
|
|
## KV Server
|
|
|
-To test the snapshot functionality, I wrote a KV server that supports `get()`, `put()` and `append()`. The complexity
|
|
|
-of the KV server is so high that it has its own set of tests. Integration tests in `tests/snapshot_tests.rs` are all
|
|
|
-based on the KV server. The KV server is inspired by the equivalent Go version.
|
|
|
+
|
|
|
+To test the snapshot functionality, I wrote a key-value store that supports `get()`, `put()` and `append()`. The
|
|
|
+complexity is so high that it has its own set of tests. Integration tests in `tests/snapshot_tests.rs` are all based on
|
|
|
+the KV server. The KV server is inspired by the equivalent Go version.
|
|
|
|
|
|
## Daemons
|
|
|
+
|
|
|
Ruaft uses both threads and async thread pools. There are 4 'daemon threads':
|
|
|
|
|
|
-1. Election timer: watches the election timer, starts and cancels elections. Correctly implementing a versioned timer
|
|
|
- is one of the most difficult tasks in Ruaft;
|
|
|
+1. Election timer: watches the election timer, starts and cancels elections. Correctly implementing a versioned timer is
|
|
|
+ one of the most difficult tasks in Ruaft;
|
|
|
1. Sync log entries: waits for new logs, talks to followers and marks contracts as 'agreed on';
|
|
|
1. Apply command daemon: sends consensus to the application. Communicates with the rest of `Ruaft` via a `Condvar`;
|
|
|
1. Snapshot daemon: requests and processes snapshots from the application. Communicates with the rest of `Ruaft` via
|
|
|
-A `Condvar` and a thread parker. Unlike other daemons, the snapshot daemon runs even if the current instance is a
|
|
|
+ a `Condvar` and a thread parker. Unlike other daemons, the snapshot daemon runs even if the current instance is a
|
|
|
follower.
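
As a rough illustration of how a daemon thread can be driven by a `Condvar`, the sketch below runs a single "apply" thread that sleeps until new entries are marked as agreed on, or until shutdown is requested. The `Shared` struct and `run_apply_daemon()` are assumptions made for this example and do not reflect Ruaft's real data structures.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical state shared between the daemon and the rest of the instance.
#[derive(Default)]
struct Shared {
    /// Entries that are agreed on but not yet sent to the application.
    committed: Vec<String>,
    shutdown: bool,
}

/// Feeds `commands` to an apply daemon and returns what the daemon applied.
fn run_apply_daemon(commands: Vec<&str>) -> Vec<String> {
    let pair = Arc::new((Mutex::new(Shared::default()), Condvar::new()));

    let daemon_pair = Arc::clone(&pair);
    let daemon = thread::spawn(move || {
        let (lock, new_commit) = &*daemon_pair;
        let mut applied = Vec::new();
        let mut state = lock.lock().unwrap();
        loop {
            // Sleep until there is something to apply or we are told to stop.
            state = new_commit
                .wait_while(state, |s| s.committed.is_empty() && !s.shutdown)
                .unwrap();
            // "Apply" the pending entries in order. A real daemon would
            // likely release the lock before calling into the application.
            applied.extend(state.committed.drain(..));
            if state.shutdown {
                return applied;
            }
        }
    });

    let (lock, new_commit) = &*pair;
    for command in commands {
        lock.lock().unwrap().committed.push(command.to_string());
        new_commit.notify_one();
    }
    lock.lock().unwrap().shutdown = true;
    new_commit.notify_one();

    daemon.join().unwrap()
}
```

Because the daemon drains pending entries before checking the shutdown flag, entries committed just before shutdown are still applied in this sketch.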
|
|
|
|
|
|
-To avoid blocking, daemon threads never sends RPC directly. RPC-handling is offloaded to a dedicated thread pool that
|
|
|
+To avoid blocking, daemon threads never send RPCs directly. RPC-sending is offloaded to a dedicated thread pool that
|
|
|
supports `async`/`.await`. There is a global RPC timeout, so RPCs never block forever. The thread pool also handles
|
|
|
vote-counting in the elections, given the massive amount of waiting involved.
|
|
|
|
|
|
-Last but not least, the heartbeat daemon is so simple that it is just a list of periodical tasks that live in the
|
|
|
-thread pool. It does not need its own thread.
|
|
|
-
|
|
|
-## Shutdown
|
|
|
-The `kill()` method provides a clean way to shutdown a Ruaft instance. It notifies all threads and wait for all tasks to
|
|
|
-complete. `kill()` then checks if there are any panics or assertion failures during the execution. It panics the main
|
|
|
-thread if there is any error. Otherwise `kill()` is guaranteed to return, assuming there is no thread starvation.
|
|
|
+Last but not least, there is the heartbeat daemon. It is so simple that it is just a list of periodical tasks that run
|
|
|
+forever in the thread pool.
|
|
|
|
|
|
## Running
|
|
|
+
|
|
|
It is close to impossible to run Ruaft outside of the testing setup under `tests`. One would have to supply an RPC
|
|
|
environment plus bridges, a `persister`, an `apply_command` callback and a `request_snapshot` callback.
|
|
|
|
|
|
Things would be better after I implement an RPC interface and improve the `persister` trait.
|
|
|
|
|
|
## Next steps
|
|
|
+
|
|
|
- [x] Split into multiple files
|
|
|
-- [ ] Add public documentation
|
|
|
+- [x] Add public documentation
|
|
|
- [ ] Add a proper RPC interface to all public methods
|
|
|
- [ ] Benchmarks
|
|
|
-- [x] Allow storing of arbitrary information, instead of a `i32`
|
|
|
+- [x] Allow storing arbitrary information
|
|
|
- [ ] Add more logging.
|