
Revamp project readme.

Jing Yang committed 5d63da9319, 4 years ago
1 changed file with 70 additions and 26 deletions

README.md: +70 -26

@@ -11,7 +11,15 @@ committed to the log, it will never be changed or removed. Entries are appended
 guarantees that the log never diverges across replicas. The log as a whole, represents the internal state of the
 application. Replicas of the same application agree on the log entries, and thus agree on the internal state.
 
-## APIs
+## Proof of Concept
+
+[`durio`][durio] is a web-facing distributed key-value store. It is built
+on top of Ruaft as a showcase application. It provides a set of JSON APIs to read and write key-value pairs
+consistently across replicas.
+
+I have a 3-replica durio service running on Raspberry Pis. See the [README][durio] for more details.
+
+## Raft APIs
 
 Each application replica has a Raft instance that runs side by side with it. To add a log entry to the Raft log, the
 application calls `start()` with the data it wants to store (commonly referred to as a "command"). The log entry is not
@@ -23,67 +31,103 @@ log (and other data) to a permanent storage via a `persister`. The log can grow
 Raft periodically asks the application to take a snapshot of its internal state. Log entries contained in the snapshot
 will be discarded.
 
-An application creates a Raft instance with `new()`, with RPC interfaces for communication, a `persister`,
-a `apply_command`
-callback, and a `request_snapshot` callback. More details of those callbacks can be found in the comment of the modules.
+## Building on top of Ruaft
+
+To use Ruaft as a backend, an application needs to provide:
+
+1. A set of RPC clients that allows one Raft instance to send RPCs to other Raft instances. RPC
+   clients should implement the [`RemoteRaft`][RemoteRaft] trait.
+2. A `persister` that writes binary blobs to permanent storage. It must implement the [`Persister`][Persister] trait.
+3. An `apply_command` function that applies commands to the application's internal state. Commands must be applied
   in order, one at a time.
+4. Optionally, a `request_snapshot` function that takes snapshots of the application's internal state. Snapshots should
   be delivered to the [`Raft::save_snapshot()`][save_snapshot] API.
 
-### Raft RPCs
+All the above should be passed to the [Raft constructor][lib]. The constructor returns a Raft instance that is `Send`
and `Sync`, and implements `Clone`. The Raft instance can be used by the application, as well as by the RPC server
described in the next item.
 
-Ruaft has three RPC handlers: `append_entries()`, `request_vote()` and `install_snapshot()`, serving requests from other
-Raft instances of the same application. These RPC handlers should be added to the RPC system that the application uses.
+5. An RPC server that accepts the RPCs defined in [`RemoteRaft`][RemoteRaft] (`append_entries()`, `request_vote()` and
+   `install_snapshot()`). The RPC requests should be proxied to the corresponding API of the Raft instance.
+
+In the sub-crates [durio][durio] and the underlying [kvraft][kvraft], you can find examples of all the elements
mentioned above. [`tarpc`][tarpc] is used to generate the RPC servers and clients. The `persister` does not do anything.
+`apply_command` and `request_snapshot` are provided by [kvraft][kvraft].
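
+As a rough illustration, the pieces an application supplies might look like the sketch below. The trait and callback
+shapes here are assumptions made for this example, not Ruaft's real definitions; see the linked source files for the
+actual API.

+```rust
+use std::sync::{Arc, Mutex};
+
+// Stand-in for the `Persister` trait described above; the real trait lives in
+// src/persister.rs and almost certainly has a different shape.
+trait Persister: Send + Sync {
+    fn save(&self, state: &[u8]);
+}
+
+// A do-nothing persister, similar in spirit to the one used by durio/kvraft.
+struct NoopPersister;
+
+impl Persister for NoopPersister {
+    fn save(&self, _state: &[u8]) {}
+}
+
+fn main() {
+    let persister = Arc::new(NoopPersister);
+
+    // `apply_command` must apply commands to the application's internal state
+    // in order, one at a time. Here the "state" is just a list of commands.
+    let state = Arc::new(Mutex::new(Vec::<Vec<u8>>::new()));
+    let apply_state = state.clone();
+    let apply_command = move |command: Vec<u8>| {
+        apply_state.lock().unwrap().push(command);
+    };
+
+    // Optionally, `request_snapshot` serializes the internal state; in a real
+    // integration the result is handed back through `Raft::save_snapshot()`.
+    let snapshot_state = state.clone();
+    let request_snapshot = move || -> Vec<u8> { snapshot_state.lock().unwrap().concat() };
+
+    // In a real application these, together with a set of `RemoteRaft` RPC
+    // clients, would be passed to the Raft constructor (hypothetical call):
+    // let raft = Raft::new(rpc_clients, me, persister, apply_command, request_snapshot);
+    apply_command(b"put key value".to_vec());
+    persister.save(&request_snapshot());
+}
+```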
+
+## Ruaft Internals
 
 ### Graceful shutdown
 
-The `kill()` method provides a clean way to gracefully shutdown a Ruaft instance. It notifies all threads and wait for
-all tasks to complete. `kill()` then checks if there are any panics or assertion failures during the execution. It
-panics the main thread if there is any error. If there is no failure, `kill()` is guaranteed to return, assuming there
-is no thread starvation.
+The `kill()` method provides a clean way to gracefully shut down a Ruaft instance. It notifies all threads and waits for
+all tasks to exit. `kill()` then checks if there are any panics or assertion failures during the lifetime of the Raft
+instance. It panics the main thread if there is any error. If there is no failure, `kill()` is guaranteed to return,
+assuming there is no thread starvation.
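
+As a rough model of that behavior (not Ruaft's actual shutdown code), daemons can record panics instead of tearing
+down the process, and `kill()` re-raises them on the calling thread once every task has been joined:

+```rust
+use std::panic;
+use std::sync::{Arc, Mutex};
+use std::thread;
+
+// Toy model only: daemons record failures, `kill()` joins them and re-panics.
+struct Daemons {
+    handles: Vec<thread::JoinHandle<()>>,
+    errors: Arc<Mutex<Vec<String>>>,
+}
+
+impl Daemons {
+    fn kill(self) {
+        // The real implementation would first notify all daemons to stop;
+        // here the single daemon exits on its own.
+        for handle in self.handles {
+            handle.join().unwrap();
+        }
+        let errors = self.errors.lock().unwrap();
+        // Panic on the calling thread if any failure was recorded.
+        assert!(errors.is_empty(), "daemon failures: {:?}", *errors);
+    }
+}
+
+fn main() {
+    let errors = Arc::new(Mutex::new(Vec::new()));
+    let daemon_errors = errors.clone();
+    let handle = thread::spawn(move || {
+        // Wrap the daemon's work so a panic or assertion failure is recorded
+        // rather than silently killing the thread.
+        let result = panic::catch_unwind(|| {
+            // Daemon work goes here; an `assert!` failure would be caught below.
+        });
+        if result.is_err() {
+            daemon_errors.lock().unwrap().push("daemon panicked".to_string());
+        }
+    });
+
+    let daemons = Daemons { handles: vec![handle], errors };
+    daemons.kill(); // Returns normally because nothing failed.
+}
+```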
 
-## Code Quality
+### Code Quality
 
 The implementation is thoroughly tested. I copied (and translated) the test sets from an obvious source. To avoid being
 indexed by a search engine, I will not name the source. The testing framework from the same source is also translated
 from the original Go version. The code can be found at the [`labrpc`](https://github.com/ditsing/labrpc) repo.
 
-## KV Server
+### KV Server
 
 To test the snapshot functionality, I wrote a key-value store that supports `get()`, `put()` and `append()`. The
 complexity of the key-value store is so high that it has its own set of tests. For Ruaft, integration tests in
-`tests/snapshot_tests.rs` are all based on the KV server. The KV server is inspired by the equivalent Go version.
+[`tests/snapshot_tests.rs`][snapshot_tests] are all based on the KV server. The KV server is inspired by the equivalent
+Go version.
 
-## Daemons
+## Threading Model
 
-Ruaft uses both threads and async thread pools. There are 4 'daemon threads':
+Ruaft uses both threads and async thread pools. There are four 'daemon threads' that run latency-sensitive tasks:
 
 1. Election timer: watches the election timer, starts and cancels elections. Correctly implementing a versioned timer is
    one of the most difficult tasks in Ruaft;
 1. Sync log entries: waits for new logs, talks to followers and marks contracts as 'agreed on';
-1. Apply command daemon: sends consensus to the application. Communicates with the rest of `Ruaft` via a `Condvar`;
+1. Apply command daemon: delivers commands that have reached consensus to the application (see the sketch below).
+   Communicates with the rest of Ruaft via a `Condvar`;
 1. Snapshot daemon: requests and processes snapshots from the application. Communicates with the rest of `Ruaft` via
-   A `Condvar` and a thread parker. Unlike other daemons, the snapshot daemon runs even if the current instance is a
-   follower.
+   a `Condvar` and a thread parker. The snapshot daemon runs even if the current instance is a follower.
 
-To avoid blocking, daemon threads never sends RPC directly. RPC-sending is offloaded to a dedicated thread pool that
+To avoid blocking, daemon threads never send RPCs directly. RPC-sending is offloaded to a dedicated thread pool that
 supports `async/.await`. There is a global RPC timeout, so RPCs never block forever. The thread pool also handles
 vote-counting in the elections, given the massive amount of waiting involved.
 
 Last but not least, there is the heartbeat daemon. It is so simple that it is just a list of periodical tasks that run
 forever in the thread pool.
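
+The `Condvar` handshake used by the apply command daemon looks roughly like the sketch below. It is a simplified
+stand-in, not the actual Ruaft code: the real daemon carries far more state than a list of strings.

+```rust
+use std::sync::{Arc, Condvar, Mutex};
+use std::thread;
+
+// Simplified model: the daemon sleeps on a Condvar until new committed
+// commands arrive, then delivers them to the application in order.
+struct ApplyQueue {
+    // (commands waiting to be applied, shutdown flag)
+    inner: Mutex<(Vec<String>, bool)>,
+    new_work: Condvar,
+}
+
+fn main() {
+    let queue = Arc::new(ApplyQueue {
+        inner: Mutex::new((Vec::new(), false)),
+        new_work: Condvar::new(),
+    });
+
+    let daemon_queue = queue.clone();
+    let daemon = thread::spawn(move || loop {
+        let mut guard = daemon_queue.inner.lock().unwrap();
+        while guard.0.is_empty() && !guard.1 {
+            guard = daemon_queue.new_work.wait(guard).unwrap();
+        }
+        let pending: Vec<String> = guard.0.drain(..).collect();
+        let shutdown = guard.1;
+        // Never hold the lock while calling into the application.
+        drop(guard);
+        for command in pending {
+            println!("apply_command({command})"); // Stand-in for the real callback.
+        }
+        if shutdown {
+            break;
+        }
+    });
+
+    // Elsewhere in Raft: a log entry reached consensus, wake the daemon up.
+    queue.inner.lock().unwrap().0.push("put x 1".to_string());
+    queue.new_work.notify_one();
+
+    // Shutdown, roughly what `kill()` does: set the flag, wake the daemon, join.
+    queue.inner.lock().unwrap().1 = true;
+    queue.new_work.notify_one();
+    daemon.join().unwrap();
+}
+```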
 
-## Running
+### Why both?
 
-It is close to impossible to run Ruaft outside of the testing setup under `tests`. One would have to supply an RPC
-environment plus bridges, a `persister`, an `apply_command` callback and a `request_snapshot` callback.
+We use daemon threads to minimize latency. Take the election timer as an example. The election timer is supposed to wake
+up roughly every 150 milliseconds and check if a heartbeat has been received in that time. It only needs a tiny bit of
+CPU, but we cannot afford to delay its execution. The longer it takes for the timer to respond to a lost heartbeat, the
+longer the cluster runs without a leader. Having a standalone thread reserved for the election timer shortens that
+delay.
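
+In miniature, the election timer loop looks something like this sketch (the real timer in Ruaft is versioned and can
+be reset or cancelled, which this toy version skips):

+```rust
+use std::sync::atomic::{AtomicBool, Ordering};
+use std::sync::Arc;
+use std::thread;
+use std::time::Duration;
+
+fn main() {
+    // Set to true whenever a heartbeat (AppendEntries) arrives from the leader.
+    let heartbeat_received = Arc::new(AtomicBool::new(true));
+
+    let flag = heartbeat_received.clone();
+    let timer = thread::spawn(move || {
+        for _ in 0..3 {
+            thread::sleep(Duration::from_millis(150));
+            // If no heartbeat arrived since the last check, a follower would
+            // become a candidate and start an election here.
+            if !flag.swap(false, Ordering::SeqCst) {
+                println!("no heartbeat for 150ms: start an election");
+            }
+        }
+    });
+
+    // Elsewhere: handling an AppendEntries RPC from the leader sets the flag.
+    heartbeat_received.store(true, Ordering::SeqCst);
+    timer.join().unwrap();
+}
+```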
 
-Things would be better after I implement an RPC interface and improve the `persister` trait.
+Daemon threads are also used for isolation. Calling client callbacks (`apply_command` and `request_snapshot`) is
+dangerous because they can block at any time. For that reason we should not run them in the shared thread pool. Instead,
+one thread is created for each of the two callbacks. This way, we give more flexibility to the application
+implementation, allowing one or both callbacks to be blocking.
+
+On the other hand, thread pools are for throughput. Tasks that involve a lot of waiting, e.g. sending packets over the
+network, are run on thread pools. We can send out as many packets as possible while simultaneously waiting for
+responses to arrive. To get even more throughput, tasks sent to the same peer can be grouped together. Optimizations
+like that will be added if proven useful.
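
+The 'thread pool plus global RPC timeout' idea can be sketched as below, using tokio purely for illustration; the
+section above does not say which runtime Ruaft uses, and the RPC body here is a stand-in:

+```rust
+use std::time::Duration;
+use tokio::time::timeout;
+
+// Stand-in for a real RemoteRaft RPC such as append_entries().
+async fn send_rpc(peer: usize) -> Result<String, &'static str> {
+    tokio::time::sleep(Duration::from_millis(10)).await;
+    Ok(format!("reply from peer {peer}"))
+}
+
+#[tokio::main]
+async fn main() {
+    // Fan out to all peers at once; every call is bounded by a global timeout,
+    // so a slow or dead peer never blocks a daemon thread forever.
+    let handles: Vec<_> = (0..3)
+        .map(|peer| tokio::spawn(timeout(Duration::from_millis(100), send_rpc(peer))))
+        .collect();
+    for handle in handles {
+        match handle.await.unwrap() {
+            Ok(Ok(reply)) => println!("{reply}"),
+            Ok(Err(err)) => println!("rpc failed: {err}"),
+            Err(_elapsed) => println!("rpc timed out"),
+        }
+    }
+}
+```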
 
 ## Next steps
 
-- [x] Split into multiple files
+- [x] Split the code into multiple files
 - [x] Add public documentation
 - [x] Add a proper RPC interface to all public methods
 - [x] Allow storing arbitrary information
 - [x] Add more logging
 - [ ] Benchmarks
-- [ ] Support `Prevote`
+- [ ] Support the `Prevote` state
 - [x] Run Ruaft on a Raspberry Pi cluster
+
+[durio]: https://github.com/ditsing/ruaft/tree/master/durio
+[kvraft]: https://github.com/ditsing/ruaft/tree/master/kvraft
+[tarpc]: https://github.com/google/tarpc
+[lib]: https://github.com/ditsing/ruaft/blob/master/src/lib.rs
+[RemoteRaft]: https://github.com/ditsing/ruaft/blob/master/src/remote_raft.rs
+[Persister]: https://github.com/ditsing/ruaft/blob/master/src/persister.rs
+[save_snapshot]: https://github.com/ditsing/ruaft/blob/master/src/snapshot.rs
+[snapshot_tests]: https://github.com/ditsing/ruaft/blob/master/tests/snapshot_tests.rs