
Revamp project readme.

Jing Yang committed 5d63da9319, 4 years ago
1 changed file with 70 additions and 26 deletions

README.md: +70 -26

@@ -11,7 +11,15 @@ committed to the log, it will never be changed or removed. Entries are appended
 guarantees that the log never diverges across replicas. The log as a whole, represents the internal state of the
 application. Replicas of the same application agree on the log entries, and thus agree on the internal state.
 
-## APIs
+## Proof of Concept
+
+[`durio`][durio] is a web-facing distributed key-value store. It is built
+on top of Ruaft as a showcase application. It provides a set of JSON APIs to read and write key-value pairs
+consistently across replicas.
+
+I have a 3-replica durio service running on Raspberry Pis. See the [README][durio] for more details.
+
+## Raft APIs
 
 Each application replica has a Raft instance that runs side by side with it. To add a log entry to the Raft log, the
 application calls `start()` with the data it wants to store (commonly referred to as a "command"). The log entry is not
@@ -23,67 +31,103 @@ log (and other data) to a permanent storage via a `persister`. The log can grow
 Raft periodically asks the application to take a snapshot of its internal state. Log entries contained in the snapshot
 will be discarded.
 
-An application creates a Raft instance with `new()`, with RPC interfaces for communication, a `persister`,
-a `apply_command`
-callback, and a `request_snapshot` callback. More details of those callbacks can be found in the comment of the modules.
+## Building on top of Ruaft
+
+To use Ruaft as a backend, an application needs to provide:
+
+1. A set of RPC clients that allows one Raft instance to send RPCs to other Raft instances. RPC
+   clients should implement the [`RemoteRaft`][RemoteRaft] trait.
+2. A `persister` that writes binary blobs to permanent storage. It must implement the [`Persister`][Persister] trait.
+3. An `apply_command` function that applies commands to the application's internal state. Commands must be applied
   in order, one at a time.
+4. Optionally, a `request_snapshot` function that takes snapshots of the application's internal state. Snapshots should
   be delivered to the [`Raft::save_snapshot()`][save_snapshot] API.
 
-### Raft RPCs
+All the above should be passed to the [Raft constructor][lib]. The constructor returns a Raft instance that is `Send`
and `Sync`, and implements `Clone`. The Raft instance can be used by the application, as well as by the RPC server
described in the next item.
 
-Ruaft has three RPC handlers: `append_entries()`, `request_vote()` and `install_snapshot()`, serving requests from other
-Raft instances of the same application. These RPC handlers should be added to the RPC system that the application uses.
+5. An RPC server that accepts the RPCs defined in [`RemoteRaft`][RemoteRaft] (`append_entries()`, `request_vote()` and
+   `install_snapshot()`). The RPC requests should be proxied to the corresponding API of the Raft instance.
+
+In the sub-crates [durio][durio] and the underlying [kvraft][kvraft], you can find examples of all the elements
mentioned above. [`tarpc`][tarpc] is used to generate the RPC servers and clients. The `persister` does not do anything.
+`apply_command` and `request_snapshot` are provided by [kvraft][kvraft].
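
+As a rough illustration, the pieces an application supplies might look like the sketch below. The trait and callback
+shapes here are assumptions made for this example, not Ruaft's real definitions; see the linked source files for the
+actual API.

+```rust
+use std::sync::{Arc, Mutex};
+
+// Stand-in for the `Persister` trait described above; the real trait lives in
+// src/persister.rs and almost certainly has a different shape.
+trait Persister: Send + Sync {
+    fn save(&self, state: &[u8]);
+}
+
+// A do-nothing persister, similar in spirit to the one used by durio/kvraft.
+struct NoopPersister;
+
+impl Persister for NoopPersister {
+    fn save(&self, _state: &[u8]) {}
+}
+
+fn main() {
+    let persister = Arc::new(NoopPersister);
+
+    // `apply_command` must apply commands to the application's internal state
+    // in order, one at a time. Here the "state" is just a list of commands.
+    let state = Arc::new(Mutex::new(Vec::<Vec<u8>>::new()));
+    let apply_state = state.clone();
+    let apply_command = move |command: Vec<u8>| {
+        apply_state.lock().unwrap().push(command);
+    };
+
+    // Optionally, `request_snapshot` serializes the internal state; in a real
+    // integration the result is handed back through `Raft::save_snapshot()`.
+    let snapshot_state = state.clone();
+    let request_snapshot = move || -> Vec<u8> { snapshot_state.lock().unwrap().concat() };
+
+    // In a real application these, together with a set of `RemoteRaft` RPC
+    // clients, would be passed to the Raft constructor (hypothetical call):
+    // let raft = Raft::new(rpc_clients, me, persister, apply_command, request_snapshot);
+    apply_command(b"put key value".to_vec());
+    persister.save(&request_snapshot());
+}
+```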
+
+## Ruaft Internals
 
 ### Graceful shutdown
 
-The `kill()` method provides a clean way to gracefully shutdown a Ruaft instance. It notifies all threads and wait for
-all tasks to complete. `kill()` then checks if there are any panics or assertion failures during the execution. It
-panics the main thread if there is any error. If there is no failure, `kill()` is guaranteed to return, assuming there
-is no thread starvation.
+The `kill()` method provides a clean way to gracefully shut down a Ruaft instance. It notifies all threads and waits for
+all tasks to exit. `kill()` then checks if there are any panics or assertion failures during the lifetime of the Raft
+instance. It panics the main thread if there is any error. If there is no failure, `kill()` is guaranteed to return,
+assuming there is no thread starvation.
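
+As a rough model of that behavior (not Ruaft's actual shutdown code), daemons can record panics instead of tearing
+down the process, and `kill()` re-raises them on the calling thread once every task has been joined:

+```rust
+use std::panic;
+use std::sync::{Arc, Mutex};
+use std::thread;
+
+// Toy model only: daemons record failures, `kill()` joins them and re-panics.
+struct Daemons {
+    handles: Vec<thread::JoinHandle<()>>,
+    errors: Arc<Mutex<Vec<String>>>,
+}
+
+impl Daemons {
+    fn kill(self) {
+        // The real implementation would first notify all daemons to stop;
+        // here the single daemon exits on its own.
+        for handle in self.handles {
+            handle.join().unwrap();
+        }
+        let errors = self.errors.lock().unwrap();
+        // Panic on the calling thread if any failure was recorded.
+        assert!(errors.is_empty(), "daemon failures: {:?}", *errors);
+    }
+}
+
+fn main() {
+    let errors = Arc::new(Mutex::new(Vec::new()));
+    let daemon_errors = errors.clone();
+    let handle = thread::spawn(move || {
+        // Wrap the daemon's work so a panic or assertion failure is recorded
+        // rather than silently killing the thread.
+        let result = panic::catch_unwind(|| {
+            // Daemon work goes here; an `assert!` failure would be caught below.
+        });
+        if result.is_err() {
+            daemon_errors.lock().unwrap().push("daemon panicked".to_string());
+        }
+    });
+
+    let daemons = Daemons { handles: vec![handle], errors };
+    daemons.kill(); // Returns normally because nothing failed.
+}
+```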
 
-## Code Quality
+### Code Quality
 
 The implementation is thoroughly tested. I copied (and translated) the test sets from an obvious source. To avoid being
 indexed by a search engine, I will not name the source. The testing framework from the same source is also translated
 from the original Go version. The code can be found at the [`labrpc`](https://github.com/ditsing/labrpc) repo.
 
-## KV Server
+### KV Server
 
 To test the snapshot functionality, I wrote a key-value store that supports `get()`, `put()` and `append()`. The
 complexity of the key-value store is so high that it has its own set of tests. For Ruaft, integration tests in
-`tests/snapshot_tests.rs` are all based on the KV server. The KV server is inspired by the equivalent Go version.
+[`tests/snapshot_tests.rs`][snapshot_tests] are all based on the KV server. The KV server is inspired by the equivalent
+Go version.
 
-## Daemons
+## Threading Model
 
-Ruaft uses both threads and async thread pools. There are 4 'daemon threads':
+Ruaft uses both threads and async thread pools. There are four 'daemon threads' that run latency-sensitive tasks:
 
 1. Election timer: watches the election timer, starts and cancels elections. Correctly implementing a versioned timer is
    one of the most difficult tasks in Ruaft;
 1. Sync log entries: waits for new logs, talks to followers and marks contracts as 'agreed on';
-1. Apply command daemon: sends consensus to the application. Communicates with the rest of `Ruaft` via a `Condvar`;
+1. Apply command daemon: delivers commands that have reached consensus to the application (see the sketch below).
+   Communicates with the rest of Ruaft via a `Condvar`;
 1. Snapshot daemon: requests and processes snapshots from the application. Communicates with the rest of `Ruaft` via
-   A `Condvar` and a thread parker. Unlike other daemons, the snapshot daemon runs even if the current instance is a
-   follower.
+   a `Condvar` and a thread parker. The snapshot daemon runs even if the current instance is a follower.
 
-To avoid blocking, daemon threads never sends RPC directly. RPC-sending is offloaded to a dedicated thread pool that
+To avoid blocking, daemon threads never send RPCs directly. RPC-sending is offloaded to a dedicated thread pool that
 supports `async/.await`. There is a global RPC timeout, so RPCs never block forever. The thread pool also handles
 vote-counting in the elections, given the massive amount of waiting involved.
 
 Last but not least, there is the heartbeat daemon. It is so simple that it is just a list of periodical tasks that run
 forever in the thread pool.
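
+The `Condvar` handshake used by the apply command daemon looks roughly like the sketch below. It is a simplified
+stand-in, not the actual Ruaft code: the real daemon carries far more state than a list of strings.

+```rust
+use std::sync::{Arc, Condvar, Mutex};
+use std::thread;
+
+// Simplified model: the daemon sleeps on a Condvar until new committed
+// commands arrive, then delivers them to the application in order.
+struct ApplyQueue {
+    // (commands waiting to be applied, shutdown flag)
+    inner: Mutex<(Vec<String>, bool)>,
+    new_work: Condvar,
+}
+
+fn main() {
+    let queue = Arc::new(ApplyQueue {
+        inner: Mutex::new((Vec::new(), false)),
+        new_work: Condvar::new(),
+    });
+
+    let daemon_queue = queue.clone();
+    let daemon = thread::spawn(move || loop {
+        let mut guard = daemon_queue.inner.lock().unwrap();
+        while guard.0.is_empty() && !guard.1 {
+            guard = daemon_queue.new_work.wait(guard).unwrap();
+        }
+        let pending: Vec<String> = guard.0.drain(..).collect();
+        let shutdown = guard.1;
+        // Never hold the lock while calling into the application.
+        drop(guard);
+        for command in pending {
+            println!("apply_command({command})"); // Stand-in for the real callback.
+        }
+        if shutdown {
+            break;
+        }
+    });
+
+    // Elsewhere in Raft: a log entry reached consensus, wake the daemon up.
+    queue.inner.lock().unwrap().0.push("put x 1".to_string());
+    queue.new_work.notify_one();
+
+    // Shutdown, roughly what `kill()` does: set the flag, wake the daemon, join.
+    queue.inner.lock().unwrap().1 = true;
+    queue.new_work.notify_one();
+    daemon.join().unwrap();
+}
+```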
 
-## Running
+### Why both?
 
-It is close to impossible to run Ruaft outside of the testing setup under `tests`. One would have to supply an RPC
-environment plus bridges, a `persister`, an `apply_command` callback and a `request_snapshot` callback.
+We use daemon threads to minimize latency. Take the election timer as an example. The election timer is supposed to wake
+up roughly every 150 milliseconds and check if a heartbeat has been received in that time. It only needs a tiny bit of
+CPU, but we cannot afford to delay its execution. The longer it takes for the timer to respond to a lost heartbeat, the
+longer the cluster runs without a leader. Having a standalone thread reserved for the election timer shortens that
+delay.
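
+In miniature, the election timer loop looks something like this sketch (the real timer in Ruaft is versioned and can
+be reset or cancelled, which this toy version skips):

+```rust
+use std::sync::atomic::{AtomicBool, Ordering};
+use std::sync::Arc;
+use std::thread;
+use std::time::Duration;
+
+fn main() {
+    // Set to true whenever a heartbeat (AppendEntries) arrives from the leader.
+    let heartbeat_received = Arc::new(AtomicBool::new(true));
+
+    let flag = heartbeat_received.clone();
+    let timer = thread::spawn(move || {
+        for _ in 0..3 {
+            thread::sleep(Duration::from_millis(150));
+            // If no heartbeat arrived since the last check, a follower would
+            // become a candidate and start an election here.
+            if !flag.swap(false, Ordering::SeqCst) {
+                println!("no heartbeat for 150ms: start an election");
+            }
+        }
+    });
+
+    // Elsewhere: handling an AppendEntries RPC from the leader sets the flag.
+    heartbeat_received.store(true, Ordering::SeqCst);
+    timer.join().unwrap();
+}
+```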
 
-Things would be better after I implement an RPC interface and improve the `persister` trait.
+Daemon threads are also used for isolation. Calling client callbacks (`apply_command` and `request_snapshot`) is
+dangerous because they can block at any time. For that reason we should not run them in the shared thread pool. Instead,
+one thread is created for each of the two callbacks. This way, we give more flexibility to the application
+implementation, allowing one or both callbacks to be blocking.
+
+On the other hand, thread pools are for throughput. Tasks that involve a lot of waiting, e.g. sending packets over the
+network, are run on thread pools. We can send out as many packets as possible while simultaneously waiting for
+responses to arrive. To get even more throughput, tasks sent to the same peer can be grouped together. Optimizations
+like that will be added if proven useful.
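
+The 'thread pool plus global RPC timeout' idea can be sketched as below, using tokio purely for illustration; the
+section above does not say which runtime Ruaft uses, and the RPC body here is a stand-in:

+```rust
+use std::time::Duration;
+use tokio::time::timeout;
+
+// Stand-in for a real RemoteRaft RPC such as append_entries().
+async fn send_rpc(peer: usize) -> Result<String, &'static str> {
+    tokio::time::sleep(Duration::from_millis(10)).await;
+    Ok(format!("reply from peer {peer}"))
+}
+
+#[tokio::main]
+async fn main() {
+    // Fan out to all peers at once; every call is bounded by a global timeout,
+    // so a slow or dead peer never blocks a daemon thread forever.
+    let handles: Vec<_> = (0..3)
+        .map(|peer| tokio::spawn(timeout(Duration::from_millis(100), send_rpc(peer))))
+        .collect();
+    for handle in handles {
+        match handle.await.unwrap() {
+            Ok(Ok(reply)) => println!("{reply}"),
+            Ok(Err(err)) => println!("rpc failed: {err}"),
+            Err(_elapsed) => println!("rpc timed out"),
+        }
+    }
+}
+```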
 
 ## Next steps
 
-- [x] Split into multiple files
+- [x] Split the code into multiple files
 - [x] Add public documentation
 - [x] Add a proper RPC interface to all public methods
 - [x] Allow storing arbitrary information
 - [x] Add more logging
 - [ ] Benchmarks
-- [ ] Support `Prevote`
+- [ ] Support the `Prevote` state
 - [x] Run Ruaft on a Raspberry Pi cluster
+
+[durio]: https://github.com/ditsing/ruaft/tree/master/durio
+[kvraft]: https://github.com/ditsing/ruaft/tree/master/kvraft
+[tarpc]: https://github.com/google/tarpc
+[lib]: https://github.com/ditsing/ruaft/blob/master/src/lib.rs
+[RemoteRaft]: https://github.com/ditsing/ruaft/blob/master/src/remote_raft.rs
+[Persister]: https://github.com/ditsing/ruaft/blob/master/src/persister.rs
+[save_snapshot]: https://github.com/ditsing/ruaft/blob/master/src/snapshot.rs
+[snapshot_tests]: https://github.com/ditsing/ruaft/blob/master/tests/snapshot_tests.rs