Jing Yang 4 лет назад
Родитель
Сommit
5d63da9319
1 измененных файлов с 70 добавлено и 26 удалено
  1. 70 26
      README.md

+ 70 - 26
README.md

@@ -11,7 +11,15 @@ committed to the log, it will never be changed or removed. Entries are appended
 guarantees that the log never diverges across replicas. The log as a whole, represents the internal state of the
 guarantees that the log never diverges across replicas. The log as a whole, represents the internal state of the
 application. Replicas of the same application agree on the log entries, and thus agree on the internal state.
 application. Replicas of the same application agree on the log entries, and thus agree on the internal state.
 
 
-## APIs
+## Proof of Concept
+
+[`durio`][durio] is a web-facing distributed key-value store. It is built
+on top of Ruaft as a showcase application. It provides a set of JSON APIs to read and write key-value pairs
+consistently across replicas.
+
+I have a 3-replica durio service running on Raspberry Pis. See the [README][durio] for more details.
+
+## Raft APIs
 
 
 Each application replica has a Raft instance that runs side by side with it. To add a log entry to the Raft log, the
 Each application replica has a Raft instance that runs side by side with it. To add a log entry to the Raft log, the
 application calls `start()` with the data it wants to store (commonly referred to as a "command"). The log entry is not
 application calls `start()` with the data it wants to store (commonly referred to as a "command"). The log entry is not
@@ -23,67 +31,103 @@ log (and other data) to a permanent storage via a `persister`. The log can grow
 Raft periodically asks the application to take a snapshot of its internal state. Log entries contained in the snapshot
 Raft periodically asks the application to take a snapshot of its internal state. Log entries contained in the snapshot
 will be discarded.
 will be discarded.
 
 
-An application creates a Raft instance with `new()`, with RPC interfaces for communication, a `persister`,
-a `apply_command`
-callback, and a `request_snapshot` callback. More details of those callbacks can be found in the comment of the modules.
+## Building on top of Ruaft
+
+To use Ruaft as a backend, an application would need to provide
+
+1. A set of RPC clients that allows one Raft instance to send RPCs to other Raft instances. RPC
+   clients should implement the [`RemoteRaft`][RemoteRaft] trait.
+2. A `persister` that writes binary blobs to permanent storage. It must implement the [`Persister`][Persister] trait.
+3. An `apply_command` function that accepts commands to the application's internal state. The commands must be applied
+   to the application's internal state in order and sequentially.
+4. Optionally, a `request_snapshot` function that takes snapshots on the application's internal state. Snapshots should
+   be delivered to [`Raft::save_snapshot()`][save_snapshot] API.
 
 
-### Raft RPCs
+All the above should be passed to the [Raft constructor][lib]. The constructor returns a Raft instance that is `Send`,
+`Sync` and implements `Clone`. The Raft instance can be used by the application, as well as the RPC server in the next
+item.
 
 
-Ruaft has three RPC handlers: `append_entries()`, `request_vote()` and `install_snapshot()`, serving requests from other
-Raft instances of the same application. These RPC handlers should be added to the RPC system that the application uses.
+5. An RPC server that accepts the RPCs defined in [`RemoteRaft`][RemoteRaft] (`append_entries()`, `request_vote()` and
+   `install_snapshot()`). The RPC requests should be proxied to the corresponding API of the Raft instance.
+
+In sub-crates [durio][durio] and the underlying [kvraft][kvraft], you can find examples of all the elements mentioned
+above. [`tarpc`][tarpc] is used to generate the RPC servers and clients. The `persister` does not do anything.
+`apply_command` and `request_snapshot` is provided by [kvraft][kvraft].
+
+## Ruaft Internals
 
 
 ### Graceful shutdown
 ### Graceful shutdown
 
 
-The `kill()` method provides a clean way to gracefully shutdown a Ruaft instance. It notifies all threads and wait for
-all tasks to complete. `kill()` then checks if there are any panics or assertion failures during the execution. It
-panics the main thread if there is any error. If there is no failure, `kill()` is guaranteed to return, assuming there
-is no thread starvation.
+The `kill()` method provides a clean way to gracefully shut down a Ruaft instance. It notifies all threads and wait for
+all tasks to exit. `kill()` then checks if there are any panics or assertion failures during the lifetime of the Raft
+instance. It panics the main thread if there is any error. If there is no failure, `kill()` is guaranteed to return,
+assuming there is no thread starvation.
 
 
-## Code Quality
+### Code Quality
 
 
 The implementation is thoroughly tested. I copied (and translated) the test sets from an obvious source. To avoid being
 The implementation is thoroughly tested. I copied (and translated) the test sets from an obvious source. To avoid being
 indexed by a search engine, I will not name the source. The testing framework from the same source is also translated
 indexed by a search engine, I will not name the source. The testing framework from the same source is also translated
 from the original Go version. The code can be found at the [`labrpc`](https://github.com/ditsing/labrpc) repo.
 from the original Go version. The code can be found at the [`labrpc`](https://github.com/ditsing/labrpc) repo.
 
 
-## KV Server
+### KV Server
 
 
 To test the snapshot functionality, I wrote a key-value store that supports `get()`, `put()` and `append()`. The
 To test the snapshot functionality, I wrote a key-value store that supports `get()`, `put()` and `append()`. The
 complexity of the key-value store is so high that it has its own set of tests. For Ruaft, integration tests in
 complexity of the key-value store is so high that it has its own set of tests. For Ruaft, integration tests in
-`tests/snapshot_tests.rs` are all based on the KV server. The KV server is inspired by the equivalent Go version.
+[`tests/snapshot_tests.rs`][snapshot_tests] are all based on the KV server. The KV server is inspired by the equivalent
+Go version.
 
 
-## Daemons
+## Threading Model
 
 
-Ruaft uses both threads and async thread pools. There are 4 'daemon threads':
+Ruaft uses both threads and async thread pools. There are four 'daemon threads' that runs latency-sensitive tasks:
 
 
 1. Election timer: watches the election timer, starts and cancels elections. Correctly implementing a versioned timer is
 1. Election timer: watches the election timer, starts and cancels elections. Correctly implementing a versioned timer is
    one of the most difficult tasks in Ruaft;
    one of the most difficult tasks in Ruaft;
 1. Sync log entries: waits for new logs, talks to followers and marks contracts as 'agreed on';
 1. Sync log entries: waits for new logs, talks to followers and marks contracts as 'agreed on';
-1. Apply command daemon: sends consensus to the application. Communicates with the rest of `Ruaft` via a `Condvar`;
+1. Apply command daemon: sends consensus to the application. Communicates with the rest of Ruaft via a `Condvar`;
 1. Snapshot daemon: requests and processes snapshots from the application. Communicates with the rest of `Ruaft` via
 1. Snapshot daemon: requests and processes snapshots from the application. Communicates with the rest of `Ruaft` via
-   A `Condvar` and a thread parker. Unlike other daemons, the snapshot daemon runs even if the current instance is a
-   follower.
+   A `Condvar` and a thread parker. The snapshot daemon runs even if the current instance is a follower.
 
 
-To avoid blocking, daemon threads never sends RPC directly. RPC-sending is offloaded to a dedicated thread pool that
+To avoid blocking, daemon threads never send RPCs directly. RPC-sending is offloaded to a dedicated thread pool that
 supports `async/.await`. There is a global RPC timeout, so RPCs never block forever. The thread pool also handles
 supports `async/.await`. There is a global RPC timeout, so RPCs never block forever. The thread pool also handles
 vote-counting in the elections, given the massive amount of waiting involved.
 vote-counting in the elections, given the massive amount of waiting involved.
 
 
 Last but not least, there is the heartbeat daemon. It is so simple that it is just a list of periodical tasks that run
 Last but not least, there is the heartbeat daemon. It is so simple that it is just a list of periodical tasks that run
 forever in the thread pool.
 forever in the thread pool.
 
 
-## Running
+### Why both?
 
 
-It is close to impossible to run Ruaft outside of the testing setup under `tests`. One would have to supply an RPC
-environment plus bridges, a `persister`, an `apply_command` callback and a `request_snapshot` callback.
+We use daemon threads to minimise latency. Take the election timer as an example. The election timer is supposed to wake
+up roughly every 150 milliseconds and check if a heartbeat has been received in that time. It only needs a tiny bit of
+CPU, but we could not afford to delay its execution. The longer it takes for the timer to respond to a lost heartbeat,
+the longer the pack runs without a leader. Having a standalone thread reserved for the election timer reduces the
+delay to react.
 
 
-Things would be better after I implement an RPC interface and improve the `persister` trait.
+Daemon threads are also used for isolation. Calling client callbacks (`apply_command` and `request_snapshot`) are
+dangerous because they can block at any time. For that reason we should not run them in the shared thread pool. Instead,
+one thread is created for each of the two callbacks. This way, we give more flexibility to the application
+implementation, allowing one or both callbacks to be blocking.
+
+On the other hand, thread pools are for throughput. Tasks that involve a lot of waiting, e.g. sending packets over the
+network, are run on thread pools. We could send as many packets out as possible, while simultaneously waiting for the
+responses to arrive. To get even more throughput, tasks sent to the same peer can be grouped together. Optimizations
+like that will be added if proven useful.
 
 
 ## Next steps
 ## Next steps
 
 
-- [x] Split into multiple files
+- [x] Split the code into multiple files
 - [x] Add public documentation
 - [x] Add public documentation
 - [x] Add a proper RPC interface to all public methods
 - [x] Add a proper RPC interface to all public methods
 - [x] Allow storing arbitrary information
 - [x] Allow storing arbitrary information
 - [x] Add more logging
 - [x] Add more logging
 - [ ] Benchmarks
 - [ ] Benchmarks
-- [ ] Support `Prevote`
+- [ ] Support the `Prevote` state
 - [x] Run Ruaft on a Raspberry Pi cluster
 - [x] Run Ruaft on a Raspberry Pi cluster
+
+[durio]: https://github.com/ditsing/ruaft/tree/master/durio
+[kvraft]: https://github.com/ditsing/ruaft/tree/master/kvraft
+[tarpc]: https://github.com/google/tarpc
+[lib]: https://github.com/ditsing/ruaft/blob/master/src/lib.rs
+[RemoteRaft]: https://github.com/ditsing/ruaft/blob/master/src/remote_raft.rs
+[Persister]: https://github.com/ditsing/ruaft/blob/master/src/persister.rs
+[save_snapshot]: https://github.com/ditsing/ruaft/blob/master/src/snapshot.rs
+[snapshot_tests]: https://github.com/ditsing/ruaft/blob/master/tests/snapshot_tests.rs