Feature: permit follower log to revert to earlier state with `--features loosen-follower-log-revert`

Add a new feature flag `loosen-follower-log-revert` to permit a follower's log to roll back to an earlier state without causing the leader to panic.

Although log state reversion is typically seen as a bug, enabling it can be useful for testing or in some special scenarios. For instance, in a cluster with an even number of nodes, erasing one node's data and then rebooting it (its log reverts to empty) will not result in data loss.

- Related issue: #898
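The leader-side change itself is not visible on this page, so the following is only a minimal sketch of the behavior the commit message describes, not the actual openraft implementation. `Progress`, `LogReverted`, `update_matching`, and the `allow_revert` parameter are hypothetical names; `allow_revert` stands in for the `loosen-follower-log-revert` feature flag, and the sketch returns an error where the real leader is said to panic.

```rust
/// Hypothetical, simplified replication progress a leader keeps per follower.
#[derive(Debug, PartialEq)]
struct Progress {
    /// Highest log index known to be replicated on the follower; `None` means empty.
    matching: Option<u64>,
}

/// Hypothetical error returned when a follower reports an older log state.
#[derive(Debug, PartialEq)]
struct LogReverted {
    from: Option<u64>,
    to: Option<u64>,
}

impl Progress {
    /// Record the follower's reported matching index.
    ///
    /// `allow_revert` stands in for the feature flag (an assumption of this
    /// sketch). A backward move is rejected unless it is set.
    fn update_matching(&mut self, reported: Option<u64>, allow_revert: bool) -> Result<(), LogReverted> {
        // `Option<u64>` orders `None` below every `Some(_)`, so an erased
        // follower (`None`) compares as "behind" any non-empty state.
        if reported < self.matching && !allow_revert {
            return Err(LogReverted { from: self.matching, to: reported });
        }
        self.matching = reported;
        Ok(())
    }
}

fn main() {
    // Strict mode: a follower whose log went from index 10 back to empty is rejected.
    let mut strict = Progress { matching: Some(10) };
    println!("{:?}", strict.update_matching(None, false));

    // Loosened mode: the leader accepts the reverted state and replicates from scratch.
    let mut loose = Progress { matching: Some(10) };
    println!("{:?} -> {:?}", loose.update_matching(None, true), loose);
}
```

In a real crate the boolean would more likely be derived at compile time from the Cargo feature than passed as an argument; it is a parameter here only so both modes can be exercised in one program.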
1 parent 9815283, commit 6d42c6e
Showing 9 changed files with 163 additions and 7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,52 @@ | ||
# FAQ | ||
|
||
- Q: 🤔 Why is log id `(term, node_id, log_index)`, while standard Raft uses just | ||
`(term, log_index)`? | ||
- **🤔 Why is log id a tuple of `(term, node_id, log_index)`, while standard Raft uses just | ||
`(term, log_index)`**? | ||
|
||
A: The log id `(term, node_id, log_index)` is used to minimize the chance of election conflicts. | ||
💡 The log id `(term, node_id, log_index)` is used to minimize the chance of election conflicts. | ||
This way, more than one leader can be elected in the same term, and the last one is the valid one. | ||
See: [`leader-id`](`crate::docs::data::leader_id`) for details. | ||
<br/><br/> | ||
|
||
|
||
- **🤔 How to remove node-2 safely from a cluster `{1, 2, 3}`**? | ||
|
||
💡 Call `Raft::change_membership(btreeset!{1, 3})` to exclude node-2 from | ||
the cluster. Then wipe out node-2's data. | ||
**NEVER** modify/erase the data of any node that is still in a raft cluster, unless you know what you are doing. | ||
<br/><br/> | ||
|
||
|
||
- **🤔 Can I wipe out the data of **one** node and wait for the leader to replicate all data to it again**? | ||
|
||
💡 Avoid doing this. Doing so will panic the leader. But it is permitted | ||
if the [`loosen-follower-log-revert`] feature flag is enabled. | ||
|
||
In a raft cluster, although logs are replicated to multiple nodes, | ||
wiping out a node and restarting it can still cause data loss. | ||
Assume the leader is `N1` and the followers are `N2, N3, N4, N5`: | ||
- A log (`a`) that is replicated by `N1` to `N2, N3` is considered committed. | ||
- At this point, if `N3` is replaced with an empty node and the leader `N1` crashes at the same time, `N5` may be elected as the new leader with votes granted by `N3, N4`; | ||
- The new leader `N5` will then not have log `a`. | ||
|
||
```text | ||
Ni: Node i | ||
Lj: Leader at term j | ||
Fj: Follower at term j | ||
N1 | L1 a crashed | ||
N2 | F1 a | ||
N3 | F1 a erased F2 | ||
N4 | F2 | ||
N5 | elect L2 | ||
----------------------------+---------------> time | ||
Data loss: N5 does not have log `a` | ||
``` | ||
But in a cluster with an even number of nodes, erasing **exactly one** node won't cause data loss. | ||
Thus, in a special scenario like this, or for testing purposes, you can use | ||
`--features loosen-follower-log-revert` to permit erasing a node. | ||
<br/><br/> | ||
[`loosen-follower-log-revert`]: `crate::docs::feature_flags#loosen_follower_log_revert` |
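The odd-versus-even argument in the FAQ entry above can be checked with a little quorum arithmetic. The following is a minimal sketch, not code from this commit: `quorum` and `erase_one_can_lose_data` are hypothetical helper names, and the model assumes the worst case where a committed log sits on exactly a majority of nodes and one of those holders is erased.

```rust
/// Majority quorum size for a cluster of `n` voters.
fn quorum(n: usize) -> usize {
    n / 2 + 1
}

/// Returns true if erasing exactly one node can let a leader be elected
/// without a committed log (hypothetical helper, worst-case model).
fn erase_one_can_lose_data(n: usize) -> bool {
    let q = quorum(n);
    // Worst case: the log is on exactly `q` nodes and one of them is erased.
    let holders_after_erase = q - 1;
    let without_log = n - holders_after_erase;
    // A candidate lacking the log can only collect votes from nodes that also
    // lack it, so it wins only if those nodes form a quorum by themselves.
    without_log >= q
}

fn main() {
    for n in 3..=6 {
        println!(
            "n={} quorum={} erase_one_can_lose_data={}",
            n,
            quorum(n),
            erase_one_can_lose_data(n)
        );
    }
}
```

For the five-node example in the FAQ, the quorum is 3, two holders of `a` remain after `N3` is erased, and the three nodes without `a` (`N3, N4, N5`) are enough to elect `N5`. With four or six nodes, the nodes lacking the log fall one short of a quorum.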
tests/tests/replication/t60_feature_loosen_follower_log_revert.rs (63 changes: 63 additions & 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
use std::sync::Arc; | ||
use std::time::Duration; | ||
|
||
use anyhow::Result; | ||
use maplit::btreeset; | ||
use openraft::Config; | ||
use openraft_memstore::MemStore; | ||
|
||
use crate::fixtures::init_default_ut_tracing; | ||
use crate::fixtures::RaftRouter; | ||
|
||
/// With "--features loosen-follower-log-revert", the leader allows follower to revert its log to an | ||
/// earlier state. | ||
#[async_entry::test(worker_threads = 4, init = "init_default_ut_tracing()", tracing_span = "debug")] | ||
async fn feature_loosen_follower_log_revert() -> Result<()> { | ||
let config = Arc::new( | ||
Config { | ||
enable_tick: false, | ||
enable_heartbeat: false, | ||
..Default::default() | ||
} | ||
.validate()?, | ||
); | ||
|
||
let mut router = RaftRouter::new(config.clone()); | ||
|
||
tracing::info!("--- initializing cluster"); | ||
let mut log_index = router.new_cluster(btreeset! {0,1,2}, btreeset! {3}).await?; | ||
|
||
tracing::info!(log_index, "--- write 10 logs"); | ||
{ | ||
log_index += router.client_request_many(0, "0", 10).await?; | ||
for i in [0, 1, 2, 3] { | ||
router.wait(&i, timeout()).log(Some(log_index), format!("{} writes", 10)).await?; | ||
} | ||
} | ||
|
||
tracing::info!(log_index, "--- erase node 3 and restart"); | ||
{ | ||
let (_raft, ls, sm) = router.remove_node(3).unwrap(); | ||
{ | ||
let mut sto = ls.storage_mut().await; | ||
*sto = Arc::new(MemStore::new()); | ||
} | ||
router.new_raft_node_with_sto(3, ls, sm).await; | ||
router.add_learner(0, 3).await?; | ||
log_index += 1; // add learner | ||
} | ||
|
||
tracing::info!(log_index, "--- write another 10 logs, leader should not panic"); | ||
{ | ||
log_index += router.client_request_many(0, "0", 10).await?; | ||
for i in [0, 1, 2, 3] { | ||
router.wait(&i, timeout()).log(Some(log_index), format!("{} writes", 10)).await?; | ||
} | ||
} | ||
|
||
Ok(()) | ||
} | ||
|
||
fn timeout() -> Option<Duration> { | ||
Some(Duration::from_millis(1_000)) | ||
} |