Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix get_dataset_properties to avoid propagating inherited UUIDs #7232

Merged
merged 2 commits into from
Dec 11, 2024

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Dec 11, 2024

Fixes #7231

Rather than relying on zfs list, uses zfs get and parses the source field to decide how properties should be used.

This method is used when reporting inventory.

@smklein smklein requested a review from jgallagher December 11, 2024 01:28
}

#[test]
fn parse_dataset_uuid_ignored_if_inherited() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test explicitly covers the case we are observing from inventory right now.

Because this UUID is inherited, it should be ignored.

Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Linux test failure looks like a flake, but I think the helios one is caused by changes in this PR (somehow? I don't understand how though!):

6753	2024-12-11T02:43:27.720Z	--- STDERR:              sled-storage manager::tests::nested_dataset ---
6754	2024-12-11T02:43:27.720Z	log file: /var/tmp/omicron_tmp/sled_storage-824eb76ba927d558-nested_dataset.27636.0.log
6755	2024-12-11T02:43:27.720Z	note: configured to log to "/var/tmp/omicron_tmp/sled_storage-824eb76ba927d558-nested_dataset.27636.0.log"
6756	2024-12-11T02:43:27.720Z	thread 'manager::tests::nested_dataset' panicked at sled-storage/src/manager.rs:2021:68:
6757	2024-12-11T02:43:27.720Z	called `Result::unwrap()` on an `Err` value: Other(Failed to parse UUID
6758	2024-12-11T02:43:27.721Z	
6759	2024-12-11T02:43:27.721Z	Caused by:
6760	2024-12-11T02:43:27.721Z	    0: error parsing UUID (dataset)
6761	2024-12-11T02:43:27.721Z	    1: invalid group count: expected 5, found 2

};
let mut datasets: BTreeMap<&str, BTreeMap<&str, _>> = BTreeMap::new();
for name_prop_val_source in name_prop_val_source_list {
let mut iter = name_prop_val_source.split_whitespace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this particular bit didn't change in this PR, but splitting on whitespace seems a little fishy (although having to parse CLI output is probably going to be a little fishy no matter what). I was going to ask if columns could themselves have whitespace, but it looks like that's already true: the source isn't just "inherited"; it's really "inherited from ...", and we just happen to be able to treat it as inherited because we're ignoring any extra data after the first space in the source column.

The man page says -H emits tab-delimited columns. Could we use that to make this more robust? Something like:

  • Split on \t specifically instead of any whitespace
  • Confirm there are no unexpected columns (return an error if iter.next() returns another string, I think?) after confirming the columns we do expect
  • Fix up the places where we're relying on whitespace parsing to shorten the value (source would now start with inherited from; I don't see any other places but might be missing something not in the diff?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems totally reasonable to me. I also felt a little gross about this (tried to cover both cases in tests) but splitting on tab explicitly seems better.

impl FromStr for DatasetProperties {
type Err = anyhow::Error;
let props = datasets.entry(name).or_default();
props.insert(prop, (val, source));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the dedup test: should this be paranoid and confirm that if there's already an entry for prop, the value and source match exactly, failing otherwise? I assume that should never happen.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soooo I'm actually not 100% sure that this is impossible. I don't know if these values are atomic -- if someone queries for a dataset multiple times, and the value of e.g. "bytes used by dataset" changes while we're gathering the data, I think it's possible that the value changes?

The code as-is grabs the latest value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm fair point. 👍

Comment on lines 279 to 291
let id = match props.get("oxide:uuid") {
Some((prop, source)) => {
if *source == "inherited" {
None
} else {
Some(
prop.parse::<DatasetUuid>()
.context("Failed to parse UUID")?,
)
}
}
None => None,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly shorter / more consistent with the other properties below, but maybe slightly too clever (abusing the fact that Option is an iterator so we can call .filter()), so feel free to take it or leave it:

let id = props
    .get("oxide:uuid")
    .filter(|(_, source)| !source.starts_with("inherited"))
    .map(|(prop, _)| prop.parse::<DatasetUuid>().context("Failed to parse UUID"))
    .transpose()?;

Could use the same style for the other properties that have "default" values that should be ignored below, too. I wonder if it'd be worth it to wrap props in some type that has a get_unless("property", |source| ...) helper method for these properties?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, happy to update this to use filter, here and below. I don't like mixing map and match either, but finding the right magical methods to pass around errors with map is sometimes a struggle

Comment on lines 417 to 419
"-d",
"1",
"-rHpo",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  • -d 1 gives the properties for the dataset in question and any of its children. Is that intentional, or should we drop both that and -r?
  • If it is intentional, can we add a comment explaining why and drop the -r? The man page says -d N makes it recursive up to the given depth.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I think the -r is not necessary. I'll add docs to the function itself.

@smklein
Copy link
Collaborator Author

smklein commented Dec 11, 2024

The Linux test failure looks like a flake, but I think the helios one is caused by changes in this PR (somehow? I don't understand how though!):

6753	2024-12-11T02:43:27.720Z	--- STDERR:              sled-storage manager::tests::nested_dataset ---
6754	2024-12-11T02:43:27.720Z	log file: /var/tmp/omicron_tmp/sled_storage-824eb76ba927d558-nested_dataset.27636.0.log
6755	2024-12-11T02:43:27.720Z	note: configured to log to "/var/tmp/omicron_tmp/sled_storage-824eb76ba927d558-nested_dataset.27636.0.log"
6756	2024-12-11T02:43:27.720Z	thread 'manager::tests::nested_dataset' panicked at sled-storage/src/manager.rs:2021:68:
6757	2024-12-11T02:43:27.720Z	called `Result::unwrap()` on an `Err` value: Other(Failed to parse UUID
6758	2024-12-11T02:43:27.721Z	
6759	2024-12-11T02:43:27.721Z	Caused by:
6760	2024-12-11T02:43:27.721Z	    0: error parsing UUID (dataset)
6761	2024-12-11T02:43:27.721Z	    1: invalid group count: expected 5, found 2

Yup, this is a real failure - I need to parse the value "-" as "None" -- added a test for this, and it's passing now.

@smklein smklein enabled auto-merge (squash) December 11, 2024 20:10
@smklein smklein merged commit dcc0df3 into main Dec 11, 2024
16 checks passed
@smklein smklein deleted the fix-uuid-parsing branch December 11, 2024 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inventory displaying incorrect value for transient zone filesystem datasets
2 participants