Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use data container wrapper for mutable metadata #559

Merged
merged 5 commits into from
Oct 11, 2024

Conversation

nathanielc
Copy link
Collaborator

@nathanielc nathanielc commented Oct 8, 2024

Existing MID events have a header on data payloads. This data is mutable metadata. We needed a mechanism to represent this kind of data within the conclusion events and doc states. We added a simple wrapper type with metadata and data fields.

Fixes AES-368

@nathanielc nathanielc requested a review from a team as a code owner October 8, 2024 20:28
@nathanielc nathanielc requested review from stephhuynh18, samika98 and stbrody and removed request for a team October 8, 2024 20:28
@nathanielc nathanielc temporarily deployed to github-tests-2024 October 8, 2024 20:47 — with GitHub Actions Inactive
@nathanielc nathanielc temporarily deployed to github-tests-2024 October 9, 2024 16:10 — with GitHub Actions Inactive
@nathanielc nathanielc temporarily deployed to github-tests-2024 October 9, 2024 17:20 — with GitHub Actions Inactive
Copy link

linear bot commented Oct 9, 2024

Copy link
Collaborator

@stbrody stbrody left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main question is around how to handle data events that don't explicitly change should_index.

Error::new_app(anyhow::anyhow!(
"Malformed event found in the database: {}",
e
))
})?;

// Small wrapper container around the data field to hold other mutable metadata for the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we're defining this type inside the body of the (already long) transform_raw_events_to_conclusion_events function? Can we pull it out so the function is easier to read?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only used within this context so I placed it here inside the function. I can look into how to shorten this function to make it easier to read.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just make it file-local instead of function-local?

event/src/unvalidated/payload/data.rs Outdated Show resolved Hide resolved
event-svc/src/event/service.rs Outdated Show resolved Hide resolved
olap/src/aggregator/ceramic_patch.rs Show resolved Hide resolved
@nathanielc
Copy link
Collaborator Author

@stbrody @samika98 Good call on making sure we are precise on the behavior when the metadata key is missing in the data header. The PR has been updated to transitively carry forward the metadata map and only if its present change update it. The aggregation reproduces the known state of the metadata for each new event so that clients do not have to do that work, which is aligned with the purpose of the aggregator.

Existing MID events have a header on data payloads. This data is mutable
metadata. We needed a mechanism to represent this kind of data within
the conclusion events and doc states. We added a simple wrapper type
with metadata and data fields.
Instead of expecting each data event header to repeat the metadata we
carry forward the metadata from the previous state and only if its
present do we apply a change to the metadata. Tests have been updated to
reflect this new behavior.
@nathanielc nathanielc temporarily deployed to github-tests-2024 October 11, 2024 19:57 — with GitHub Actions Inactive
Copy link
Collaborator

@stbrody stbrody left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Error::new_app(anyhow::anyhow!(
"Malformed event found in the database: {}",
e
))
})?;

// Small wrapper container around the data field to hold other mutable metadata for the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just make it file-local instead of function-local?


#[derive(Debug)]
struct CeramicPatchEvaluator;

impl CeramicPatchEvaluator {
fn apply_patch(patch: &str, previous_state: &str) -> Result<String> {
let patch: Vec<json_patch::PatchOperation> = serde_json::from_str(patch)
let patch: MIDDataContainer<Vec<PatchOperation>> = serde_json::from_str(patch)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this going to work given that in the events themselves the field is called header but in the state it's called metadata?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit surprising to me to see that we're using the same type (MIDDataContainer) for both events and states. If it works that's fine, just want to make sure we don't have any issues from overloading the type like that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry about the confusion. The field is always called metadata in the context that this struct is used.

The translation from raw event to conclusion event renames the field from header to metadata. (This was intentional as most fields inside the header end up as dimensions and only shouldIndex ends up in the metadata).

I see the confusion that a conclusion event data field has the same structure as an aggregated state data. I'll consider using two different (yet similar types) to make it clear.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided the structures are the same so a good use for generics, with some type aliases to make it clear the context in how they should be used.

struct MIDDataContainer<D> {
    metadata: BTreeMap<String, serde_json::Value>,
    data: D,
}

type MIDDataContainerPatch = MIDDataContainer<Vec<PatchOperation>>;
type MIDDataContainerState = MIDDataContainer<serde_json::Value>;

@nathanielc nathanielc enabled auto-merge October 11, 2024 21:06
@nathanielc nathanielc temporarily deployed to github-tests-2024 October 11, 2024 21:28 — with GitHub Actions Inactive
@nathanielc nathanielc added this pull request to the merge queue Oct 11, 2024
Merged via the queue into main with commit aad49de Oct 11, 2024
5 checks passed
@nathanielc nathanielc deleted the feat/should-index branch October 11, 2024 22:08
@smrz2001 smrz2001 mentioned this pull request Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants