Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat[Storagetool]: Add searching by record sequence numbers and offsets #508

Merged
merged 27 commits into from
Dec 20, 2024

Conversation

alexander-e1off
Copy link
Collaborator

@alexander-e1off alexander-e1off commented Nov 11, 2024

Enhance storage tool to search by composite sequence numbers (primaryLeaseId, sequenceNumber) and offsets:

  • Search up to a given sequence number or offset;
  • Search starting from a given sequence number or offset;
  • Search specific sequence number(s) or offsets;

Below are examples how to use new feature:

Filter messages within composite sequence numbers (primaryLeaseId, sequenceNumber) range:

bmqstoragetool --journal-file=<path> --seqnum-lt=<leaseId-sequenceNumber>
bmqstoragetool --journal-file=<path> --seqnum-gt=<leaseId-sequenceNumber>
bmqstoragetool --journal-file=<path> --seqnum-lt=<leaseId1-sequenceNumber1> --seqnum-gt=<leaseId2-sequenceNumber2>

Filter messages within record offsets range:

bmqstoragetool --journal-file=<path> --offset-lt=<offset>
bmqstoragetool --journal-file=<path> --offset-gt=<offset>
bmqstoragetool --journal-file=<path> --offset-lt=<offset1> --offset-gt=<offset2>

Filter messages with corresponding composite sequence numbers (defined in form primaryLeaseId-sequenceNumber):

bmqstoragetool --journal-file=<path> --seqnum=<leaseId-sequenceNumber_1> --seqnum=<leaseId-sequenceNumber_N>

Filter messages with corresponding record offsets:

bmqstoragetool --journal-file=<path> --offset=<offset_1> --offset=<offset_N>

@alexander-e1off alexander-e1off changed the title [Storagetool] Add searching by record sequence numbers and offsets Feat [Storagetool]: add searching by record sequence numbers and offsets Nov 11, 2024
@alexander-e1off alexander-e1off changed the title Feat [Storagetool]: add searching by record sequence numbers and offsets Feat[Storagetool]: Add searching by record sequence numbers and offsets Nov 12, 2024
Comment on lines 42 to 70
template <typename T>
T getValue(const mqbs::JournalFileIterator* jit,
const Parameters::SearchValueType valueType);

template <>
bsls::Types::Uint64 getValue(const mqbs::JournalFileIterator* jit,
const Parameters::SearchValueType valueType)
{
// PRECONDITIONS
BSLS_ASSERT(jit);
BSLS_ASSERT(valueType == Parameters::e_TIMESTAMP ||
valueType == Parameters::e_OFFSET);

return (valueType == Parameters::e_TIMESTAMP)
? jit->recordHeader().timestamp()
: jit->recordOffset();
}

template <>
CompositeSequenceNumber getValue(const mqbs::JournalFileIterator* jit,
const Parameters::SearchValueType valueType)
{
// PRECONDITIONS
BSLS_ASSERT(jit);
BSLS_ASSERT(valueType == Parameters::e_SEQUENCE_NUM);

return CompositeSequenceNumber(jit->recordHeader().primaryLeaseId(),
jit->recordHeader().sequenceNumber());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd better make a comparator function and pass it to moveToLowerBound. It can significantly simplify the logic:

bool comparatorFunction(const mqbs::JournalFileIterator&  jit,
                                         const Parameters& parameters)
{
  if (valueType == Parameters::e_TIMESTAMP) {
    return jit.recordHeader().timestamp() > parameters.d_timestampGt;
  } else if () { ... } esle { ... }
}

This comment was marked as resolved.

Comment on lines 80 to 83
template <typename T>
int moveToLowerBound(mqbs::JournalFileIterator* jit,
const Parameters::SearchValueType valueType,
const T& valueGt)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int moveToLowerBound(mqbs::JournalFileIterator*                             jit,
                     bsl::function<bool(const mqbs::JournalFileIterator&)>& comparator))

This comment was marked as resolved.

Comment on lines 186 to 196
if (d_parameters->d_valueGt > 0) {
rc = moveToLowerBound<bsls::Types::Uint64>(
iter,
d_parameters->d_valueType,
d_parameters->d_valueGt);
}
else {
rc = moveToLowerBound<CompositeSequenceNumber>(
iter,
d_parameters->d_valueType,
d_parameters->d_seqNumGt);

This comment was marked as resolved.

Comment on lines 124 to 131
bsls::Types::Uint64 d_valueGt;
// Filter messages greater than value
bsls::Types::Uint64 d_valueLt;
// Filter messages less than value
CompositeSequenceNumber d_seqNumGt;
// Filter messages greater than sequence number
CompositeSequenceNumber d_seqNumLt;
// Filter messages less than sequence number
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks confusing: There are 3 pairs of parameters, we store two of them in this struct and use only one pair. So let's keep all the three. We may need them in the future if our customers ask us to enable combined filtering. Moreover it will be really easy to implement.

This comment was marked as resolved.


inline bool CompositeSequenceNumber::isUnset() const
{
return d_isUnset;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are expressions like if (!isUnset()) {...} everywhere. Double negation looks overcomplicated to me. And furthermore nobody really "unsets" this object)). IsSet or IsInitialized would be much more readable.

This comment was marked as resolved.

@alexander-e1off alexander-e1off marked this pull request as ready for review November 19, 2024 10:19
@alexander-e1off alexander-e1off requested a review from a team as a code owner November 19, 2024 10:19
@alexander-e1off alexander-e1off force-pushed the storagetool-add-seqnum branch 2 times, most recently from 7790015 to eae1cb5 Compare November 26, 2024 08:08
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Signed-off-by: Aleksandr Ivanov <[email protected]>
Copy link
Collaborator

@pniedzielski pniedzielski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small changes, mostly typos. LGTM otherwise

Comment on lines 27 to 28
[--seqnum-gt <composit sequence number greater than>]
[--seqnum-lt <composit sequence number less than>]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[--seqnum-gt <composit sequence number greater than>]
[--seqnum-lt <composit sequence number less than>]
[--seqnum-gt <composite sequence number greater than>]
[--seqnum-lt <composite sequence number less than>]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 63 to 66
--seqnum-gt <composit sequence number greater than>
lower composit sequence number bound, defined in form <leaseId-sequenceNumber>, e.g. 123-456
--seqnum-lt <composit sequence number less than>
higher composit sequence number bound, defined in form <leaseId-sequenceNumber>, e.g. 123-456
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--seqnum-gt <composit sequence number greater than>
lower composit sequence number bound, defined in form <leaseId-sequenceNumber>, e.g. 123-456
--seqnum-lt <composit sequence number less than>
higher composit sequence number bound, defined in form <leaseId-sequenceNumber>, e.g. 123-456
--seqnum-gt <composite sequence number greater than>
lower composite sequence number bound, defined in form <leaseId-sequenceNumber>, e.g. 123-456
--seqnum-lt <composite sequence number less than>
higher composite sequence number bound, defined in form <leaseId-sequenceNumber>, e.g. 123-456

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

const m_bmqstoragetool::CompositeSequenceNumber& rhs)
{
// PRECONDITIONS
BSLS_ASSERT(lhs.isSet() && rhs.isSet());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If CompositeSequenceNumber is value-semantic as documented, we probably want CompositeSequenceNumber() == CompositeSequenceNumber() to be true (for putting them in unordered containers, so that a = b implies a == b, etc). Similarly for operator< below. I suggest the ordering

CompositeSequenceNumber() < CompositeSequenceNumber(0, 0) < CompositeSequenceNumber(0, 1) < ... < CompositeSequenceNumber(1, 0) < CompositeSequenceNumber(1, 1) < ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the above, this type becomes bsl::optional<bsl::pair<unsigned int, bsl::bsls::Types::Uint64> > with more interesting print and fromString functions. I think it makes sense to pull the bsl::optional part out and make this type not know about isSet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I was just inspired by MessageGUID class which has isUnset() and related logic. I propose to address your proposal in separate PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me!

```bash
bmqstoragetool --journal-file=<path> --seqnum=<leaseId-sequenceNumber_1> --seqnum=<leaseId-sequenceNumber_N>
```
NOTE: no other filters are allowed with this one
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems reasonable to want to look at journal records with a particular primary lease ID that happened after a particular timestamp, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier comment, let's make this change later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the first step we agreed to make these filters mutual exclusive, and then change behaviour depending on user feedback. Agree to make this change later.

(d_timestampLt > 0 && ts >= d_timestampLt)) {
// Match by timestamp

// Apply `range` filter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to store the timestamp range, the offset range, and the composite sequence number range, what limits us to having only one type of range? E.g.,

bsls::Types::Uint64 timestamp = record.header().timestamp();
if ((d_range.d_timestampGt && timestamp <= *d_range.d_timestampGt) || 
    (d_range.d_timestmapLt && timestamp >= *d_range.d_timestampLt)) {
    return false;
}

if ((d_range.d_offsetGt && offset <= *d_range.d_offsetGt) || 
    (d_range.d_offsetLt && offset >= *d_range.d_offsetLt)) {
    return false;
}

CompositeSequenceNumber seqNum(record.header().primaryLeaseId(),
                               record.header().sequenceNumber();
if ((d_range.d_seqNumGt && seqNum <= *d_range.d_seqNumGt) || 
    (d_range.d_seqNumLt && seqNum >= *d_range.d_seqNumLt)) {
    return false;
}

on the assumption that d_range members become bsl::optionals. This would seem to make the code both more simple and more flexible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, let's save this for a later PR---it's a pretty simple change once this is in.

Comment on lines 126 to 137
bsls::Types::Uint64 d_timestampGt;
// Filter messages greater than timestamp value
bsls::Types::Uint64 d_timestampLt;
// Filter messages less than timestamp value
bsls::Types::Uint64 d_offsetGt;
// Filter messages greater than offset value
bsls::Types::Uint64 d_offsetLt;
// Filter messages less than offset value
CompositeSequenceNumber d_seqNumGt;
// Filter messages greater than sequence number
CompositeSequenceNumber d_seqNumLt;
// Filter messages less than sequence number
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these should probably be bsl::optionals.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be addressed in separate PR later.

private:
// PRIVATE DATA
bsl::vector<CompositeSequenceNumber> d_seqNums;
// List of composite sequence numbers to search for.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the code uses this BDE-style format docs for data members, but it's because the conversion script did not catch them. I'm slowly converting this over, but it's a manual process (package by package in #523). For new code, let's use doxygen style for everything (/// before the members and Markdown).

Not a big deal, but less to change later.

// ACCESSORS

bool stop(const CompositeSequenceNumber& sequenceNumber) const;
// Return 'true' if the specified 'sequenceNumber' is greater than
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doxygen format docs here.

bool stop(bsls::Types::Uint64 timestamp) const;
// Return 'true' if the specified 'timestamp' is greated than
bool stop(const bsls::Types::Uint64 timestamp) const;
// Return 'true' if the specified 'timestamp' is greater than
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you're keeping the existing doc format here, which makes sense, but let's take this opportunity to convert this into Doxygen format---probably should have been from the beginning.

@alexander-e1off
Copy link
Collaborator Author

@pniedzielski thanks for your review! I've addressed typos and doxygen notes, other notes will be addressed in further PRs.

@pniedzielski pniedzielski self-requested a review December 20, 2024 16:24
Copy link
Collaborator

@pniedzielski pniedzielski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will squash-merge this in.

@pniedzielski pniedzielski merged commit 6b34d52 into bloomberg:main Dec 20, 2024
60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants