Feat[plugins]: report queue depth per appId to prometheus #446

Merged (9 commits) on Oct 14, 2024

Collaborator

@678098 678098 commented Oct 8, 2024

Introduced try-catch block for plugins:

10OCT2024_14:04:34.987 (139982138242624) ERROR bmqprometheus_prometheusstatconsumer.cpp:263 #PLUGIN_ERROR Invalid metric name
10OCT2024_14:04:35.987 (139982138242624) ERROR bmqprometheus_prometheusstatconsumer.cpp:263 #PLUGIN_ERROR Invalid metric name
10OCT2024_14:04:36.987 (139982138242624) ERROR bmqprometheus_prometheusstatconsumer.cpp:263 #PLUGIN_ERROR Invalid metric name

@678098 678098 requested a review from a team as a code owner October 8, 2024 13:28
@678098 678098 force-pushed the t2360_appId_queue_depth_prometheus branch from c8da61d to 332313a Compare October 8, 2024 13:30
@678098 678098 requested a review from waldgange October 8, 2024 13:31

@bmq-oss-ci bmq-oss-ci bot left a comment

Build 295 of commit 332313a has completed with FAILURE

for (DatapointDefCIter dpIt = bdlb::ArrayUtil::begin(defs);
     dpIt != bdlb::ArrayUtil::end(defs);
     ++dpIt) {
    const bsls::Types::Int64 value =
Collaborator

@waldgange waldgange Oct 9, 2024

Looks a bit confusing, as we already have value in the outer scope. Let's either reassign the existing variable or change the name of this one.

Collaborator Author

I resolved this by introducing a separate for-loop for the conflicting value.

@678098
Collaborator Author

678098 commented Oct 9, 2024

@waldgange back to you, also note that I changed metric names:

  1. Use a common dot notation everywhere, so queue_gc_msgs -> queue.gc_msgs
  2. Renamed confusing metrics queue_content_msgs -> queue.content_msgs_max and the same with content_bytes (both of these metrics had implicit max value reported).

@678098 678098 requested a review from waldgange October 9, 2024 17:42
Signed-off-by: Evgeny Malygin <[email protected]>
@alexander-e1off
Collaborator

> @waldgange back to you, also note that I changed metric names:
>
>   1. Use a common dot notation everywhere, so queue_gc_msgs -> queue.gc_msgs
>   2. Renamed confusing metrics queue_content_msgs -> queue.content_msgs_max and the same with content_bytes (both of these metrics had implicit max value reported).

Prometheus does not support dots in metric names:

> Metric names may contain ASCII letters, digits, underscores, and colons. It must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*.
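
The rule quoted above can be checked before a name ever reaches the metrics library. The following is a minimal sketch using only the C++ standard library; `isValidMetricName` and `sanitizeMetricName` are hypothetical helper names for illustration and are not part of BlazingMQ or prometheus-cpp.

```cpp
#include <regex>
#include <string>

// Returns true if 'name' matches the Prometheus metric-name pattern quoted
// above: [a-zA-Z_:][a-zA-Z0-9_:]*
bool isValidMetricName(const std::string& name)
{
    static const std::regex pattern("[a-zA-Z_:][a-zA-Z0-9_:]*");
    return std::regex_match(name, pattern);
}

// Hypothetical helper: replaces every character that the pattern does not
// allow at its position with '_'.  An empty input stays empty (and invalid).
std::string sanitizeMetricName(std::string name)
{
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char c       = name[i];
        const bool isAlpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool isDigit = (c >= '0' && c <= '9');
        // Digits are allowed everywhere except in the first position.
        const bool allowed = isAlpha || c == '_' || c == ':' ||
                             (isDigit && i > 0);
        if (!allowed) {
            name[i] = '_';
        }
    }
    return name;
}
```

With these helpers, `queue_gc_msgs` is accepted, `queue.gc_msgs` is rejected, and sanitizing `queue.gc_msgs` gives back `queue_gc_msgs`.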

@waldgange
Collaborator

> @waldgange back to you, also note that I changed metric names:
>
>   1. Use a common dot notation everywhere, so queue_gc_msgs -> queue.gc_msgs
>   2. Renamed confusing metrics queue_content_msgs -> queue.content_msgs_max and the same with content_bytes (both of these metrics had implicit max value reported).

It's OK to change the suffixes, but metric names can't contain dots. This is a Prometheus-specific restriction, so we use underscores.
Moreover, if you pass these names to prometheus-cpp, it will throw an exception, and that will crash the broker.
That does not look safe. I think it's worth wrapping the code in the onSnapshot() method in a try/catch block to prevent the broker from crashing in the future. Although metric names are constant, it's definitely possible to specify label names that also have the same restrictions.

@678098
Collaborator Author

678098 commented Oct 10, 2024

@alexander-e1off @waldgange thanks, I will revert the underscore change then.
I added a try-catch block around the code that operates on metrics.
I am sure that it can throw std::invalid_argument, std::runtime_error or std::length_error, but to be safer in the future, I catch a general exception type.

@bmq-oss-ci bmq-oss-ci bot left a comment

Build 303 of commit cea6f9c has completed with FAILURE

Collaborator

@waldgange waldgange left a comment

LGTM

Comment on lines +579 to +584
try {
    (*it)->onSnapshot();
}
catch (const bsl::exception& e) {
    BALL_LOG_ERROR << "#PLUGIN_ERROR " << e.what();
}
Collaborator

I like this change! I was surprised we didn't have this wrapper before

@678098 678098 merged commit 07fa5d1 into bloomberg:main Oct 14, 2024
35 checks passed
@678098 678098 deleted the t2360_appId_queue_depth_prometheus branch October 14, 2024 16:24
alexander-e1off pushed a commit to alexander-e1off/blazingmq that referenced this pull request Oct 24, 2024
Signed-off-by: Christopher Beard <[email protected]>

fixing Solaris build (bloomberg#434)

Signed-off-by: dorjesinpo <[email protected]>

Remove `-DBMQ_ENABLE_MSG_GROUPID` from the build system

We do not ever want to build with this flag when releasing, and users
often manage to enable this flag accidentally.  Because message group
IDs are not fully implemented, we remove this temporary definition.  It
can be added in later if we ever come back to this feature.

Signed-off-by: Patrick M. Niedzielski <[email protected]>

Make unit tests pass without `BMQ_ENABLE_MSG_GROUPID`

The unit tests currently assume that message group IDs are enabled, and
since we have updated our build system to no longer enable this feature,
the unit tests now fail in CI.  This patch guards the message group ID
tests with preprocessor conditionals, disabling the parts of tests that
try to set and check message group IDs.  When `BMQ_ENABLE_MSG_GROUPID`
is set, these parts of the unit tests run again.

Signed-off-by: Patrick M. Niedzielski <[email protected]>

Fix mqbstat doc formatting (bloomberg#438)

Signed-off-by: Christopher Beard <[email protected]>

Fix[bmqeval]: limit expression length to avoid stack overflow (bloomberg#441)

Signed-off-by: Evgeny Malygin <[email protected]>

Fix Solaris unit tests (bloomberg#440)

Signed-off-by: Anton Pryakhin <[email protected]>

Docs[BMQ]: Use `.dox` files rather than `.md` files

Package group documentation in `libbmq` was converted to Markdown files
named `README.md`, which were tied to the directory containing the
code for the package group using Doxygen `@dir` commands.  However, when
generating the documentation, this left several empty pages in the
documentation named `README`, which we were not able to remove.

The solution for this that this patch uses is to switch from `.md` files
to `.dox` files, which contain a single Doxygen-style C++ comment
containing the `@dir` command.  Unlike `.md` files, these do not
automatically create pages, so there is no empty `README` page created
for each package group.  The cost of this is that `.dox` files cannot be
simple Markdown files, but instead need to be wrapped in a C++ comment.

Signed-off-by: Patrick M. Niedzielski <[email protected]>

Docs[BMQ] bde -> doxygen conversion fixes (bloomberg#443)

* Doc[BMQT] minor bde -> doxygen docs

* Doc[BMQA] minor bde -> doxygen docs

* Doc[BMQA] re-wrap data member comments

* Doc[BMQT] re-wrap data member comments

* Apply suggestions from code review

---------

Signed-off-by: Christopher Beard <[email protected]>
Signed-off-by: Chris Beard <[email protected]>
Co-authored-by: Evgeny Malygin <[email protected]>

Feat: track queue depth per appId (bloomberg#320)

Signed-off-by: Evgeny Malygin <[email protected]>

configurator, bmqit: mode protos (bloomberg#447)

Signed-off-by: Jean-Louis Leroy <[email protected]>

Revert "configurator, bmqit: mode protos (bloomberg#447)" (bloomberg#449)

This reverts commit a4b20db.

Fix[mqbs_virtualstoragecatalog.cpp]: fix Solaris build (bloomberg#450)

Signed-off-by: Evgeny Malygin <[email protected]>

fix: configurator: apply app ids (bloomberg#452)

Signed-off-by: Jean-Louis Leroy <[email protected]>

Fix [MQB]: mqbc::StorageMgr: Transition to available only when all primary active (bloomberg#416)

* mqbc::StorageMgr: Ban 'processPrimaryStatusAdvisory' in non-FSM mode

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

* mqbc::StorageMgr: Transition to available only when all primary active

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

* mqbc::StorageMgr: clang-format

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

* mqbc::StorageMgr: Healing replica buffers primary status advisories

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

* mqbs::FileStore: Rename setPrimary -> setActivePrimary

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

* mqbc::StorageMgr: Comment about check if all partitions available

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

---------

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

Fix some compiler warnings in mqb (bloomberg#455)

* -Wunused-parameter
* -Wshadow
* -Wswitch-enum

Signed-off-by: Christopher Beard <[email protected]>

It: Include full path for admin stat it test failures (bloomberg#453)

* It: Include full path for admin stat it test failures

This patch makes it a little easier to debug the metric & operation that
causes an integration test for stats to fail.

Signed-off-by: Christopher Beard <[email protected]>

* Update src/integration-tests/test_admin_client.py

Co-authored-by: Evgeny Malygin <[email protected]>
Signed-off-by: Chris Beard <[email protected]>

---------

Signed-off-by: Christopher Beard <[email protected]>
Signed-off-by: Chris Beard <[email protected]>
Co-authored-by: Evgeny Malygin <[email protected]>

Feat: Add queue history size metric (bloomberg#436)

* [WIP] Feat: Add queue history size metric

This adds a new queue metric that counts the number of GUIDs in that
queue's history. This is useful for identifying excessive memory
utilization from history and potential history garbage collection issues
(where history is filled up faster than it's cleaned up).

Signed-off-by: Christopher Beard <[email protected]>

* It: Extend admin it for history size stat

Signed-off-by: Christopher Beard <[email protected]>

---------

Signed-off-by: Christopher Beard <[email protected]>

Feat[plugins]: report queue depth per appId to prometheus (bloomberg#446)

Signed-off-by: Evgeny Malygin <[email protected]>

[Fix] m_bmqstoragetool::FileManagerImpl: Asserts not have side effects (bloomberg#461)

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

Feat[MQB]: Enhance queue consumption monitor alarm log with additional details (bloomberg#420)

Enhance filebackedstorage alarm log

Signed-off-by: Aleksandr Ivanov <[email protected]>

Cleanup

Signed-off-by: Aleksandr Ivanov <[email protected]>

Add test to mqbu_capacitymeter.t

Signed-off-by: Aleksandr Ivanov <[email protected]>

mqbc::StorageUtil, mqbi::StorageMgr: updateQueue -> updateQueuePrimary (bloomberg#466)

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

Fix[MQB]: misc warnings (bloomberg#464)

Allow dots in subscription property names

Message properties allow arbitrary strings for property names, but our
subscription expression language is more limited, requiring an initial
alphabetic character followed by any number of alphanumeric characters
and underscores.  Some producers have begun using hierarchical message
property names, separated by dots (“.”), and are unable to use
subscriptions to filter or route according to these message properties.

This patch extends the expression language grammar to enable matching on
subscription property names with dots in them.  This change is a pure
extension: the language recognized by the subscription expression grammar
after this patch is a strict superset of the language recognized by the
subscription expression grammar before this patch.  This patch also
extends the unit test for the lexer to ensure this is a strict superset.

Signed-off-by: Patrick M. Niedzielski <[email protected]>

fix: clean app subscriptions on reconfigure

Signed-off-by: dorjesinpo <[email protected]>

Fix[mqbstat_domainstats.cpp]: empty tier StringRef (bloomberg#431)

Signed-off-by: Evgeny Malygin <[email protected]>

Fix Solaris build, it does not support ctor delegation

Signed-off-by: Aleksandr Ivanov <[email protected]>

Doc: Document app subscriptions (bloomberg#463)

* Docs upgrade jekyll -> 4.3.3

Signed-off-by: Christopher Beard <[email protected]>

* Docs: Document app subscriptions

Signed-off-by: Christopher Beard <[email protected]>

* Expand on difference in subscriptions

Signed-off-by: Christopher Beard <[email protected]>

* Minor subscription doc clarifications

Signed-off-by: Christopher Beard <[email protected]>

* Elaborate on subscription details

Signed-off-by: Christopher Beard <[email protected]>

* Clarify consumer subscription on broker

Signed-off-by: Christopher Beard <[email protected]>

---------

Signed-off-by: Christopher Beard <[email protected]>

fix: enhanced detection of duplicate PUSHes (bloomberg#472)

Signed-off-by: dorjesinpo <[email protected]>

Fix ntf-core version in build_darwin.sh

Signed-off-by: Aleksandr Ivanov <[email protected]>

Add logAppsSubscriptionInfoCb into InMemoryStorage

Signed-off-by: Aleksandr Ivanov <[email protected]>

Add IT for capacity meter enhanced log

Signed-off-by: Aleksandr Ivanov <[email protected]>

Fix comments

Signed-off-by: Aleksandr Ivanov <[email protected]>

Fix [CI] ntf-core version for macosx build (bloomberg#473)

Merge mwc into bmq

MWC or "MiddleWare Core" was a package group developed to support
a myriad of applications at Bloomberg. It's been useful to share
common middleware components between similar technologies, but doesn't
make much sense to support as its own open source library. Moving
forward we are merging it into the BMQ package group to better maintain
it for the BlazingMQ project.

Signed-off-by: Taylor Foxhall <[email protected]>

Fix conflict

Signed-off-by: Aleksandr Ivanov <[email protected]>

Fix conflict

Signed-off-by: Aleksandr Ivanov <[email protected]>

Fix mwctst

Signed-off-by: Aleksandr Ivanov <[email protected]>