Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/sqlserver] Enable direct connection to SQL Server instances #31915

Merged
merged 2 commits into from
Apr 23, 2024

Conversation

crobert-1
Copy link
Member

@crobert-1 crobert-1 commented Mar 22, 2024

Description:

This PR enables direct connection to SQL Server instances, and also allows the SQL Server receiver to run on non-Windows platforms.

User-facing changes:

  1. More (optional) configuration options. If the receiver wants to connect directly to the SQL Server instance (rather than relying on Windows Performance Counters), all of these options must be specified.
  2. Add a new metric, sqlserver.database_io.read_latency. This is disabled by default, and way randomly chosen as a single metric to try to restrict this PR's size. I plan on following this PR up with more metrics from the same query, as well as more queries and metrics.

Note: Existing users should see no change in functionality with this PR. Using existing configurations with this PR should have no performance or functional impact.

Link to tracking Issue:
Resolves #30297

Testing:
Tested on MacOS and Windows. Manual tested with direct connection to a docker container on both platforms, as well as ran all tests in package. Everything is passing.

Working demo for testing:
Docker container start command:

docker run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=StrongPassw0rd" -p 1433:1433 -d mcr.microsoft.com/mssql/server:2022-latest

Collector config:

sqlserver:
 password: "StrongPassw0rd"
 port: 1433
 server: 0.0.0.0
 username: sa
 resource_attributes:
   sqlserver.instance.name:
     enabled: true
   sqlserver.computer.name:
     enabled: true
 metrics:
   sqlserver.database_io.read_latency:
     enabled: true

Documentation:
Updated README

Copy link
Member Author

@crobert-1 crobert-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just left some comments to hopefully help give thought process behind some changes.

@@ -14,29 +14,45 @@
[sumo]: https://github.com/SumoLogic/sumologic-otel-collector
<!-- end autogenerated section -->

The `sqlserver` receiver grabs metrics about a Microsoft SQL Server instance using the Windows Performance Counters.
Because of this, it is a Windows only receiver.
The `sqlserver` receiver grabs metrics about a Microsoft SQL Server instance. The receiver works by either using the
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the complexity of this PR is enabling support on the non-Windows based OSs, while trying to keep Windows-specific functionality unchanged. I attempted to keep logic that was specific to windows in _windows files, logic specific to non-windows in _others files, and both in the generically named files. This leads to a bit of a challenge trying to follow some code, but I don't know of a better way given the nature of this receiver.

Happy to take suggestions on how to make this cleaner if anyone has any.


Make sure to run the collector as administrator in order to collect all performance counters for metrics.

## Configuration

The following settings are optional:
- `collection_interval` (default = `10s`): The internal at which metrics should be emitted by this receiver.
- `instance_name` (optional): The instance name identifies the specific SQL Server instance being monitored.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to allow query-filtering based on instance name and not computer name as I believe instance name makes more sense for this. If anyone wants both, or only computer name I'm happy to discuss. It would be a relatively simple feature/enhancement.

- `server`: IP Address or hostname of SQL Server instance to connect to.
- `port`: Port of the SQL Server instance to connect to.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server and port here are separate options for the sake of the SQL db connection string, which requires them to be passed in separately. I thought this makes more sense than the alternative of a single option, and then parsing internally. Happy to discuss if anyone prefers the alternative.

Comment on lines 34 to 37
if !directDBConnectionEnabled(cfg) {
if cfg.Server != "" || cfg.Username != "" || string(cfg.Password) != "" {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't necessarily have to return an error here, but I want to let the user know up front there may be something wrong or incomplete. There's no use-case for specifying some, but not all, of the connection options, and I could see it being a source of confusion if a isn't seeing metrics because they unknowingly forgot a configuration option.

package sqlserverreceiver // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/sqlserverreceiver"

func (cfg *Config) validateInstanceAndComputerName() error {
return nil
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simply to provide an alternative to the Windows-specific functionality that requires either both options to be set, or none. There's no such limitation on the non-Windows side.

receiver/sqlserverreceiver/queries.go Outdated Show resolved Hide resolved
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON vfs.[database_id] = mf.[database_id] AND vfs.[file_id] = mf.[file_id]
%s'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As shown in getSQLServerDatabaseIOQuery, this allows us to filter based on instance name.

recorder recordFunc
var _ scraperhelper.Scraper = (*SQLServerScraperHelper)(nil)

func NewScraper(id component.ID,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially hoping to directly use internal/sqlquery's scraper functionality, but it wouldn't have access to the sqlserver receiver's metadata.yaml, and metric emitting functionality. It simply returns a set of metrics, with no way for us to use sqlserver metadata on it, AFAIK.

Copy link
Member Author

@crobert-1 crobert-1 Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is largely copied code from scraper.go, it's not new code. The only change I'm aware of is the rename of the sqlServerScraper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from scraper_test.go, it's not new.

@crobert-1 crobert-1 force-pushed the sqlserver_direct_connect branch from 421e648 to 0851a28 Compare March 22, 2024 20:07
)

// SQL Server Perf Counter (PC) Scraper. This is used to scrape metrics from Windows Perf Counters.
type sqlServerPCScraper struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming feedback on the two types of SQL Server scrapers would be helpful. I chose the direct connection scraper to have the default name since it's platform agnostic, and the Windows-specific scraper to have a non-default name.

@crobert-1 crobert-1 added the Run Windows Enable running windows test on a PR label Mar 22, 2024
receiver/sqlserverreceiver/README.md Outdated Show resolved Hide resolved
receiver/sqlserverreceiver/config_windows.go Outdated Show resolved Hide resolved
receiver/sqlserverreceiver/metadata.yaml Outdated Show resolved Hide resolved
Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the overlap between metrics collected in different modes? Can the direct connection bring the same metrics as currently reported with perf counters?

receiver/sqlserverreceiver/metadata.yaml Outdated Show resolved Hide resolved
@crobert-1 crobert-1 force-pushed the sqlserver_direct_connect branch from 0317752 to 792e8e7 Compare March 26, 2024 15:54
@crobert-1
Copy link
Member Author

What's the overlap between metrics collected in different modes? Can the direct connection bring the same metrics as currently reported with perf counters?

All metrics reported by perf counters can be gathered from direct connection. Here's the breakdown of doc references for how to get all existing metrics from querying SQL Server:

SQL Server, SQL statistics object

sqlserver.batch.request.rate
sqlserver.batch.sql_compilation.rate
sqlserver.batch.sql_recompilation.rate

SQL Server, Transactions object

sqlserver.transaction.rate

SQL Server, Locks object

sqlserver.lock.wait.rate
sqlserver.lock.wait_time.avg

SQL Server, Databases object

sqlserver.transaction.write.rate
sqlserver.transaction_log.flush.data.rate
sqlserver.transaction_log.flush.rate
sqlserver.transaction_log.flush.wait.rate
sqlserver.transaction_log.usage
sqlserver.transaction_log.shrink.count
sqlserver.transaction_log.growth.count

SQL Server, General Statistics object

sqlserver.user.connection.count

SQL Server, Buffer Manager object

sqlserver.page.buffer_cache.hit_ratio
sqlserver.page.life_expectancy
sqlserver.page.lazy_write.rate
sqlserver.page.checkpoint.flush.rate
sqlserver.page.operation.rate

SQL Server, Access Methods object

sqlserver.page.split.rate

For future reference, the mapping of SQL Server objects to metrics can be found here.

@crobert-1 crobert-1 force-pushed the sqlserver_direct_connect branch from 792e8e7 to 06d65b1 Compare March 27, 2024 18:25
@@ -158,6 +167,14 @@ metrics:
unit: "{transactions}/s"
gauge:
value_type: double
sqlserver.database.io.read_latency:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to name this sqlserver.database.io.read_latency instead of sqlserver.database_io.read_latency to try to closer match semantic conventions for io.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to cumulative sum based on underlying metric description (refer to io_stall_read_ms.)

receiver/sqlserverreceiver/scraper.go Outdated Show resolved Hide resolved
receiver/sqlserverreceiver/scraper.go Outdated Show resolved Hide resolved
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is quite large and I don't feel I can review it confidently. Can we break any parts out into smaller PRs?

djaglowski pushed a commit that referenced this pull request Apr 4, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
Add a query for the database IO metrics that SQL Server exposes. Read
more about available metrics
[here](https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-io-virtual-file-stats-transact-sql?view=sql-server-ver16).
This is a no-op update as this query will not be used until direct
connection to the SQL Server instance is fully implemented. This is
simply part of the effort.

**Note:** This code is currently not reached. This is on purpose.

**Link to tracking Issue:** <Issue number if applicable>
This was originally part of
#31915,
but I'm breaking this out to make the original PR a more manageable
size.


#30297

**Testing:** <Describe what testing was performed and which tests were
added.>
Existing tests and added tests are passing.

**Documentation:** <Describe the documentation added.>
Purposefully none: This is currently dead code until all of #31915 gets
merged, so it shouldn't be used.
@crobert-1 crobert-1 force-pushed the sqlserver_direct_connect branch from f813757 to fbf5f1a Compare April 8, 2024 21:30
@crobert-1 crobert-1 force-pushed the sqlserver_direct_connect branch from 18a76a3 to afdb124 Compare April 9, 2024 15:51
djaglowski pushed a commit that referenced this pull request Apr 16, 2024
)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
This adds config options for direct connection to a SQL server instance.
For the sake of PR size, this adds no functionality, it's simply config
options and validation.

**Link to tracking Issue:** <Issue number if applicable>
This was originally part of
#31915,
but I'm breaking this out to make the original PR a more manageable
size.


#30297

**Testing:** <Describe what testing was performed and which tests were
added.>
Existing tests and added tests are all passing.

**Documentation:** <Describe the documentation added.>
No documentation on purpose. The added config options currently have no
impact on functionality, so users shouldn't be aware of this, and
shouldn't try to use them yet.
djaglowski pushed a commit that referenced this pull request Apr 16, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
This is part of
#31915,
to try to make it more manageable in size. This variable will be used by
the SQL Server receiver.
@crobert-1 crobert-1 force-pushed the sqlserver_direct_connect branch from afdb124 to 7ab147e Compare April 16, 2024 19:16
@crobert-1 crobert-1 force-pushed the sqlserver_direct_connect branch from 7ab147e to 3635289 Compare April 16, 2024 21:27
djaglowski pushed a commit that referenced this pull request Apr 17, 2024
…32471)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
A major part of
#31915
is adding support for other operating systems than Windows. This change
moves Windows-specific logic into files named as such, and renames some
specific functionality to match purpose, instead of generic names.

`newSQLServerScraper` -> `newSQLServerPCScraper` (PC in this context
stands for performance counters)
`sqlServerScraper` -> `sqlServerPCScraper``

This only renames internal functions and files, there's no user-facing
impact of this.

**Link to tracking Issue:** <Issue number if applicable>

#31915

#30297

**Testing:** <Describe what testing was performed and which tests were
added.>
Existing tests are passing.
@crobert-1 crobert-1 force-pushed the sqlserver_direct_connect branch from 3635289 to 143500c Compare April 17, 2024 18:26
djaglowski pushed a commit that referenced this pull request Apr 19, 2024
…tems (#32542)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
This change enables the SQL Server receiver to directly connect to SQL
Server instances, and to run on operating systems other than Windows.

This is part of
#31915,
but doesn't add any functionality for the end user. No metrics are
currently being gathered from SQL server instances, to make the PR
(hopefully) simpler to review.

**Link to tracking Issue:** <Issue number if applicable>

#31915

#30297

**Testing:** <Describe what testing was performed and which tests were
added.>
Tests added are passing.
Add changelog

Fix test CI/CD failures

make gotidy, fix naming collision

Update receiver/sqlserverreceiver/README.md

Co-authored-by: Dmitrii Anoshin <[email protected]>

Convert unit to seconds, use errors.Join instead of multierr

Fix linting failure

Add extended_documentation for direct connection-only metrics

Fix dep version

Rename sqlserver.database_io.read_latency to sqlserver.database.io.read_latency

Update metric description and type

Update receiver/sqlserverreceiver/scraper.go

Update receiver/sqlserverreceiver/scraper.go

Update receiver/sqlserverreceiver/scraper.go

Fix gomod

Fix merge conflict with Windows-specific functionality

Add back scraper.go file
@crobert-1 crobert-1 force-pushed the sqlserver_direct_connect branch from 12c989f to d1209fb Compare April 19, 2024 17:17
Copy link
Contributor

@atoulme atoulme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my limited view, all looks good. Approved.

Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable direct connection to SQL Server instance
5 participants