[receiver/filelog] Add settings to annotate file owner and group names (#30776)

**Description:** On a shared host with multiple users, you may want to filter logs by the owner and group of the log file. Until now this lookup was not possible, which made such filtering difficult.
When reading a file with the filelog receiver on a non-Windows OS, if `include_file_owner_name` is true, the file owner name is added as the attribute `log.file.owner.name`, and if `include_file_owner_group_name` is true, the file group name is added as the attribute `log.file.owner.group.name`.
**Link to tracking Issue:** #30775

**Testing:** Added unit tests.

**Documentation:** Added documentation for the `file_input` operator and the filelog receiver.
tprelle authored Mar 27, 2024
1 parent 0962321 commit fdaca38
Showing 11 changed files with 147 additions and 20 deletions.
27 changes: 27 additions & 0 deletions .chloggen/add_include_file_infos.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: filelogreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: When reading a file with filelogreceiver on a non-Windows OS, if include_file_owner_name is true, it will add the file owner name as the attribute `log.file.owner.name`, and if include_file_owner_group_name is true, it will add the file owner group name as the attribute `log.file.owner.group.name`.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [30775]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
2 changes: 2 additions & 0 deletions pkg/stanza/docs/operators/file_input.md
@@ -18,6 +18,8 @@ The `file_input` operator reads logs from files. It will place the lines read in
| `include_file_path` | `false` | Whether to add the file path as the attribute `log.file.path`. |
| `include_file_name_resolved` | `false` | Whether to add the file name after symlinks resolution as the attribute `log.file.name_resolved`. |
| `include_file_path_resolved` | `false` | Whether to add the file path after symlinks resolution as the attribute `log.file.path_resolved`. |
| `include_file_owner_name` | `false` | Whether to add the file owner name as the attribute `log.file.owner.name`. Not supported on Windows. |
| `include_file_owner_group_name` | `false` | Whether to add the file group name as the attribute `log.file.owner.group.name`. Not supported on Windows. |
| `preserve_leading_whitespaces` | `false` | Whether to preserve leading whitespaces. |
| `preserve_trailing_whitespaces` | `false` | Whether to preserve trailing whitespaces. |
| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. This setting will be ignored if previously read file offsets are retrieved from a persistence mechanism. |
30 changes: 21 additions & 9 deletions pkg/stanza/fileconsumer/attrs/attrs.go
@@ -5,25 +5,31 @@ package attrs // import "github.com/open-telemetry/opentelemetry-collector-contr

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

const (
	LogFileName         = "log.file.name"
	LogFilePath         = "log.file.path"
	LogFileNameResolved = "log.file.name_resolved"
	LogFilePathResolved = "log.file.path_resolved"
	LogFileName           = "log.file.name"
	LogFilePath           = "log.file.path"
	LogFileNameResolved   = "log.file.name_resolved"
	LogFilePathResolved   = "log.file.path_resolved"
	LogFileOwnerName      = "log.file.owner.name"
	LogFileOwnerGroupName = "log.file.owner.group.name"
)

type Resolver struct {
	IncludeFileName         bool `mapstructure:"include_file_name,omitempty"`
	IncludeFilePath         bool `mapstructure:"include_file_path,omitempty"`
	IncludeFileNameResolved bool `mapstructure:"include_file_name_resolved,omitempty"`
	IncludeFilePathResolved bool `mapstructure:"include_file_path_resolved,omitempty"`
	IncludeFileName           bool `mapstructure:"include_file_name,omitempty"`
	IncludeFilePath           bool `mapstructure:"include_file_path,omitempty"`
	IncludeFileNameResolved   bool `mapstructure:"include_file_name_resolved,omitempty"`
	IncludeFilePathResolved   bool `mapstructure:"include_file_path_resolved,omitempty"`
	IncludeFileOwnerName      bool `mapstructure:"include_file_owner_name,omitempty"`
	IncludeFileOwnerGroupName bool `mapstructure:"include_file_owner_group_name,omitempty"`
}

func (r *Resolver) Resolve(path string) (attributes map[string]any, err error) {
func (r *Resolver) Resolve(file *os.File) (attributes map[string]any, err error) {
	var path = file.Name()
	// size 2 is sufficient if not resolving symlinks. This optimizes for the most performant cases.
	attributes = make(map[string]any, 2)
	if r.IncludeFileName {
@@ -32,6 +38,12 @@ func (r *Resolver) Resolve(path string) (attributes map[string]any, err error) {
	if r.IncludeFilePath {
		attributes[LogFilePath] = path
	}
	if r.IncludeFileOwnerName || r.IncludeFileOwnerGroupName {
		err = r.addOwnerInfo(file, attributes)
		if err != nil {
			return nil, err
		}
	}
	if !r.IncludeFileNameResolved && !r.IncludeFilePathResolved {
		return attributes, nil
	}
33 changes: 26 additions & 7 deletions pkg/stanza/fileconsumer/attrs/attrs_test.go
@@ -6,6 +6,7 @@ package attrs
import (
	"fmt"
	"path/filepath"
	"runtime"
	"testing"

	"github.com/stretchr/testify/assert"
@@ -16,25 +17,27 @@ import (
func TestResolver(t *testing.T) {
	t.Parallel()

	for i := 0; i < 16; i++ {
	for i := 0; i < 64; i++ {

		// Create a 6 bit string where each bit represents the value of a config option
		bitString := fmt.Sprintf("%04b", i)
		bitString := fmt.Sprintf("%06b", i)

		// Create a resolver with a config that matches the bit pattern of i
		r := Resolver{
			IncludeFileName:         bitString[0] == '1',
			IncludeFilePath:         bitString[1] == '1',
			IncludeFileNameResolved: bitString[2] == '1',
			IncludeFilePathResolved: bitString[3] == '1',
			IncludeFileName:           bitString[0] == '1',
			IncludeFilePath:           bitString[1] == '1',
			IncludeFileNameResolved:   bitString[2] == '1',
			IncludeFilePathResolved:   bitString[3] == '1',
			IncludeFileOwnerName:      bitString[4] == '1' && runtime.GOOS != "windows",
			IncludeFileOwnerGroupName: bitString[5] == '1' && runtime.GOOS != "windows",
		}

		t.Run(bitString, func(t *testing.T) {
			// Create a file
			tempDir := t.TempDir()
			temp := filetest.OpenTemp(t, tempDir)

			attributes, err := r.Resolve(temp.Name())
			attributes, err := r.Resolve(temp)
			assert.NoError(t, err)

			var expectLen int
@@ -67,6 +70,22 @@ func TestResolver(t *testing.T) {
			} else {
				assert.Empty(t, attributes[LogFilePathResolved])
			}
			if r.IncludeFileOwnerName {
				expectLen++
				assert.NotNil(t, attributes[LogFileOwnerName])
				assert.IsType(t, "", attributes[LogFileOwnerName])
			} else {
				assert.Empty(t, attributes[LogFileOwnerName])
			}
			if r.IncludeFileOwnerGroupName {
				expectLen++
				assert.NotNil(t, attributes[LogFileOwnerGroupName])
				assert.IsType(t, "", attributes[LogFileOwnerGroupName])
			} else {
				assert.Empty(t, attributes[LogFileOwnerGroupName])
			}
			assert.Equal(t, expectLen, len(attributes))
		})
	}
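With six boolean options the test has to cover 2^6 = 64 combinations, which is why the loop bound and the width of the `%0Nb` verb change together. A small sketch of the same decoding; `flags` and `fromBits` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// flags is a hypothetical stand-in for the six Include* fields of Resolver,
// in the same order as the bits used by TestResolver.
type flags struct {
	FileName         bool
	FilePath         bool
	FileNameResolved bool
	FilePathResolved bool
	OwnerName        bool
	OwnerGroupName   bool
}

// fromBits decodes i (0..63) into flags: position k of the 6-digit binary
// string enables field k, so bit string "000010" enables only OwnerName.
func fromBits(i int) flags {
	b := fmt.Sprintf("%06b", i)
	return flags{
		FileName:         b[0] == '1',
		FilePath:         b[1] == '1',
		FileNameResolved: b[2] == '1',
		FilePathResolved: b[3] == '1',
		OwnerName:        b[4] == '1',
		OwnerGroupName:   b[5] == '1',
	}
}

func main() {
	// flags contains only bools, so it is comparable and usable as a map key.
	seen := make(map[flags]bool)
	for i := 0; i < 64; i++ {
		seen[fromBits(i)] = true
	}
	fmt.Println("distinct configurations:", len(seen))
	fmt.Printf("%06b -> %+v\n", 3, fromBits(3))
}
```

Had the format width stayed at `%04b`, indexes 4 and 5 of the bit string would be out of range for the first 16 values, so the width must match the number of options.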
37 changes: 37 additions & 0 deletions pkg/stanza/fileconsumer/attrs/owner_other.go
@@ -0,0 +1,37 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

//go:build !windows

package attrs // import "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/attrs"

import (
	"fmt"
	"os"
	"os/user"
	"syscall"
)

func (r *Resolver) addOwnerInfo(file *os.File, attributes map[string]any) error {
	fileInfo, errStat := file.Stat()
	if errStat != nil {
		return fmt.Errorf("resolve file stat: %w", errStat)
	}
	fileStat := fileInfo.Sys().(*syscall.Stat_t)

	if r.IncludeFileOwnerName {
		fileOwner, errFileUser := user.LookupId(fmt.Sprint(fileStat.Uid))
		if errFileUser != nil {
			return fmt.Errorf("resolve file owner name: %w", errFileUser)
		}
		attributes[LogFileOwnerName] = fileOwner.Username
	}
	if r.IncludeFileOwnerGroupName {
		fileGroup, errFileGroup := user.LookupGroupId(fmt.Sprint(fileStat.Gid))
		if errFileGroup != nil {
			return fmt.Errorf("resolve file group name: %w", errFileGroup)
		}
		attributes[LogFileOwnerGroupName] = fileGroup.Name
	}
	return nil
}
15 changes: 15 additions & 0 deletions pkg/stanza/fileconsumer/attrs/owner_windows.go
@@ -0,0 +1,15 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

//go:build windows

package attrs // import "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/attrs"

import (
	"fmt"
	"os"
)

func (r *Resolver) addOwnerInfo(file *os.File, attributes map[string]any) error {
	return fmt.Errorf("owner info not implemented for windows")
}
9 changes: 7 additions & 2 deletions pkg/stanza/fileconsumer/config.go
@@ -7,6 +7,7 @@ import (
	"bufio"
	"errors"
	"fmt"
	"runtime"
	"time"

	"go.opentelemetry.io/collector/featuregate"
@@ -220,11 +221,15 @@ func (c Config) validate() error {
		if c.StartAt == "end" {
			return fmt.Errorf("'header' cannot be specified with 'start_at: end'")
		}
		if _, err := header.NewConfig(c.Header.Pattern, c.Header.MetadataOperators, enc); err != nil {
			return fmt.Errorf("invalid config for 'header': %w", err)
		if _, errConfig := header.NewConfig(c.Header.Pattern, c.Header.MetadataOperators, enc); errConfig != nil {
			return fmt.Errorf("invalid config for 'header': %w", errConfig)
		}
	}

	if runtime.GOOS == "windows" && (c.Resolver.IncludeFileOwnerName || c.Resolver.IncludeFileOwnerGroupName) {
		return errors.New("'include_file_owner_name' or 'include_file_owner_group_name' is not supported on windows")
	}

	return nil
}

2 changes: 2 additions & 0 deletions pkg/stanza/fileconsumer/config_test.go
@@ -38,6 +38,8 @@ func TestNewConfig(t *testing.T) {
	assert.False(t, cfg.IncludeFilePath)
	assert.False(t, cfg.IncludeFileNameResolved)
	assert.False(t, cfg.IncludeFilePathResolved)
	assert.False(t, cfg.IncludeFileOwnerName)
	assert.False(t, cfg.IncludeFileOwnerGroupName)
}

func TestUnmarshal(t *testing.T) {
4 changes: 2 additions & 2 deletions pkg/stanza/fileconsumer/internal/reader/factory.go
@@ -48,7 +48,7 @@ func (f *Factory) NewFingerprint(file *os.File) (*fingerprint.Fingerprint, error
}

func (f *Factory) NewReader(file *os.File, fp *fingerprint.Fingerprint) (*Reader, error) {
	attributes, err := f.Attributes.Resolve(file.Name())
	attributes, err := f.Attributes.Resolve(file)
	if err != nil {
		return nil, err
	}
@@ -108,7 +108,7 @@ func (f *Factory) NewReaderFromMetadata(file *os.File, m *Metadata) (r *Reader,
		r.processFunc = r.headerReader.Process
	}

	attributes, err := f.Attributes.Resolve(file.Name())
	attributes, err := f.Attributes.Resolve(file)
	if err != nil {
		return nil, err
	}
6 changes: 6 additions & 0 deletions pkg/stanza/operator/input/file/file_test.go
@@ -28,6 +28,8 @@ func TestAddFileResolvedFields(t *testing.T) {
		cfg.IncludeFilePath = true
		cfg.IncludeFileNameResolved = true
		cfg.IncludeFilePathResolved = true
		cfg.IncludeFileOwnerName = runtime.GOOS != "windows"
		cfg.IncludeFileOwnerGroupName = runtime.GOOS != "windows"
	})

// Create temp dir with log file
@@ -63,6 +65,10 @@ func TestAddFileResolvedFields(t *testing.T) {
	require.Equal(t, symLinkPath, e.Attributes["log.file.path"])
	require.Equal(t, filepath.Base(resolved), e.Attributes["log.file.name_resolved"])
	require.Equal(t, resolved, e.Attributes["log.file.path_resolved"])
	if runtime.GOOS != "windows" {
		require.NotNil(t, e.Attributes["log.file.owner.name"])
		require.NotNil(t, e.Attributes["log.file.owner.group.name"])
	}
}

// ReadExistingLogs tests that, when starting from beginning, we
2 changes: 2 additions & 0 deletions receiver/filelogreceiver/README.md
@@ -33,6 +33,8 @@ Tails and parses logs from files.
| `include_file_path` | `false` | Whether to add the file path as the attribute `log.file.path`. |
| `include_file_name_resolved` | `false` | Whether to add the file name after symlinks resolution as the attribute `log.file.name_resolved`. |
| `include_file_path_resolved` | `false` | Whether to add the file path after symlinks resolution as the attribute `log.file.path_resolved`. |
| `include_file_owner_name` | `false` | Whether to add the file owner name as the attribute `log.file.owner.name`. Not supported on Windows. |
| `include_file_owner_group_name` | `false` | Whether to add the file group name as the attribute `log.file.owner.group.name`. Not supported on Windows. |
| `poll_interval` | 200ms | The [duration](#time-parameters) between filesystem polls. |
| `fingerprint_size` | `1kb` | The number of bytes with which to identify a file. The first bytes in the file are used as the fingerprint. Decreasing this value at any point will cause existing fingerprints to be forgotten, meaning that all files will be read from the beginning (one time). |
| `max_log_size` | `1MiB` | The maximum size of a log entry to read. A log entry will be truncated if it is larger than `max_log_size`. Protects against reading large amounts of data into memory. |